
CN115473718A - Business data anomaly identification method and device based on behavior association mining - Google Patents


Info

Publication number
CN115473718A
Authority
CN
China
Prior art keywords
behavior
user
nodes
layer
business data
Legal status
Granted
Application number
CN202211084180.7A
Other languages
Chinese (zh)
Other versions
CN115473718B (en)
Inventor
沈文
郭骞
俞庚申
李慧芹
杨睿
韩维
刘一凡
Current Assignee
State Grid Jiangsu Electric Power Co Ltd
State Grid Smart Grid Research Institute of SGCC
Customer Service Center of State Grid Corp of China
Original Assignee
State Grid Jiangsu Electric Power Co Ltd
State Grid Smart Grid Research Institute of SGCC
Customer Service Center of State Grid Corp of China
Application filed by State Grid Jiangsu Electric Power Co Ltd, State Grid Smart Grid Research Institute of SGCC, and Customer Service Center of State Grid Corp of China
Priority to CN202211084180.7A
Publication of CN115473718A
Application granted
Publication of CN115473718B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for identifying abnormal business data based on behavior association mining, relating to the field of data security processing. The method comprises: determining a user's online behavior information from business data, and extracting user features and behavior features from the online behavior information, where the user features represent the operating-environment information of the user's online operations and the behavior features represent the time-sequence information of those operations; and inputting the user features and behavior features into a business data recognition model to obtain the abnormal behavior information output by the model. The business data recognition model is trained on sample behavior flow graphs of users, each constructed from sample user features and sample behavior features. By performing anomaly recognition with the business data recognition model, the invention can rapidly judge and respond to abnormal behavior, accurately enforce management restrictions on user behavior, and meet the development requirements of the big data era.

Description

A Method and Device for Identifying Business Data Anomalies Based on Behavior Association Mining

Technical Field

The invention relates to the field of data security processing, and in particular to a method and device for identifying business data anomalies based on behavior association mining.

Background

To keep the network environment secure and stable, supervising network traffic data has become very important in the big data era. Because network traffic is voluminous and highly random, current methods for detecting abnormal user behavior can neither quickly judge and respond to continually changing abnormal behavior in business data nor accurately enforce restrictions on user behavior. They cannot meet the requirement of efficient, accurate detection over massive data, and therefore fall short of the needs of the big data era.

Therefore, identifying abnormal user behavior in business data more efficiently and accurately plays a key role in network security management.

Summary of the Invention

In view of this, embodiments of the present invention provide a method and device for identifying business data anomalies based on behavior association mining, to solve the problem that current detection of abnormal user behavior in business data cannot quickly make judgments and respond.

According to a first aspect, an embodiment of the present invention provides a method for identifying business data anomalies based on behavior association mining, the method comprising:

determining the user's online behavior information, and extracting user features and behavior features from the online behavior information, where the user features represent the operating-environment information of the user's online operations and the behavior features represent the time-sequence information of those operations;

inputting the user features and behavior features into a business data recognition model to obtain the abnormal behavior information output by the model, where the business data recognition model is trained on the user's sample behavior flow graph, and the sample behavior flow graph is constructed from the user's sample user features and sample behavior features;

the business data recognition model is used to construct a behavior flow graph from the user features and behavior features, determine the fused neighborhood feature of each node in the behavior flow graph, determine each node's embedding representation from its fused neighborhood feature, and predict abnormal behavior in the business data from the classification result obtained from the embedding representations. Each node contains user features and behavior features; nodes are connected by edges according to the behavior features, and each edge represents the temporal relationship between the nodes it connects.
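As a minimal sketch of the feature-extraction step, the Python below splits one raw online-behavior record into user features (operating environment) and behavior features (time sequence). The record layout and the field names `user_id`, `src_ip`, `device`, `time`, and `action` are hypothetical; the patent does not specify which fields make up either feature group.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UserFeatures:
    """Operating-environment information of one online operation (hypothetical fields)."""
    user_id: str
    src_ip: str
    device: str


@dataclass(frozen=True)
class BehaviorFeatures:
    """Time-sequence information of one online operation (hypothetical fields)."""
    timestamp: datetime
    action: str


def extract_features(record: dict) -> tuple[UserFeatures, BehaviorFeatures]:
    """Split a raw online-behavior record into user and behavior features."""
    user = UserFeatures(
        user_id=record["user_id"],
        src_ip=record["src_ip"],
        device=record.get("device", "unknown"),  # default when the log omits it
    )
    behavior = BehaviorFeatures(
        timestamp=datetime.fromisoformat(record["time"]),
        action=record["action"],
    )
    return user, behavior


record = {"user_id": "u1", "src_ip": "10.0.0.5",
          "time": "2022-09-06T08:30:00", "action": "login"}
user, behavior = extract_features(record)
print(user.src_ip, behavior.timestamp.hour, behavior.action)  # 10.0.0.5 8 login
```

Keeping the two groups as separate immutable records mirrors the text's split between "where and how" (environment) and "when and in what order" (timing).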

With reference to the first aspect, in a first implementation of the first aspect, the business data recognition model includes a flow graph construction layer, a homogeneous user relationship network layer, and a heterogeneous user message network layer;

the flow graph construction layer is used to construct the user's behavior flow graph from the user features and behavior features;

the homogeneous user relationship network layer is used to perform an attention operation on each node in the behavior flow graph, based on the node and its neighbor nodes, to determine the node's fused neighborhood feature, where a neighbor node is a node connected to the current node by an edge;

the heterogeneous user message network layer is used to perform an attention operation and semantic attention understanding on the fused neighborhood features of each node and of its neighbor nodes, to determine the node's embedding representation and the distance between the node and its embedding representation, and to determine the abnormal behavior information from that distance.
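The distance rule used by the heterogeneous user message network layer is not specified further. As a stand-in, the sketch below measures the Euclidean distance from each node's embedding to the centroid of embeddings of known-normal nodes and flags nodes beyond a hypothetical `threshold`. This centroid comparison is an assumed technique, not the one the patent states.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def flag_by_distance(embeddings, reference, threshold):
    """Indices of embeddings farther than `threshold` from `reference`."""
    return [i for i, e in enumerate(embeddings) if euclidean(e, reference) > threshold]


normal = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [-0.2, -0.2]]  # hypothetical normal embeddings
reference = centroid(normal)
embeddings = [[0.1, 0.1], [1.5, 1.0], [0.0, -0.1]]
print(flag_by_distance(embeddings, reference, threshold=0.5))  # [1]
```

A distance threshold gives a graded anomaly score rather than a hard class, which may suit settings where abnormal labels are scarce.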

With reference to the first implementation of the first aspect, in a second implementation of the first aspect, inputting the user features and behavior features into the business data recognition model to obtain the abnormal behavior information output by the model specifically includes:

inputting the user features and behavior features into the flow graph construction layer to obtain the behavior flow graph output by that layer;

inputting the behavior flow graph into the homogeneous user relationship network layer to obtain the fused neighborhood features of the nodes output by that layer;

inputting the fused neighborhood features into the heterogeneous user message network layer to obtain the abnormal behavior information output by that layer, where the abnormal behavior information includes the nodes and their corresponding state features.
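The behavior flow graph produced by the flow graph construction layer can be pictured with a small sketch: every event becomes a node, and consecutive events of the same user are joined by a directed edge that encodes their temporal order. The `(user_id, timestamp, action)` event tuple and the per-user chaining rule are assumptions; the patent says only that edges follow the behavior features and represent timing relationships.

```python
from collections import defaultdict


def build_behavior_flow_graph(events):
    """Turn (user_id, timestamp, action) events into nodes and directed edges.

    Node i is events[i]. Each user's events are sorted by timestamp and
    consecutive ones are linked, so an edge (a, b) means "a happened just
    before b" for that user (an assumed linking rule).
    """
    nodes = list(events)
    per_user = defaultdict(list)
    for idx, (user_id, ts, _action) in enumerate(nodes):
        per_user[user_id].append((ts, idx))
    edges = []
    for seq in per_user.values():
        seq.sort()  # chronological order; ties broken by node index
        edges.extend((a, b) for (_, a), (_, b) in zip(seq, seq[1:]))
    return nodes, sorted(edges)


def neighbors(edges, node):
    """Nodes connected to `node` by an edge in either direction."""
    return sorted({b for a, b in edges if a == node} | {a for a, b in edges if b == node})


events = [("u1", 3, "download"), ("u1", 1, "login"),
          ("u2", 2, "login"), ("u1", 2, "query")]
nodes, edges = build_behavior_flow_graph(events)
print(edges)                # [(1, 3), (3, 0)]
print(neighbors(edges, 3))  # [0, 1]
```

The `neighbors` helper matches the text's definition of a neighbor node as any node sharing an edge with the current node.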

With reference to the second implementation of the first aspect, in a third implementation of the first aspect, the homogeneous user relationship network layer includes a neighborhood convolution layer, a similarity determination layer, a normalization layer, and a first attention operation layer;

the neighborhood convolution layer is used to perform a convolution over a node, its neighbor nodes, and the node's convolution weights to determine the node's state feature, where the state feature represents the node's label;

the similarity determination layer is used to determine the similarity coefficient between the state feature of each neighbor node and the state feature of the node;

the normalization layer is used to normalize the similarity coefficients to determine the attention coefficient between each neighbor node and the node;

the first attention operation layer is used to perform weighting based on the neighbor nodes' state features, the concatenation weights, and the attention coefficients, to determine the node's fused neighborhood feature.
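The four sub-layers of the homogeneous user relationship network layer resemble a graph-attention layer. The sketch below is one hypothetical pure-Python realization: the neighborhood convolution sums a node with its neighbors and projects the result with `W_conv`; similarity is a dot product; normalization is a softmax; and the first attention operation layer takes an attention-weighted sum of neighbor state features projected by `W_cat` (standing in for the concatenation weights). All of these concrete choices are assumptions.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def homogeneous_layer(X, adj, W_conv, W_cat):
    """X[i]: input feature of node i; adj[i]: neighbor indices of node i.

    Returns (H, Z): state features and fused neighborhood features.
    """
    n = len(X)
    # Neighborhood convolution: sum a node with its neighbors, then project.
    H = []
    for i in range(n):
        agg = list(X[i])
        for j in adj[i]:
            agg = [a + b for a, b in zip(agg, X[j])]
        H.append(matvec(W_conv, agg))

    Z = []
    for i in range(n):
        nbrs = adj[i]
        if not nbrs:  # isolated node: fall back to its own projected state
            Z.append(matvec(W_cat, H[i]))
            continue
        # Similarity coefficients: dot product of state features.
        sims = [sum(a * b for a, b in zip(H[i], H[j])) for j in nbrs]
        # Normalization: softmax turns similarities into attention coefficients.
        alpha = softmax(sims)
        # Attention-weighted sum of projected neighbor state features.
        proj = [matvec(W_cat, H[j]) for j in nbrs]
        Z.append([sum(a * p[k] for a, p in zip(alpha, proj)) for k in range(len(proj[0]))])
    return H, Z


identity = [[1, 0], [0, 1]]
X = [[1, 0], [0, 1], [1, 1]]
adj = {0: [1, 2], 1: [0], 2: [0]}
H, Z = homogeneous_layer(X, adj, identity, identity)
print(H)  # [[2, 2], [1, 1], [2, 1]]
```

Because node 2's state is more similar to node 0's than node 1's is, node 2 receives the larger attention coefficient when node 0's fused neighborhood feature is formed.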

With reference to the second implementation of the first aspect, in a fourth implementation of the first aspect, the heterogeneous user message network layer includes a weight learning layer, a second attention operation layer, a semantic attention understanding layer, and a prediction output layer;

the second attention operation layer is used to perform weighting based on the neighbor nodes' fused neighborhood features and the attention weights to determine the node's temporal feature, where the temporal feature represents the node's semantics;

the semantic attention understanding layer is used to map the node's temporal features to determine the node's semantic attention weight under the meta-path, and to perform weighting based on the temporal features and the semantic attention weights to determine the node's embedding representation;

the prediction output layer is used to determine the classification result of the embedding representation and to output the abnormal behavior information based on the classification result.
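The semantic attention understanding layer resembles the semantic-level attention over meta-paths used in heterogeneous graph attention networks. A minimal sketch, assuming each meta-path's importance is the mean of `q · tanh(W z + b)` over nodes and that these scores are normalized with a softmax; `W`, `b`, and `q` are hypothetical learned parameters.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def path_score(Z, W, b, q):
    """Importance of one meta-path: mean over nodes of q . tanh(W z + b)."""
    total = 0.0
    for z in Z:
        h = [math.tanh(v + bk) for v, bk in zip(matvec(W, z), b)]
        total += sum(qk * hk for qk, hk in zip(q, h))
    return total / len(Z)


def semantic_attention(Z_per_path, W, b, q):
    """Z_per_path[p][i]: temporal feature of node i under meta-path p.

    Returns (beta, E): the semantic attention weight of each meta-path and
    each node's embedding as the beta-weighted sum over meta-paths.
    """
    beta = softmax([path_score(Z, W, b, q) for Z in Z_per_path])
    n_nodes, dim = len(Z_per_path[0]), len(Z_per_path[0][0])
    E = [[sum(beta[p] * Z_per_path[p][i][k] for p in range(len(Z_per_path)))
          for k in range(dim)]
         for i in range(n_nodes)]
    return beta, E


identity, zeros, q = [[1, 0], [0, 1]], [0, 0], [1, 1]
beta, E = semantic_attention([[[1, 1]], [[-1, -1]]], identity, zeros, q)
print(beta[0] > beta[1])  # True: the first meta-path scores higher
```

Averaging the score over all nodes yields one weight per meta-path rather than per node. This is the usual choice in that family of models, but it is an assumption here.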

With reference to the third implementation of the first aspect, in a fifth implementation of the first aspect, inputting the behavior flow graph into the homogeneous user relationship network layer to obtain the fused neighborhood features of the nodes output by that layer specifically includes:

inputting the behavior flow graph into the neighborhood convolution layer to obtain the state features of the nodes output by that layer;

inputting the nodes' state features into the similarity determination layer to obtain the similarity coefficients between nodes output by that layer;

inputting the similarity coefficients into the normalization layer to obtain the attention coefficients between nodes output by that layer;

inputting the state features of the node's neighbor nodes, the concatenation weights, and the attention coefficients into the first attention operation layer to obtain the node's fused neighborhood feature output by that layer.

With reference to the fourth implementation of the first aspect, in a sixth implementation of the first aspect, inputting the fused neighborhood features into the heterogeneous user message network layer to obtain the abnormal behavior information output by that layer specifically includes:

inputting the fused neighborhood features into the weight learning layer to obtain the nodes' attention weights output by that layer;

inputting the node's fused neighborhood feature and attention weights into the second attention operation layer to obtain the node's temporal feature output by that layer;

inputting the temporal features into the semantic attention understanding layer to obtain the nodes' embedding representations output by that layer;

inputting the embedding representations into the prediction output layer to obtain the abnormal behavior information output by that layer.
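For the prediction output step, a linear classifier followed by a softmax is one simple way to turn embeddings into a classification result. In the sketch below, the two-class label set, the weights `W` and `b`, and the rule that only nodes classified as abnormal are reported are all assumptions.

```python
import math

LABELS = ("normal", "abnormal")  # hypothetical two-class label set


def predict(embedding, W, b):
    """Linear layer + softmax; returns (predicted label, class probabilities)."""
    logits = [sum(w * x for w, x in zip(row, embedding)) + bk for row, bk in zip(W, b)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


def abnormal_behavior_info(node_ids, embeddings, W, b):
    """Report (node id, class probabilities) for every node classified as abnormal."""
    report = []
    for node_id, emb in zip(node_ids, embeddings):
        label, probs = predict(emb, W, b)
        if label == "abnormal":
            report.append((node_id, probs))
    return report


W = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical trained weights
b = [0.0, 0.0]
report = abnormal_behavior_info(["n0", "n1"], [[2.0, 0.0], [0.0, 3.0]], W, b)
print([node_id for node_id, _ in report])  # ['n1']
```

Returning the probabilities alongside each flagged node gives downstream handling a confidence value in addition to the label.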

With reference to the first aspect, in a seventh implementation of the first aspect, the business data recognition model is trained through the following steps:

determining the sample state features of the sample nodes from the sample behavior flow graph, where each sample node in the sample behavior flow graph contains sample user features and sample behavior features, sample nodes are connected by edges according to the sample behavior features, and each edge represents the temporal relationship between sample nodes;

using the sample behavior flow graph as the training input and the sample state features of the sample nodes as the training labels, and training by deep learning to obtain a business data recognition model that generates abnormal behavior information from the user's online behavior information.
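The training step pairs sample behavior flow graphs with sample state features as labels. As a deliberately small stand-in for the deep-learning training described, the sketch below fits a logistic-regression classifier by full-batch gradient descent on per-node feature vectors with binary labels (0 = normal, 1 = abnormal). The feature vectors, labels, learning rate, and epoch count are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability that feature vector x belongs to the abnormal class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_classifier(features, labels, lr=0.3, epochs=500, seed=0):
    """Full-batch gradient descent on the mean binary cross-entropy loss."""
    rng = random.Random(seed)
    dim = len(features[0])
    w = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = predict_proba(w, b, x) - y  # gradient of the loss w.r.t. the logit
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical per-node features with state labels (0 = normal, 1 = abnormal).
features = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
labels = [0, 0, 0, 1, 1, 1]
w, b = train_classifier(features, labels)
print([round(predict_proba(w, b, x)) for x in features])  # [0, 0, 0, 1, 1, 1]
```

In a full model, the same loop would backpropagate through the attention layers as well. The linear model keeps the sketch short while showing how state features serve as training labels.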

According to a second aspect, an embodiment of the present invention further provides a device for identifying business data anomalies based on behavior association mining, the device comprising:

a feature extraction module, used to determine the user's online behavior information and extract user features and behavior features from it, where the user features represent the operating-environment information of the user's online operations and the behavior features represent the time-sequence information of those operations;

a behavior recognition module, used to input the user features and behavior features into the business data recognition model to obtain the abnormal behavior information output by the model, where the business data recognition model is trained on the user's sample behavior flow graph, and the sample behavior flow graph is constructed from the user's sample user features and sample behavior features;

the business data recognition model is used to construct a behavior flow graph from the user features and behavior features, determine the fused neighborhood feature of each node in the behavior flow graph, determine each node's embedding representation from its fused neighborhood feature, and predict abnormal behavior in the business data from the classification result obtained from the embedding representations. Each node contains user features and behavior features; nodes are connected by edges according to the behavior features, and each edge represents the temporal relationship between the nodes it connects.

With reference to the second aspect, in a first implementation of the second aspect, the behavior recognition module specifically includes:

a flow graph construction unit, used to input the user features and behavior features into the flow graph construction layer to obtain the behavior flow graph output by that layer;

a relationship recognition unit, used to input the behavior flow graph into the homogeneous user relationship network layer to obtain the fused neighborhood features of the nodes output by that layer;

a message recognition unit, used to input the fused neighborhood features into the heterogeneous user message network layer to obtain the abnormal behavior information output by that layer, where the abnormal behavior information includes the nodes and their corresponding state features.

With reference to the first implementation of the second aspect, in a second implementation of the second aspect, the relationship recognition unit specifically includes:

a first recognition unit, used to input the behavior flow graph into the neighborhood convolution layer to obtain the state features of the nodes output by that layer;

a second recognition unit, used to input the nodes' state features into the similarity determination layer to obtain the similarity coefficients between nodes output by that layer;

a third recognition unit, used to input the similarity coefficients into the normalization layer to obtain the attention coefficients between nodes output by that layer;

a fourth recognition unit, used to input the state features of the node's neighbor nodes, the concatenation weights, and the attention coefficients into the first attention operation layer to obtain the node's fused neighborhood feature output by that layer.

With reference to the first implementation of the second aspect, in a third implementation of the second aspect, the message recognition unit specifically includes:

a fifth recognition unit, used to input the fused neighborhood features into the weight learning layer to obtain the nodes' attention weights output by that layer;

a sixth recognition unit, used by the weight learning layer to determine the nodes' attention weights based on a self-attention mechanism;

a seventh recognition unit, used to input the node's fused neighborhood feature and attention weights into the second attention operation layer to obtain the node's temporal feature output by that layer;

an eighth recognition unit, used to input the temporal features into the semantic attention understanding layer to obtain the nodes' embedding representations output by that layer;

a ninth recognition unit, used to input the embedding representations into the prediction output layer to obtain the abnormal behavior information output by that layer.

According to a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of any of the above methods for identifying business data anomalies based on behavior association mining.

According to a fourth aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of any of the above methods for identifying business data anomalies based on behavior association mining.

The method and device for identifying business data anomalies based on behavior association mining provided by the present invention determine the user's online behavior information from business data and then extract user features and behavior features from it. The extracted information retains not only the operating-environment information of the user's online operations but also their time-sequence information. A trained business data recognition model then processes the business data and identifies the abnormal behavior information in it. The model constructs a behavior flow graph from the user features and behavior features, determines each node's fused neighborhood feature, derives each node's embedding representation from that feature, and predicts abnormal behavior in the business data from the classification result of the embedding representations. It therefore combines three kinds of features to identify business data anomalies: user behavior, the content that behavior generates, and the temporal relationships between user behaviors. Accordingly, the accuracy and speed of identifying abnormal behavior information are significantly improved. By using this model for business data anomaly identification based on behavior association mining, the present application can quickly judge and respond to abnormal behavior, accurately enforce restrictions on user behavior, and meet the development requirements of the big data era.

Description of Drawings

The features and advantages of the present invention will be understood more clearly by referring to the accompanying drawings, which are schematic and should not be construed as limiting the invention in any way. In the drawings:

Fig. 1 is a schematic flowchart of the method for identifying business data anomalies based on behavior association mining provided by the present invention;

Fig. 2 is the first schematic structural diagram of the business data recognition model provided by the present invention;

Fig. 3 is a detailed schematic flowchart of step S20 of the method for identifying business data anomalies based on behavior association mining provided by the present invention;

Fig. 4 is the second schematic structural diagram of the business data recognition model provided by the present invention;

Fig. 5 is a detailed schematic flowchart of step S22 of the method for identifying business data anomalies based on behavior association mining provided by the present invention;

Fig. 6 is the third schematic structural diagram of the business data recognition model provided by the present invention;

Fig. 7 is a detailed schematic flowchart of step S23 of the method for identifying business data anomalies based on behavior association mining provided by the present invention;

Fig. 8 is a schematic flowchart of the training process of the business data recognition model provided by the present invention;

Fig. 9 is a schematic structural diagram of the device for identifying business data anomalies based on behavior association mining provided by the present invention;

Fig. 10 is a schematic structural diagram of the electronic device provided by the present invention.

Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Network traffic analysis plays an important role in understanding user behavior and maintaining the security of the network environment. The prior art identifies abnormal user behavior with methods such as port scanning, packet feature extraction and field matching. However, as abnormal user business behaviors keep evolving, these methods become increasingly costly and cannot identify abnormal behavior quickly and accurately. User business anomaly detection based on network traffic analysis mainly extracts and analyzes normal data from network traffic, formulates corresponding rules, and uses those rules to identify abnormal data quickly; however, its detection accuracy is insufficient.

At present, machine learning has been widely applied to abnormal data processing and offers a better approach to identifying abnormal user behavior. Such models extract features from network-layer and transport-layer traffic data and label samples of the anomaly types already detected, so that detection rules evolve automatically to identify abnormal user business data. However, current models lack features closely related to users' online behavior and do not accommodate special network environments with user-defined business behavior types. As a result, user business anomaly identification systems are not only costly but also interfere with the analysis and detection of actual data, which degrades recognition performance in practice.

This application provides a business data anomaly identification method based on behavior association mining, which can be applied to electronic devices such as computers, mobile phones, wearable smart devices and tablet computers. FIG. 1 is a flowchart of the method according to an embodiment of this application. As shown in FIG. 1, the method includes the following steps:

S10. Determine the user's online behavior information from the business data, and extract user features and behavior features from the online behavior information.

The business data may be stored in the electronic device in advance or acquired by the electronic device from outside. For example, it may be collected by the electronic device from various traffic monitoring devices, including but not limited to traffic data at the network layer and the transport layer. This application places no restriction on how the user's online behavior information is obtained, as long as the business data can be acquired.

In this embodiment, the user features represent the operating environment of the user's online operations, such as the user device, application, browser and web page used. The behavior features represent the chronological order of the user's online operations; that is, they carry the temporal information of the user features. For example, a user may habitually click one function icon before clicking another.

In this embodiment, the business data may contain the online behavior information of one or more users, and user features and behavior features are extracted for each of them. For example, if the business data contains the online behavior information of users A, B and C, the user features and behavior features of user A, user B and user C are extracted separately.

S20. Input the user features and behavior features into a trained business data identification model to obtain the abnormal behavior information output by the model. In this application, the business data identification model is trained on users' sample behavior flow graphs, which are constructed from the users' sample user features and sample behavior features. The model constructs a behavior flow graph from the user features and behavior features, determines a fused neighborhood feature for each node in the graph, determines each node's embedded representation feature from its fused neighborhood feature, and predicts abnormal behavior in the business data from the classification result derived from the embedded representation features. In this embodiment, each node of the behavior flow graph contains user features and behavior features; because the behavior features carry temporal information, nodes are connected according to the behavior features, and each edge represents the temporal relationship between the nodes it connects.

The embedded representation feature of a node is its representation in the embedding space: each node of the behavior flow graph is encoded so that the similarity between nodes in the embedding space approximates their similarity in the original graph. In this application, the user's behavior flow graph is first constructed from the extracted user features and behavior features. Then, for each node (user feature), its fused neighborhood feature is obtained from the node, its neighbor nodes and the weights of those neighbors. Next, the node's embedded representation feature is determined from its fused neighborhood feature together with its neighbor nodes and their weights, and abnormal behavior in the business data is predicted from the classification result derived from the embedded representation features. How the embedded representation features are obtained is described in detail below.

The behavior flow graph has a number of nodes. Each node holds user-related information, including user features and behavior features, and represents static information about the user's behavior. Nodes are connected according to the behavior features, and each edge represents the temporal relationship between two nodes; any node connected to node A is a neighbor node of node A. Likewise, the sample behavior flow graph has a number of sample nodes that hold user-related information (user features and behavior features) and represent static information about the user's behavior. Sample nodes are connected according to the sample behavior features, each edge represents the temporal relationship between sample nodes, and any node connected to sample node C is a neighbor node of sample node C.

It can be understood that the sample user features and sample behavior features are extracted from the user's sample online behavior information, which in turn is extracted from historical business data. This application places no restriction on how the historical business data is obtained, as long as it can be acquired.

More specifically, after the user's online behavior information is obtained, the user features and behavior features are extracted from it and filled into the nodes and edges of the behavior flow graph, respectively, to construct the graph. Each node is a user information node carrying the user device, application, browser, web page and other related information; an edge from node A to node B represents the temporal relationship of the user's behavior, i.e., the chronological order of the online operations.
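As a rough illustration of the graph construction just described, the following Python sketch turns one user's time-stamped operations into a behavior flow graph. It is a minimal sketch under assumptions the source does not state: the event fields (`device`, `app`, `browser`, `page`), the choice of one node per distinct operating-environment tuple, and the names `BehaviorEvent` and `build_behavior_flow_graph` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorEvent:
    # One online operation of a user (hypothetical field names).
    timestamp: float
    device: str
    app: str
    browser: str
    page: str


@dataclass
class BehaviorFlowGraph:
    # nodes[k] is the user-feature tuple carried by node k.
    nodes: list = field(default_factory=list)
    # Directed edges (src, dst): the operation at dst followed the one at src.
    edges: list = field(default_factory=list)


def build_behavior_flow_graph(events):
    """Assumed construction: one node per distinct (device, app, browser, page)
    tuple, one directed edge per pair of consecutive operations."""
    graph = BehaviorFlowGraph()
    index = {}  # user-feature tuple -> node id
    prev = None
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.device, ev.app, ev.browser, ev.page)
        if key not in index:
            index[key] = len(graph.nodes)
            graph.nodes.append(key)
        cur = index[key]
        if prev is not None and prev != cur:
            graph.edges.append((prev, cur))  # edge encodes temporal order
        prev = cur
    return graph
```

Sorting by timestamp first means the edge list reflects the chronological order of operations even if the raw records arrive out of order.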

The sample behavior flow graph is constructed in the same way as the behavior flow graph, and the details are not repeated here.

The business data anomaly identification method based on behavior association mining of the present invention determines the user's online behavior information from the business data and extracts user features and behavior features from it. The extracted information retains both the operating environment of the user's online operations and their chronological order. A trained business data identification model then identifies the abnormal behavior information in the business data. Because the model constructs a behavior flow graph from the user features and behavior features, determines a fused neighborhood feature for each node, derives each node's embedded representation feature from its fused neighborhood feature, and predicts abnormal behavior from the classification result of the embedded representation features, it combines three aspects, namely user behavior, the content generated by that behavior, and the temporal relationship between behaviors, to identify business data anomalies. Accordingly, the accuracy and speed of abnormal behavior identification are noticeably improved. Using this model for business data anomaly identification based on behavior association mining, this application can quickly judge and respond to abnormal behavior and thus accurately enforce user behavior management restrictions, meeting the requirements of the big data era.

The method is described below with reference to FIG. 2. In this embodiment, the business data identification model includes a flow graph construction layer, a homogeneous user relationship network layer and a heterogeneous user message network layer.

The flow graph construction layer constructs the user's behavior flow graph from the user features and behavior features. The homogeneous user relationship network layer performs attention operations on each node of the behavior flow graph based on the node and its neighbor nodes to determine the node's fused neighborhood feature. The heterogeneous user message network layer performs attention operations and semantic attention understanding on the fused neighborhood features of each node and its neighbor nodes to determine the node's embedded representation feature and the distance between the node and its embedded representation feature, and determines the abnormal behavior information based on that distance.

In this embodiment, the behavior flow graph is $G=(V,E)$, where $V=\{u_1,u_2,\dots,u_n\}$, $u_i$ denotes the $i$-th user feature, $i\in\{1,\dots,n\}$; and $E=\{\varepsilon_1,\varepsilon_2,\dots,\varepsilon_m\}$, where $\varepsilon_j$ denotes the $j$-th behavior feature, $j\in\{1,\dots,m\}$.

In this embodiment, the homogeneous user relationship network layer is a graph neural network for homogeneous graphs; specifically, it combines a graph convolutional network and a graph attention network. The heterogeneous user message network layer is a graph neural network for heterogeneous graphs; specifically, it uses a heterogeneous graph attention network.

The business data identification model of this application uses a graph structure to preserve, as far as possible, the structure of the social network produced by user behavior. The graph retains both the operating environment and the chronological order of the user's online operations, so the model combines three aspects of the social network, namely user behavior, the content generated by the behavior and the temporal relationship between behaviors, to identify business data anomalies, which noticeably improves identification accuracy and speed.

Accordingly, referring to FIG. 3, step S20 specifically includes:

S21. Input the user features and behavior features into the flow graph construction layer to obtain the behavior flow graph output by that layer.

S22. Input the behavior flow graph into the homogeneous user relationship network layer to obtain the fused neighborhood feature of each node output by that layer.

S23. Input the fused neighborhood features into the heterogeneous user message network layer to obtain the abnormal behavior information output by that layer. The abnormal behavior information includes the nodes and their state features, where a node's state feature is its label and corresponds to the category of abnormal behavior.

The method is further described below with reference to FIG. 4. The homogeneous user relationship network layer includes a neighborhood convolution layer, a similarity determination layer, a normalization layer and a first attention operation layer.

The neighborhood convolution layer performs a convolution over a node, its neighbor nodes and the node's convolution weight to determine the node's state feature; the state feature represents the node's label and, in this application, is output as the label for user abnormal behavior detection. The similarity determination layer determines the similarity coefficient between the state feature of each neighbor node and that of the node. The normalization layer normalizes the similarity coefficients to determine the attention coefficient between each neighbor node and the node. The first attention operation layer performs a weighted operation on the neighbor nodes' state features using the splicing weight and the attention coefficients to determine the node's fused neighborhood feature.

Accordingly, referring to FIG. 5, step S22 specifically includes:

S221. Input the behavior flow graph into the neighborhood convolution layer to obtain the state feature of each node output by that layer.

The neighborhood convolution layer is a spatial graph convolutional network, i.e., graph convolution is defined on the spatial relationships between nodes: the states of all neighbor nodes of a node are accumulated to update the node's state, yielding its state feature:

$$h_{l+1}(v)=\sigma\Big(\sum_{u\in N(v)}W_v^{l}\,h_l(u)\Big)$$

where $h_l(v)$ denotes the state feature of node $v$ at layer $l$ and $h_{l+1}(v)$ its state feature at layer $l+1$; $N(v)$ denotes the neighbor nodes of node $v$; and $\sigma$ is a nonlinear activation function. $W_v^{l}$ is the convolution weight of node $v$ at layer $l$, used for feature enhancement. In this embodiment, the convolution weight $W_i^{l}$ of any node $i$ at layer $l$ is a learnable parameter matrix that aggregates the features of node $i$ at layer $l$ and transforms the dimension of the feature vector.
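To make the layer update concrete, here is a minimal Python sketch of one neighborhood convolution step, $h_{l+1}(v)=\sigma(\sum_{u\in N(v)}W_v^{l}h_l(u))$, on a graph stored as dictionaries. It assumes ReLU for $\sigma$ (the source does not name the activation) and sums over neighbors only, without a self-loop; the function name `neighborhood_conv` and the data layout are illustrative, not the patent's implementation.

```python
def matvec(W, x):
    # Multiply matrix W (list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, xi) for xi in x]


def neighborhood_conv(h, neighbors, weights, act=relu):
    """One spatial graph-convolution step:
    h_{l+1}(v) = act(sum over u in N(v) of W_v^l h_l(u)).

    h:         dict node -> state feature at layer l
    neighbors: dict node -> list of neighbor nodes N(v)
    weights:   dict node -> convolution weight W_v^l (list of rows)
    """
    out = {}
    for v, nbrs in neighbors.items():
        acc = [0.0] * len(weights[v])
        for u in nbrs:
            for k, val in enumerate(matvec(weights[v], h[u])):
                acc[k] += val
        out[v] = act(acc)
    return out
```

Because each node has its own matrix `weights[v]`, the output dimension can differ from the input dimension, which matches the description of $W_i^{l}$ as transforming the feature-vector dimension.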

As an optional implementation of the present invention, the neighborhood convolution layer consists of a single convolution layer and a local output function. It takes the entire behavior flow graph as input, performs one convolution over all nodes and their neighbor nodes, updates each node's state with the convolution result to obtain its state feature, and finally converts the state feature into a label for user anomaly detection through the local output function.

As another optional implementation, the neighborhood convolution layer consists of multiple convolution layers and an output function. It likewise takes the entire behavior flow graph as input. Each convolution layer performs a convolution over all nodes and their neighbor nodes and updates each node with the result, which is then passed through the activation function to the next convolution layer; this repeats layer by layer, and finally a local output function converts each node's state into a label for abnormal user detection.

In other words, the state of a node is updated according to the states of its neighbor nodes, and the weighting for node $i$ depends mainly on its convolution weight $W_i$, which is continuously learned and optimized, for example through forward propagation and backpropagation.
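The multi-layer variant can be sketched as follows: stack several convolution layers, then apply a local output function to each node's final state to obtain a label. This Python sketch makes simplifying assumptions not in the source: one weight matrix shared by all nodes within a layer (instead of a per-node $W_i$), ReLU as the activation, and softmax followed by argmax as the local output function. `detect_labels` is a hypothetical name, and no training (forward/backpropagation) is shown.

```python
import math


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def conv_layer(h, neighbors, W):
    # Sum W h(u) over the neighbors u of each node, then apply ReLU.
    out = {}
    for v, nbrs in neighbors.items():
        acc = [0.0] * len(W)
        for u in nbrs:
            acc = [a + b for a, b in zip(acc, matvec(W, h[u]))]
        out[v] = [max(0.0, a) for a in acc]
    return out


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in z]
    s = sum(exps)
    return [e / s for e in exps]


def detect_labels(h, neighbors, layer_weights, W_out):
    """Stack convolution layers, then map each node's final state to label
    probabilities (softmax) and pick the most likely label index."""
    for W in layer_weights:
        h = conv_layer(h, neighbors, W)
    probs = {v: softmax(matvec(W_out, hv)) for v, hv in h.items()}
    labels = {v: max(range(len(p)), key=p.__getitem__) for v, p in probs.items()}
    return probs, labels
```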

S222. Input the node state features into the similarity determination layer to obtain the similarity coefficients between nodes output by that layer.

The similarity determination layer, together with the subsequent normalization layer and first attention operation layer, forms a graph attention network that incorporates an attention mechanism. In this network, attention is computed only over the neighbor nodes of a given node. For node $i$, the similarity coefficient between it and each of its neighbor nodes is computed one by one:

$$e_{ij}=a\big([V_ih_i\,\|\,V_ih_j]\big),\quad j\in N(i)$$

where $e_{ij}$ denotes the similarity coefficient between node $i$ and its neighbor node $j$; $[\cdot\|\cdot]$ is the feature concatenation (splicing) function, which concatenates the dimension-raised features; $a$ is the mapping function, which maps the concatenated high-dimensional feature to a real number and is implemented in this embodiment as a single-layer feedforward neural network; $V_i$ is the splicing weight of node $i$; and $N(i)$ denotes the neighbor nodes of node $i$.
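A minimal Python sketch of the similarity coefficient $e_{ij}=a([V_ih_i\,\|\,V_ih_j])$ follows. The source only says $a$ is a single-layer feedforward network; here it is assumed to be a dot product with a learnable vector `a_vec` followed by LeakyReLU (slope 0.2), as in the original graph attention network formulation. The function names are illustrative.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def similarity_coefficient(h_i, h_j, V_i, a_vec):
    """e_ij = a([V_i h_i || V_i h_j]), with a assumed to be a single-layer
    feedforward map: dot product with a_vec followed by LeakyReLU."""
    spliced = matvec(V_i, h_i) + matvec(V_i, h_j)  # list + list = concatenation
    return leaky_relu(sum(w * z for w, z in zip(a_vec, spliced)))
```

Note that `a_vec` must have twice the length of $V_ih_i$, because it scores the concatenated vector.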

This application uses a graph attention network with an attention mechanism, which can assign different weights (i.e., splicing weights) to different nodes and, during training, depends only on pairs of neighboring nodes rather than on the structure of the whole network, giving the business data identification model better generalization.

S223. Input the similarity coefficients into the normalization layer to obtain the attention coefficients between nodes output by that layer.

The similarity coefficient $e_{ij}$ is normalized to obtain the attention coefficient $a_{ij}$ between neighbor node $j$ and node $i$:

$$a_{ij}=\mathrm{softmax}_j(e_{ij})=\frac{\exp(e_{ij})}{\sum_{k\in N(i)}\exp(e_{ik})}$$
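The normalization step is a softmax over the neighbors of node $i$. The Python sketch below computes it; subtracting the maximum score before exponentiating is a standard numerical-stability trick that leaves the result unchanged. The name `attention_coefficients` and the dictionary layout are illustrative.

```python
import math


def attention_coefficients(e_i):
    """Normalize similarity coefficients e_ij over the neighbors j of node i.

    e_i: dict neighbor j -> e_ij. Returns dict j -> a_ij (values sum to 1).
    Subtracting the maximum first keeps exp() from overflowing and does not
    change the result.
    """
    m = max(e_i.values())
    exps = {j: math.exp(e - m) for j, e in e_i.items()}
    total = sum(exps.values())
    return {j: x / total for j, x in exps.items()}
```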

S224. Input the state features of the node's neighbor nodes, the splicing weight and the attention coefficients into the first attention operation layer to obtain the fused neighborhood feature of the node output by that layer.

In the first attention operation layer, the state features of the node's neighbor nodes are transformed by the splicing weight and summed with the attention coefficients as weights to obtain the node's fused neighborhood feature:

$$h_i'=\sigma\Big(\sum_{j\in N(i)}a_{ij}\,V_ih_j\Big)$$

where $h_i'$ is the fused neighborhood feature of node $i$, i.e., the new feature of node $i$ obtained after incorporating the features of its neighbor nodes.
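The weighted aggregation that produces the fused neighborhood feature, $h_i'=\sigma(\sum_{j\in N(i)}a_{ij}V_ih_j)$, can be sketched in Python as below. ReLU stands in for $\sigma$ as an assumption, since the source does not name the activation; the function name is hypothetical.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(x):
    return [max(0.0, xi) for xi in x]


def fused_neighborhood_feature(h, alpha_i, V_i, act=relu):
    """h'_i = act(sum over j in N(i) of a_ij * V_i h_j).

    h:       dict node -> state feature
    alpha_i: dict neighbor j -> attention coefficient a_ij
    V_i:     splicing weight of node i (list of rows)
    """
    acc = [0.0] * len(V_i)
    for j, a_ij in alpha_i.items():
        acc = [s + a_ij * t for s, t in zip(acc, matvec(V_i, h[j]))]
    return act(acc)
```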

The method is further described below with reference to FIG. 6. The heterogeneous user message network layer includes a weight learning layer, a second attention operation layer, a semantic attention understanding layer and a prediction output layer.

The weight learning layer determines the attention weights of the nodes based on a self-attention mechanism. The second attention operation layer performs a weighted operation on the fused neighborhood features of the neighbor nodes using the attention weights to determine the node's temporal feature, which represents the node's semantics. The semantic attention understanding layer maps the node's temporal features to determine the node's semantic attention weight under each meta-path, and performs a weighted operation on the temporal features using the semantic attention weights to determine the node's embedded representation feature. The prediction output layer determines the classification result of the embedded representation features and outputs the abnormal behavior information based on it.

Accordingly, referring to FIG. 7, step S23 specifically includes:

S231. Input the fused neighborhood features into the weight learning layer to obtain the attention weights of the nodes output by that layer.

The weight learning layer learns the weights of neighbors through node-level self-attention. Given the fused neighborhood feature $h_i'$ of node $i$ and the fused neighborhood feature $h_j'$ of its neighbor node $j$, a learnable node attention vector $a_\Phi$ is used to learn the attention weight $\alpha_{ij}^{\Phi}$ of neighbor node $j$ with respect to node $i$:

$$\alpha_{ij}^{\Phi}=\frac{\exp\big(\sigma(a_\Phi^{T}\cdot[h_i'\,\|\,h_j'])\big)}{\sum_{k\in N_i^{\Phi}}\exp\big(\sigma(a_\Phi^{T}\cdot[h_i'\,\|\,h_k'])\big)}$$

where $N_i^{\Phi}$ denotes the neighbor nodes of node $i$ under meta-path $\Phi$.
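A minimal Python sketch of this node-level attention is shown below. It assumes $\sigma$ is LeakyReLU with slope 0.2, as in the published heterogeneous graph attention network; the source does not specify the activation. `node_level_attention` and its arguments are hypothetical names.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def node_level_attention(h_fused, i, neighbors_phi, a_phi):
    """alpha_ij^Phi = softmax over j of LeakyReLU(a_phi . [h'_i || h'_j]).

    h_fused:       dict node -> fused neighborhood feature h'
    neighbors_phi: neighbors of node i under meta-path Phi
    a_phi:         learnable node attention vector (length 2 * feature dim)
    """
    scores = {}
    for j in neighbors_phi:
        spliced = h_fused[i] + h_fused[j]  # concatenation [h'_i || h'_j]
        scores[j] = leaky_relu(sum(a * x for a, x in zip(a_phi, spliced)))
    m = max(scores.values())  # stability shift; softmax result is unchanged
    exps = {j: math.exp(s - m) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}
```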

S232. Input the fused neighborhood features and attention weights of the nodes into the second attention operation layer to obtain the temporal features of the nodes output by that layer.

After the attention weights of all neighbor nodes of node $i$ are obtained, the fused neighborhood features are summed with these attention weights, as in the graph attention network, to obtain the temporal feature of node $i$:

$$z_i^{\Phi}=\sigma\Big(\sum_{j\in N_i^{\Phi}}\alpha_{ij}^{\Phi}\,h_j'\Big)$$

where $z_i^{\Phi}$ denotes the temporal feature of node $i$ and $N_i^{\Phi}$ the neighbor nodes of node $i$ under meta-path $\Phi$. The temporal feature fuses the information of the node along each meta-path, i.e., the information of its neighbor nodes. For example, if node A has 10 neighbor nodes, node A has 10 meta-paths.
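For a single node $i$, the temporal feature can be computed once per meta-path. The Python sketch below does this given precomputed attention weights for each meta-path. ReLU stands in for $\sigma$ as an assumption, and the meta-path keys and function name are hypothetical.

```python
def relu(x):
    return [max(0.0, xi) for xi in x]


def metapath_temporal_features(h_fused, alphas_by_path, act=relu):
    """For one node i, compute z_i^Phi = act(sum_j alpha_ij^Phi * h'_j)
    for every meta-path Phi.

    h_fused:        dict node -> fused neighborhood feature h'
    alphas_by_path: dict Phi -> {neighbor j: alpha_ij^Phi}
    Returns dict Phi -> z_i^Phi.
    """
    dim = len(next(iter(h_fused.values())))
    z = {}
    for phi, alphas in alphas_by_path.items():
        acc = [0.0] * dim
        for j, a in alphas.items():
            acc = [s + a * t for s, t in zip(acc, h_fused[j])]
        z[phi] = act(acc)
    return z
```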

S233. Input the temporal features into the semantic attention understanding layer to obtain the embedded representation features of the nodes output by that layer.

The semantic attention understanding layer first learns the weight of a node under different meta-paths by a weighted fusion of the node's temporal features, obtaining the node's semantic attention weight:

$$w_{\Phi}=\frac{1}{|V|}\sum_{i\in V}q^{T}\cdot\tanh\big(W\cdot z_i^{\Phi}+b\big)$$

where $w_{\Phi}$ denotes the semantic attention weight of node $i$ under meta-path $\Phi$; $V$ is the total number of meta-paths of node $i$, i.e., the total number of its neighbor nodes; $q^{T}$ is the semantic mapping coefficient, which maps the semantics of each node on each meta-path to a real number and is a $1\times3$ vector parameter in this embodiment; and $\tanh(W\cdot z_i^{\Phi}+b)$ is a single-layer neural network, in which $W$ is the learnable parameter matrix (layer weight) of that layer and $b$ is its bias vector.
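The semantic attention weight is an average, over the temporal features being fused, of a single tanh layer scored by the vector $q$. This Python sketch computes $w_{\Phi}=\frac{1}{|V|}\sum_{i\in V}q^{T}\tanh(Wz_i^{\Phi}+b)$ for one meta-path; with a $1\times3$ vector $q$ as in this embodiment, $W$ has three rows. The function name is hypothetical, and no parameter learning is shown.

```python
import math


def semantic_attention_weight(z_phi, W, b, q):
    """w_Phi = (1/|V|) * sum_i q^T tanh(W z_i^Phi + b).

    z_phi: list of temporal features z_i^Phi being averaged
    W, b:  weight matrix (list of rows) and bias of the single-layer network
    q:     semantic mapping vector (length 3 in this embodiment)
    """
    total = 0.0
    for z in z_phi:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, z)) + bk)
                  for row, bk in zip(W, b)]
        total += sum(qk * hk for qk, hk in zip(q, hidden))
    return total / len(z_phi)
```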

在本申请中,经过步骤S20的处理后,节点会更新为融合邻域特征,融合邻域特征的邻居节点也是原节点的邻居节点对应的融合邻域特征,融合邻域特征可以分别四种类型:用户A的用户信息、用户A的行为消息信息、非用户A(例如用户B)的用户信息以及非用户A(例如用户B)的行为消息信息,因此根据节点与其邻居节点之间的关系可以将时序特征分为三类:用户A的用户信息-同为该用户A的行为消息信息、用户A的用户信息-非该用户A的行为消息信息以及用户A的用户信息-非该用户A的用户信息。也就是根据节点与其邻居节点之间的关系,最终时序特征可以分为三类:

Figure BDA0003834792640000164
以及
Figure BDA0003834792640000165
对应的语义注意力权重也可以分为三类:
Figure BDA0003834792640000166
以及
Figure BDA0003834792640000167
In this application, after the processing in step S20, each node is updated to its fused neighborhood feature, and the neighbor nodes are likewise represented by the fused neighborhood features of the original node's neighbors. The fused neighborhood features fall into four types: user information of user A, behavior message information of user A, user information of a user other than A (for example, user B), and behavior message information of a user other than A (for example, user B). Accordingly, based on the relationship between a node and its neighbor nodes, the final timing features are divided into three categories: (user information of user A, behavior message information of the same user A); (user information of user A, behavior message information of a user other than A); and (user information of user A, user information of a user other than A). These three categories are denoted:

[Symbols for the three timing-feature categories (formula images not reproduced)]

The corresponding semantic attention weights likewise fall into three categories:

[Symbols for the three semantic attention weights (formula images not reproduced)]
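To make the three relation categories concrete, the following sketch assigns a neighbor of a user-information node to one of them. It assumes each node can be described by a kind ("user" for user information, "behavior" for a behavior message) and an owning user; the `Node` type, the category constants, and both functions are hypothetical illustrations, not part of the patent.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    kind: str   # "user" (user information) or "behavior" (behavior message)
    owner: str  # the user this node belongs to, e.g. "A"


# Hypothetical labels for the three relation categories described above
SAME_USER_BEHAVIOR = 0   # (user info of A, behavior message of A)
OTHER_USER_BEHAVIOR = 1  # (user info of A, behavior message of non-A)
OTHER_USER_INFO = 2      # (user info of A, user info of non-A)


def relation_category(center: Node, neighbor: Node) -> Optional[int]:
    """Map a (user-info node, neighbor) pair to one of the three categories."""
    if center.kind != "user":
        return None  # the categories are defined relative to a user-info node
    if neighbor.kind == "behavior":
        if neighbor.owner == center.owner:
            return SAME_USER_BEHAVIOR
        return OTHER_USER_BEHAVIOR
    if neighbor.kind == "user" and neighbor.owner != center.owner:
        return OTHER_USER_INFO
    return None  # e.g. the node itself, which is not one of its neighbors


def group_neighbors(center, neighbors):
    """Split neighbors into three lists indexed by category."""
    groups = [[], [], []]
    for n in neighbors:
        c = relation_category(center, n)
        if c is not None:
            groups[c].append(n)
    return groups
```

Each group can then be aggregated separately to produce the timing feature for its category.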

The semantic attention understanding layer then applies weighting based on each node's timing features and the corresponding semantic attention weights to determine the node's embedding representation feature. The embedding representation feature preserves the information of the original graph in a low-dimensional vector. Specifically:

[Formula image not reproduced: embedding representation Z]

where Z represents the embedding representation features of all nodes.
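The formula for Z is an image that is not reproduced. The following is a minimal sketch that assumes the usual HAN-style fusion, in which each node's embedding is the semantic-attention-weighted sum of its per-category timing features, Z_i = Σ_p β_p · z_i^p. The function name and argument layout are hypothetical.

```python
def fuse_embeddings(timing_features, semantic_weights):
    """Compute Z_i = sum_p beta_p * z_i^p for every node i.

    timing_features: list over categories p; each is a list (over nodes)
        of feature vectors that all have the same length.
    semantic_weights: one weight beta_p per category (normally summing to 1).
    Returns a list of fused embedding vectors, one per node.
    """
    n_nodes = len(timing_features[0])
    dim = len(timing_features[0][0])
    Z = [[0.0] * dim for _ in range(n_nodes)]
    for beta, feats in zip(semantic_weights, timing_features):
        for i, z in enumerate(feats):
            for k in range(dim):
                Z[i][k] += beta * z[k]  # accumulate the weighted contribution
    return Z
```

For example, with a single node whose three category features are [1, 0], [0, 1], and [2, 2], and weights 0.5, 0.25, and 0.25, the fused embedding is [1.0, 0.75].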

S234. Input the embedding representation features into the prediction output module to obtain the abnormal behavior information output by the prediction output module.

Based on Z, the prediction output module produces a multi-label classification result. From the classification result and the corresponding labels, it determines whether abnormal behavior exists and, if so, which type it belongs to. It then transmits the generated abnormal behavior information back to the user.
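The multi-label output step can be sketched as a linear layer followed by an independent sigmoid per label and a threshold. The linear layer, the 0.5 threshold, and the label names in `LABELS` are assumptions made only for illustration, because the patent does not specify the classifier or list the abnormal-behavior types here.

```python
import math

# Hypothetical abnormal-behavior types; the patent does not enumerate them here.
LABELS = ["type_1", "type_2", "type_3"]


def predict_labels(logits, threshold=0.5):
    """Turn one node's per-label scores into a multi-label decision."""
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]  # independent sigmoids
    hits = [label for label, p in zip(LABELS, probs) if p >= threshold]
    return {"abnormal": bool(hits), "types": hits}


def classify(z, W_out, b_out, threshold=0.5):
    """Apply an assumed linear output layer to embedding z, then threshold."""
    logits = [sum(w * x for w, x in zip(row, z)) + b for row, b in zip(W_out, b_out)]
    return predict_labels(logits, threshold)
```

Because each label has its own sigmoid, a node can be flagged with several abnormal-behavior types at once, or with none.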

The training of the business data identification model in the business data anomaly identification method based on behavior association mining of the present invention is described below with reference to FIG. 8. The model is obtained through the following training steps:

A10. Determine the sample state features of the sample nodes from the sample behavior flow graph. They are determined in the same way as in step S221, which is not repeated here.

A20. Use the sample behavior flow graph as the training input and the sample state features of the sample nodes as the training labels, and train by deep learning to obtain the business data identification model, which generates abnormal behavior information from the users' online behavior information.

In this embodiment, step A20 trains the model in a semi-supervised manner and adjusts the parameters of the business data identification model. The data is split into training, validation, and test sets in a 3:1:1 ratio, with normal users and abnormal users each randomly assigned according to this ratio. Precision and recall are used as the evaluation criteria, and the parameters are optimized through repeated forward and backward propagation.
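The data split and the evaluation criteria can be sketched as follows. Each class is shuffled and divided 3:1:1 separately, so that normal and abnormal users keep the same proportions in every set. Precision and recall are computed for the abnormal class (label 1). The function names, the fixed seed, and the 0/1 label encoding are assumptions made only for illustration.

```python
import random


def split_3_1_1(normal_ids, abnormal_ids, seed=0):
    """Randomly split each class 3:1:1 into train/validation/test sets."""
    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for ids in (normal_ids, abnormal_ids):
        ids = list(ids)
        rng.shuffle(ids)
        n_train = len(ids) * 3 // 5
        n_val = len(ids) // 5
        splits["train"] += ids[:n_train]
        splits["val"] += ids[n_train:n_train + n_val]
        splits["test"] += ids[n_train + n_val:]  # remainder goes to test
    return splits


def precision_recall(y_true, y_pred):
    """Precision and recall for the abnormal class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Splitting per class matters when abnormal users are rare. A single global shuffle could leave the validation or test set with few or no abnormal users, which would make recall meaningless.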

The business data anomaly identification device based on behavior association mining provided by the present invention is described below. The device described below and the method described above may be read in correspondence with each other.

This application provides a business data anomaly identification device based on behavior association mining, which can be used in electronic devices such as computers, mobile phones, wearable smart devices, and tablet computers. FIG. 9 is a schematic structural diagram of the business data anomaly identification device based on behavior association mining according to an embodiment of this application. As shown in FIG. 9, the device includes:

Feature extraction module 10, configured to determine the user's online behavior information from the business data and to extract user features and behavior features from the online behavior information.

The business data may be stored in the electronic device in advance or obtained by the electronic device from external sources. For example, it may be collected by the electronic device from various traffic monitoring devices, including but not limited to traffic data at the network layer and the transport layer. This application places no restriction on how the user's online behavior information is obtained, as long as the business data can be acquired.

In this embodiment, the user features represent the operating environment of the user's online operations and include information such as the user's device, applications, browser, and web pages. The behavior features represent the chronological order of the user's online operations; that is, they carry the timing information of the user features. For example, a user may habitually click one function icon before clicking another. The business data may contain the online behavior information of one or more users, and the user features and behavior features of each of these users are extracted. For example, if the business data contains the online behavior information of users A, B, and C, the user features and behavior features of user A, of user B, and of user C are extracted separately.
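A minimal sketch of the per-user extraction described above follows. It assumes that each raw record is a dict with the hypothetical keys "user", "time", "device", "app", "browser", "page", and "action"; the patent does not define a record format. For each user, the user features are the set of operating-environment tuples, and the behavior features are the user's actions in chronological order.

```python
from collections import defaultdict


def extract_features(records):
    """Group raw online-behavior records by user.

    Each record is assumed to be a dict with the hypothetical keys
    "user", "time", "device", "app", "browser", "page" and "action".
    Returns {user: (user_features, behavior_features)}, where
    user_features is a set of operating-environment tuples and
    behavior_features is the list of actions sorted by time.
    """
    by_user = defaultdict(list)
    for r in records:
        by_user[r["user"]].append(r)
    result = {}
    for user, rows in by_user.items():
        rows.sort(key=lambda r: r["time"])  # chronological order
        env = {(r["device"], r["app"], r["browser"], r["page"]) for r in rows}
        actions = [r["action"] for r in rows]
        result[user] = (env, actions)
    return result
```

Keeping the actions as an ordered list preserves the "click icon X, then icon Y" habit in the example above.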

Behavior recognition module 20, configured to input the user features and behavior features into the trained business data identification model to obtain the abnormal behavior information output by the model. In this application, the business data identification model is trained on users' sample behavior flow graphs, which are constructed from the users' sample user features and sample behavior features. The model constructs a behavior flow graph from the user features and behavior features and determines the fused neighborhood feature of each node in the graph. It then determines each node's embedding representation feature from its fused neighborhood feature and predicts abnormal behavior in the business data from the classification result derived from the embedding representation features. In this embodiment, each node in the behavior flow graph contains user features and behavior features. Because the behavior features carry timing information, nodes are connected according to the behavior features, and each connection represents the temporal relationship between the nodes it joins.

The embedding representation feature is the representation of a node in the embedding space. That is, each node in the behavior flow graph is encoded so that the similarity between nodes in the embedding space approximates their similarity in the original graph. In this application, the user's behavior flow graph is first constructed from the extracted user features and behavior features. Next, the fused neighborhood feature of each node of the behavior flow graph is obtained from the node, its neighbor nodes, and the neighbor weights. The embedding representation feature is then determined from the node's fused neighborhood feature together with its neighbor nodes and their weights. Finally, abnormal behavior in the business data is predicted from the classification result derived from the embedding representation features. How the embedding representation features of the nodes are obtained is described in detail below.
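The neighbor-weighted step that produces a fused neighborhood feature can be sketched as graph-attention-style aggregation. A similarity coefficient is computed between the node and each neighbor, the coefficients are normalized with a softmax into attention coefficients, and the weighted neighbor features are summed. The dot-product similarity, the square matrix `W` standing in for the "splicing weight", and the omission of a self-loop and a final nonlinearity are all assumptions, because the patent does not give these formulas in this section.

```python
import math


def fused_neighborhood_feature(h, node, neighbors, W):
    """Attention-weighted aggregation of a node's neighbors (a sketch).

    h: dict mapping node -> state feature vector.
    neighbors: dict mapping node -> list of neighboring nodes.
    W: square weight matrix applied to neighbor features (an assumed
       stand-in for the "splicing weight").
    Similarity is a dot product, which is only one possible choice.
    """
    nbrs = neighbors[node]
    if not nbrs:
        return list(h[node])  # an isolated node keeps its own feature
    sims = [sum(a * b for a, b in zip(h[node], h[j])) for j in nbrs]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    alphas = [e / total for e in exps]  # normalized attention coefficients
    dim = len(W)
    out = [0.0] * dim
    for alpha, j in zip(alphas, nbrs):
        wh = [sum(w * x for w, x in zip(row, h[j])) for row in W]
        for k in range(dim):
            out[k] += alpha * wh[k]
    return out
```

A neighbor whose state feature is more similar to the node's receives a larger attention coefficient and so contributes more to the fused feature.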

The behavior flow graph has a number of nodes. Each node holds the user's related information, including user features and behavior features, and represents static information about the user's behavior. Nodes are connected according to the behavior features, each connection represents the temporal relationship between nodes, and any node connected to node A is a neighbor node of node A. Similarly, the sample behavior flow graph has a number of sample nodes. Each sample node also holds the user's related information, including sample user features and sample behavior features, and represents static information about the user's behavior. Sample nodes are connected according to the sample behavior features, each connection represents the temporal relationship between sample nodes, and any node connected to sample node C is a neighbor node of sample node C.
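A minimal sketch of building such a graph from time-stamped events follows. It assumes that consecutive events of the same user are joined by a directed edge from the earlier event to the later one, which encodes their temporal order, and that "neighbor" means any node connected by an edge in either direction. The event tuple format and function name are hypothetical, and cross-user connections are not modeled here.

```python
from collections import defaultdict


def build_behavior_flow_graph(events):
    """Build a behavior flow graph from time-stamped events.

    events: list of (node_id, user, time) tuples (a hypothetical format).
    Consecutive events of the same user are joined by a directed edge
    earlier -> later. Returns the edge list and an undirected neighbor map.
    """
    by_user = defaultdict(list)
    for node_id, user, t in events:
        by_user[user].append((t, node_id))
    edges = []
    neighbors = defaultdict(set)
    for seq in by_user.values():
        seq.sort()  # order each user's events by time
        for (_, u), (_, v) in zip(seq, seq[1:]):
            edges.append((u, v))
            neighbors[u].add(v)  # connected nodes are neighbors
            neighbors[v].add(u)
    return edges, dict(neighbors)
```

The directed edges keep the temporal relationship, while the undirected neighbor map is what the neighborhood aggregation steps consume.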

It can be understood that the sample user features and sample behavior features are extracted from the user's sample online behavior information, which is in turn extracted from historical business data. This application places no restriction on how the user's historical business data is obtained, as long as it can be acquired.

The business data anomaly identification device based on behavior association mining of the present invention extracts user features and behavior features from the user's online behavior information. The extracted information therefore retains both the operating environment of the user's online operations and their chronological order. The trained business data identification model then identifies the user features and behavior features and detects the abnormal behavior information among them. The model obtains the embedding representation features corresponding to the user features from the behavior features, their neighbor nodes, and the neighbor weights, and it predicts abnormal behavior from the distance between the embedding representation features and the user features. As a result, the model combines three aspects when identifying business data anomalies: user behavior, the content generated by that behavior, and the temporal relationships among behaviors. Correspondingly, the accuracy and speed of abnormal behavior identification are significantly improved. By using this business data identification model for anomaly identification based on behavior association mining, this application can adjudicate and respond to abnormal behavior quickly. This enables accurate user behavior management and restriction and meets the development requirements of the big data era.

FIG. 10 is a schematic diagram of the physical structure of an electronic device. As shown in FIG. 10, the electronic device may include a processor 310, a communications interface 320, a memory 330, and a communication bus 340. The processor 310, the communications interface 320, and the memory 330 communicate with one another through the communication bus 340. The processor 310 can invoke logic instructions in the memory 330 to execute the business data anomaly identification method based on behavior association mining, the method including:

Determining the user's online behavior information, and extracting user features and behavior features from the online behavior information; the user features represent the operating environment of the user's online operations, and the behavior features represent the chronological order of the user's online operations;

Inputting the user features and behavior features into the business data identification model to obtain the abnormal behavior information output by the model; the business data identification model is trained on users' sample behavior flow graphs, and a sample behavior flow graph is constructed from the users' sample user features and sample behavior features;

The business data identification model constructs a behavior flow graph from the user features and behavior features, determines the fused neighborhood feature of each node in the behavior flow graph, determines each node's embedding representation feature from its fused neighborhood feature, and predicts abnormal behavior in the business data from the classification result derived from the embedding representation features; each node contains user features and behavior features, the nodes are connected according to the behavior features, and the connections represent the temporal relationships between nodes.

In addition, the logic instructions in the memory 330 may be implemented as software functional units. When sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied as a software product. This computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

In another aspect, the present invention further provides a computer program product. The computer program product includes a computer program, which may be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can execute the business data anomaly identification method based on behavior association mining provided by the foregoing methods, the method including:

Determining the user's online behavior information, and extracting user features and behavior features from the online behavior information; the user features represent the operating environment of the user's online operations, and the behavior features represent the chronological order of the user's online operations;

Inputting the user features and behavior features into the business data identification model to obtain the abnormal behavior information output by the model; the business data identification model is trained on users' sample behavior flow graphs, and a sample behavior flow graph is constructed from the users' sample user features and sample behavior features;

The business data identification model constructs a behavior flow graph from the user features and behavior features, determines the fused neighborhood feature of each node in the behavior flow graph, determines each node's embedding representation feature from its fused neighborhood feature, and predicts abnormal behavior in the business data from the classification result derived from the embedding representation features; each node contains user features and behavior features, the nodes are connected according to the behavior features, and the connections represent the temporal relationships between nodes.

The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement them without creative effort.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the above technical solutions, in essence, or the part contributing to the prior art, may be embodied as a software product. This computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of their technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (14)

1. A business data anomaly identification method based on behavior association mining, characterized in that the method comprises:
determining a user's online behavior information from business data, and extracting user features and behavior features from the online behavior information, wherein the user features characterize the operating environment of the user's online operations and the behavior features characterize the chronological order of the user's online operations;
inputting the user features and behavior features into a business data identification model to obtain abnormal behavior information output by the business data identification model, wherein the business data identification model is trained on users' sample behavior flow graphs, and a sample behavior flow graph is constructed from the users' sample user features and sample behavior features;
wherein the business data identification model is configured to construct a behavior flow graph based on the user features and behavior features, determine the fused neighborhood feature of each node in the behavior flow graph, determine each node's embedding representation feature based on its fused neighborhood feature, and predict abnormal behavior in the business data based on a classification result determined from the embedding representation features; each node contains user features and behavior features, the nodes are connected based on the behavior features, and the connections represent the temporal relationships between the nodes.

2. The business data anomaly identification method based on behavior association mining according to claim 1, characterized in that the business data identification model comprises a flow graph construction layer, a homogeneous user relationship network layer, and a heterogeneous user message network layer;
the flow graph construction layer is configured to construct the user's behavior flow graph based on the user features and behavior features;
the homogeneous user relationship network layer is configured to perform attention operations on each node in the behavior flow graph based on the node and its neighbor nodes to determine the node's fused neighborhood feature, wherein a neighbor node is a node connected to the current node;
the heterogeneous user message network layer is configured to perform attention operations and semantic attention understanding on the fused neighborhood feature of the node and the fused neighborhood features of its neighbor nodes to determine the node's embedding representation feature, obtain a classification result based on the embedding representation feature, and determine the abnormal behavior information based on the classification result.

3. The business data anomaly identification method based on behavior association mining according to claim 2, characterized in that inputting the user features and behavior features into the business data identification model to obtain the abnormal behavior information output by the business data identification model specifically comprises:
inputting the user features and behavior features into the flow graph construction layer to obtain the behavior flow graph output by the flow graph construction layer;
inputting the behavior flow graph into the homogeneous user relationship network layer to obtain the fused neighborhood features of the nodes output by the homogeneous user relationship network layer;
inputting the fused neighborhood features into the heterogeneous user message network layer to obtain the abnormal behavior information output by the heterogeneous user message network layer, wherein the abnormal behavior information includes the nodes and the state features corresponding to the nodes.

4. The business data anomaly identification method based on behavior association mining according to claim 3, characterized in that the homogeneous user relationship network layer comprises a neighborhood convolution layer, a similarity determination layer, a normalization processing layer, and a first attention operation layer;
the neighborhood convolution layer is configured to perform a convolution operation on the node, the node's neighbor nodes, and the node's convolution weights to determine the node's state feature, wherein the state feature characterizes the node's label;
the similarity determination layer is configured to determine the similarity coefficient between the state feature of a neighbor node and the state feature of the node;
the normalization processing layer is configured to normalize the similarity coefficients to determine the attention coefficients between the neighbor nodes and the node;
the first attention operation layer is configured to perform weighting based on the state features of the neighbor nodes, the splicing weights, and the attention coefficients to determine the node's fused neighborhood feature.

5. The business data anomaly identification method based on behavior association mining according to claim 3, characterized in that the heterogeneous user message network layer comprises a weight learning layer, a second attention operation layer, a semantic attention understanding layer, and a prediction output layer;
the weight learning layer is configured to determine the node's attention weights based on a self-attention mechanism;
the second attention operation layer is configured to perform weighting based on the fused neighborhood features of the neighbor nodes and the attention weights to determine the node's timing features, wherein the timing features characterize the node's semantics;
the semantic attention understanding layer is configured to map the node's timing features to determine the node's semantic attention weights, and to perform weighting based on the node's timing features and semantic attention weights to determine the node's embedding representation feature;
the prediction output module is configured to determine the classification result of the embedding representation feature and output the abnormal behavior information based on the classification result.

6. The business data anomaly identification method based on behavior association mining according to claim 4, characterized in that inputting the behavior flow graph into the homogeneous user relationship network layer to obtain the fused neighborhood features of the nodes output by the homogeneous user relationship network layer specifically comprises:
inputting the behavior flow graph into the neighborhood convolution layer to obtain the state features of the nodes output by the neighborhood convolution layer;
inputting the node features into the similarity determination layer to obtain the similarity coefficients between nodes output by the similarity determination layer;
inputting the similarity coefficients into the normalization processing layer to obtain the attention coefficients between nodes output by the normalization processing layer;
inputting the state features of the node's neighbor nodes, the splicing weights, and the attention coefficients into the first attention operation layer to obtain the fused neighborhood features of the nodes output by the first attention operation layer.

7. The business data anomaly identification method based on behavior association mining according to claim 5, characterized in that inputting the fused neighborhood features into the heterogeneous user message network layer to obtain the abnormal behavior information output by the heterogeneous user message network layer specifically comprises:
inputting the fused neighborhood features into the weight learning layer to obtain the attention weights of the nodes output by the weight learning layer;
inputting the nodes' fused neighborhood features and attention weights into the second attention operation layer to obtain the timing features of the nodes output by the second attention operation layer;
inputting the timing features into the semantic attention understanding layer to obtain the embedding representation features of the nodes output by the semantic attention understanding layer;
inputting the embedding representation features into the prediction output module to obtain the abnormal behavior information output by the prediction output module.

8. The business data anomaly identification method based on behavior association mining according to claim 1, characterized in that the business data identification model is trained through the following steps:
determining the sample state features of the sample nodes from the sample behavior flow graph, wherein each sample node in the sample behavior flow graph contains sample user features and sample behavior features, the sample nodes are connected based on the sample behavior features, and the connections represent the temporal relationships between the sample nodes;
using the sample behavior flow graph as the training input and the sample state features of the sample nodes as the training labels, and training by deep learning to obtain the business data identification model for generating abnormal behavior information from the users' online behavior information.

9. A business data anomaly identification device based on behavior association mining, characterized in that the device comprises:
a feature extraction module, configured to determine a user's online behavior information and extract user features and behavior features from the online behavior information, wherein the user features characterize the operating environment of the user's online operations and the behavior features characterize the chronological order of the user's online operations;
a behavior recognition module, configured to input the user features and behavior features into a business data identification model to obtain abnormal behavior information output by the business data identification model, wherein the business data identification model is trained on users' sample behavior flow graphs, and a sample behavior flow graph is constructed from the users' sample user features and sample behavior features;
wherein the business data identification model is configured to construct a behavior flow graph based on the user features and behavior features, determine the fused neighborhood feature of each node in the behavior flow graph, determine each node's embedding representation feature based on its fused neighborhood feature, and predict abnormal behavior in the business data based on a classification result determined from the embedding representation features; each node contains user features and behavior features, and the nodes are connected
based on the behavior characteristics, and the connection is used to represent the timing relationship between the nodes. 10.根据权利要求9所述的基于行为关联挖掘的业务数据异常识别装置,其特征在于,所述行为识别模块具体包括:10. The business data anomaly identification device based on behavior association mining according to claim 9, wherein the behavior identification module specifically includes: 流图构建单元,用于将用户特征和行为特征输入至流图构建层中,得到流图构建层输出的行为流图;The flow graph construction unit is used to input user characteristics and behavior characteristics into the flow graph construction layer to obtain the behavior flow graph output by the flow graph construction layer; 关系识别单元,用于将行为流图输入至同质用户关系网络层中,得到同质用户关系网络层输出的节点对应的融合邻域特征;The relationship recognition unit is used to input the behavior flow graph into the homogeneous user relationship network layer, and obtain the fusion neighborhood features corresponding to the nodes output by the homogeneous user relationship network layer; 消息识别单元,用于将融合邻域特征输入至异质用户消息网络层中,得到异质用户消息网络层输出的异常行为信息;异常行为信息包括节点以及节点对应的状态特征。The message recognition unit is used to input the fusion neighborhood feature into the heterogeneous user message network layer to obtain the abnormal behavior information output by the heterogeneous user message network layer; the abnormal behavior information includes nodes and state features corresponding to the nodes. 11.根据权利要求10所述的基于行为关联挖掘的业务数据异常识别装置,其特征在于,所述关系识别单元具体包括:11. 
The business data anomaly identification device based on behavior association mining according to claim 10, wherein the relationship identification unit specifically comprises: 第一识别单元,用于将行为流图输入至邻域卷积层中,得到邻域卷积层输出的节点对应的状态特征;The first identification unit is used to input the behavior flow graph into the neighborhood convolution layer, and obtain the state characteristics corresponding to the nodes output by the neighborhood convolution layer; 第二识别单元,用于将节点特征输入至相似度确定层中,得到相似度确定层输出的节点之间的相似系数;The second identification unit is used to input the node features into the similarity determination layer to obtain the similarity coefficient between the nodes output by the similarity determination layer; 第三识别单元,用于将相似系数输入至归一化处理层中,得到归一化处理层输出的节点之间的注意力系数;The third identification unit is used to input the similarity coefficient into the normalization processing layer, and obtain the attention coefficient between the nodes output by the normalization processing layer; 第四识别单元,用于将节点的邻居节点对应的状态特征、拼接权重以及注意力系数输入至第一注意力运算层中,得到第一注意力运算层输出的节点对应的融合邻域特征。The fourth identification unit is used to input the state features, splicing weights and attention coefficients corresponding to the neighbor nodes of the node into the first attention operation layer, and obtain the fused neighborhood features corresponding to the nodes output by the first attention operation layer. 12.根据权利要求10所述的基于行为关联挖掘的业务数据异常识别装置,其特征在于,所述消息识别单元具体包括:12. 
The business data anomaly identification device based on behavior association mining according to claim 10, wherein the message identification unit specifically comprises: 第五识别单元,用于将融合邻域特征输入至权重学习层中,得到权重学习层输出的节点对应的注意力权重;The fifth identification unit is used to input the fusion neighborhood feature into the weight learning layer to obtain the attention weight corresponding to the node output by the weight learning layer; 第六识别单元,用于权重学习层用于基于自注意力机制,确定节点的注意力权重;The sixth identification unit is used in the weight learning layer to determine the attention weight of the node based on the self-attention mechanism; 第七识别单元,用于将节点的融合邻域特征以及注意力权重输入至第二注意力运算层中,得到第二注意力运算层输出的节点在元路径下的时序特征;The seventh identification unit is used to input the fusion neighborhood feature and attention weight of the node into the second attention operation layer, and obtain the timing characteristics of the nodes output by the second attention operation layer under the meta-path; 第八识别单元,用于将时序特征输入至语义注意力理解层中,得到语义注意力理解层输出的节点对应的嵌入表示特征;The eighth recognition unit is used to input the time series feature into the semantic attention understanding layer, and obtain the embedded representation feature corresponding to the node output by the semantic attention understanding layer; 第九识别单元,用于将嵌入表示特征输入至预测输出模块中,得到预测输出模块输出的异常行为信息。The ninth identification unit is configured to input the embedded representation feature into the predictive output module to obtain the abnormal behavior information output by the predictive output module. 13.一种电子设备,包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机程序,其特征在于,所述处理器执行所述程序时实现如权利要求1至8任一项所述基于行为关联挖掘的业务数据异常识别方法的步骤。13. An electronic device comprising a memory, a processor, and a computer program stored on the memory and operable on the processor, wherein the processor according to claim 1 is implemented when executing the program. The steps of the business data anomaly identification method based on behavior association mining described in any one of 8 to 8. 
14.一种非暂态计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至8任一项所述基于行为关联挖掘的业务数据异常识别方法的步骤。14. A non-transitory computer-readable storage medium, on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the behavior-based association mining as described in any one of claims 1 to 8 is implemented. The steps of the method for identifying abnormal business data.
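Claims 6 and 7 (mirrored by device claims 11 and 12) name the model's layers but disclose no formulas. The following is a minimal NumPy sketch of one possible reading, under explicit assumptions and not the patented implementation: the neighborhood convolution layer is taken to be a linear projection; the similarity, normalization and first attention operation layers are read as GAT-style attention with a concatenation weight vector; the semantic attention understanding layer is read as HAN-style attention over meta-path views; and the prediction output module is read as a logistic output. The function names, the edge rule in `build_flow_graph` (consecutive events of the same user are linked), the second "same-user" view, and all weights are hypothetical and randomly initialised. The weight learning and second attention operation layers of claim 7 are folded into the per-view attention step for brevity.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive zero weight.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def build_flow_graph(events):
    """Hypothetical flow-graph construction: one node per (user, time, features)
    event; consecutive events of the same user are linked (temporal edges)."""
    X = np.stack([np.asarray(f, dtype=float) for _, _, f in events])
    A = np.eye(len(events), dtype=bool)  # self-loops: each node attends to itself
    order = sorted(range(len(events)), key=lambda i: (events[i][0], events[i][1]))
    for prev, cur in zip(order, order[1:]):
        if events[prev][0] == events[cur][0]:
            A[prev, cur] = A[cur, prev] = True
    return X, A

def neighborhood_attention(X, A, W, a):
    """GAT-style reading of claim 6 (assumption, not the patent's formula)."""
    H = X @ W                                   # state features (linear projection)
    d = H.shape[1]
    # Similarity e_ij = LeakyReLU(a^T [h_i || h_j]); a = concatenation weights.
    e = leaky_relu((H @ a[:d])[:, None] + (H @ a[d:])[None, :])
    alpha = softmax(np.where(A, e, -np.inf), axis=1)  # normalised attention coefficients
    return np.tanh(alpha @ H), alpha            # fused neighborhood features

def semantic_attention(views, W_s, b_s, q):
    """HAN-style semantic attention over meta-path views (assumption)."""
    scores = np.array([np.mean(np.tanh(Z @ W_s + b_s) @ q) for Z in views])
    beta = softmax(scores)                      # one weight per meta-path view
    return sum(b * Z for b, Z in zip(beta, views)), beta

def predict_anomaly(E, w_out, b_out=0.0):
    # Logistic output: per-node probability of abnormal behavior.
    return 1.0 / (1.0 + np.exp(-(E @ w_out + b_out)))

# Toy run with random weights (all values hypothetical).
rng = np.random.default_rng(0)
events = [("u1", 1, rng.normal(size=4)), ("u1", 2, rng.normal(size=4)),
          ("u2", 1, rng.normal(size=4)), ("u1", 3, rng.normal(size=4)),
          ("u2", 5, rng.normal(size=4))]
X, A = build_flow_graph(events)
users = [u for u, _, _ in events]
A_same_user = np.array([[u == v for v in users] for u in users])  # second view
Z_time, alpha = neighborhood_attention(X, A, rng.normal(size=(4, 8)), rng.normal(size=16))
Z_user, _ = neighborhood_attention(X, A_same_user, rng.normal(size=(4, 8)), rng.normal(size=16))
E, beta = semantic_attention([Z_time, Z_user], rng.normal(size=(8, 6)), np.zeros(6), rng.normal(size=6))
p = predict_anomaly(E, rng.normal(size=8))
print(np.round(p, 3))
```

Because the row-wise softmax is masked with the adjacency matrix, nodes that are not linked in the flow graph receive exactly zero attention, which confines each node's aggregation to its temporal neighborhood in this sketch.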
CN202211084180.7A 2022-09-06 2022-09-06 Business data anomaly identification method and device based on behavior association mining Active CN115473718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211084180.7A CN115473718B (en) 2022-09-06 2022-09-06 Business data anomaly identification method and device based on behavior association mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211084180.7A CN115473718B (en) 2022-09-06 2022-09-06 Business data anomaly identification method and device based on behavior association mining

Publications (2)

Publication Number Publication Date
CN115473718A 2022-12-13
CN115473718B 2024-08-09

Family

ID=84371480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211084180.7A Active CN115473718B (en) 2022-09-06 2022-09-06 Business data anomaly identification method and device based on behavior association mining

Country Status (1)

Country Link
CN (1) CN115473718B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116089887A (en) * 2022-12-14 2023-05-09 State Grid Fujian Electric Power Co., Ltd. A method for detecting and repairing abnormal data of power system equipment ledger
CN116760727A (en) * 2023-05-30 2023-09-15 Nanjing NARI Information & Communication Technology Co., Ltd. An abnormal traffic identification method, device, system and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030200462A1 (en) * 1999-05-11 2003-10-23 Software Systems International Llc Method and system for establishing normal software system behavior and departures from normal behavior
US20180103052A1 (en) * 2016-10-11 2018-04-12 Battelle Memorial Institute System and methods for automated detection, reasoning and recommendations for resilient cyber systems
CN113094707A (en) * 2021-03-31 2021-07-09 Institute of Information Engineering, Chinese Academy of Sciences Lateral movement attack detection method and system based on heterogeneous graph network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
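季述郧: "Research and simulation of a telecom user behavior identification method based on graph convolution", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology Series, no. 1, 15 January 2022 (2022-01-15) *
易树平; 李嘉佳; 易茜: "Trusted interaction detection method based on behavior flow graph", Control and Decision, vol. 35, no. 11, 14 May 2019 (2019-05-14), pages 2715 - 2722 *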

Also Published As

Publication number Publication date
CN115473718B (en) 2024-08-09

Similar Documents

Publication Publication Date Title
Liang et al. Survey of graph neural networks and applications
Omar et al. Quantum computing and machine learning for Arabic language sentiment classification in social media
Gao Network intrusion detection method combining CNN and BiLSTM in cloud computing environment
CN113868497A (en) Data classification method and device and storage medium
CN118797077B (en) A security knowledge generation method and system based on large language model
CN114330966A (en) Risk prediction method, device, equipment and readable storage medium
CN115473718A (en) Business data anomaly identification method and device based on behavior association mining
Zhu et al. CCBLA: a lightweight phishing detection model based on CNN, BiLSTM, and attention mechanism
CN116090504A (en) Training method and device for graphic neural network model, classifying method and computing equipment
CN112819024B (en) Model processing method, user data processing method and device and computer equipment
CN118113503A (en) Intelligent operation and maintenance system fault prediction method, device, equipment and storage medium
CN119719670B (en) Distribution network data asset vulnerability identification method, device, system, and storage medium
Wang et al. AutoLDT: a lightweight spatio-temporal decoupling transformer framework with AutoML method for time series classification
Zhang et al. Network security situation assessment based on BKA and cross dual-channel
Shang et al. A hybrid model for missing traffic flow data imputation based on clustering and attention mechanism optimizing LSTM and AdaBoost
Abbas A survey of research into artificial neural networks for crime prediction
Hao et al. Deep collaborative online learning resource recommendation based on attention mechanism
Zhang Mathematical modeling of multiscale network traffic combination prediction based on fuzzy support vector machine
CN113761357B (en) Information recommendation method and device, electronic equipment and storage medium
Wang et al. Research on image segmentation algorithm based on multimodal hierarchical attention mechanism and genetic neural network
Doan et al. Pacela: A neural framework for user visitation in location-based social networks
Peng et al. CLGSDN: Contrastive Learning Based Graph Structure Denoising Network for Traffic Prediction
CN120474825B (en) Node access control method, device, computer equipment and storage medium
CN114723509B (en) Object state identification method and device, electronic equipment and storage medium
CN116996400A (en) User service quality prediction method, system and equipment based on privacy protection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant