CN114187112B

CN114187112B - Training method of account risk model and determining method of risk user group

Info

Publication number: CN114187112B
Application number: CN202111534628.6A
Authority: CN
Inventors: 曹世鸿; 肖和兵
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2021-12-15
Filing date: 2021-12-15
Publication date: 2025-04-08
Anticipated expiration: 2041-12-15
Also published as: CN114187112A; WO2023109085A1

Abstract

The present application provides a method for training an account risk model and a method for determining a risk user group. The method for training an account risk model includes: generating at least one sample connectivity graph based on the transaction data of multiple sample user accounts, generating a sample weighted adjacency matrix for any sample connectivity graph based on the preset transaction amount and the transaction amount corresponding to each path in the sample connectivity graph, generating a sample training matrix set based on at least one sample weighted adjacency matrix, and training the account risk model using the sample training matrix set to obtain a trained account risk model. In this technical solution: supervised training of the account risk model is performed using the sample training matrix set generated based on the sample weighted adjacency matrix, and the sample weighted adjacency matrix contains the transaction relationship between the sample user accounts, thereby improving the recognition accuracy of the trained account risk model.

Description

Training method of account risk model and determining method of risk user group

Technical Field

The application relates to the technical field of financial technology (fintech), and in particular to a training method of an account risk model and a determining method of a risk user group.

Background

As financial institutions continue to expand their internet finance business, internet finance offers greater convenience to the public but also provides a money laundering channel for lawbreakers. Accurately identifying money laundering gangs makes it possible to combat economic crime effectively, reduce the harm such gangs cause to society, and maintain social fairness. How to identify money laundering gangs is therefore critical.

At present, money laundering gangs are mainly identified by using a preset anti-money-laundering model to identify and analyze the transaction amounts of the users to be identified, so that the gangs are picked out from all the users to be identified. The anti-money-laundering model is usually a decision tree, a random forest, a convolutional neural network, a recurrent neural network, a long short-term memory (LSTM) network, or the like.

However, in the prior art, only the transaction amount of the user to be identified is considered when a money laundering gang is identified with the anti-money-laundering model, so a normal user may be identified as a gang member, and the identification accuracy is low.

Disclosure of Invention

The application provides a training method of an account risk model and a determining method of a risk user group, which are used to solve the problem of low identification accuracy, where a normal user may be identified as a member of a money laundering gang.

In a first aspect, an embodiment of the present application provides a training method for an account risk model, including:

Generating at least one sample connected graph according to transaction data of a plurality of sample user accounts, wherein the nodes in the sample connected graph are account information of the sample user accounts, each path in the sample connected graph represents the transaction relation and transaction amount between the sample user accounts corresponding to the two nodes it connects, the account information comprises sample risk values of the sample user accounts, and a sample risk value is the likelihood that the sample user account has abnormal operation;

for any sample connected graph, generating a sample weighted adjacency matrix according to a preset transaction amount and the transaction amount corresponding to each path in the sample connected graph;

Generating a sample training matrix set according to at least one sample weighted adjacency matrix;

Training an account risk model by using the sample training matrix set to obtain a trained account risk model, wherein the trained account risk model is used for obtaining a target risk value of each user account to be detected according to transaction data of a plurality of user accounts to be detected and the preset transaction amount.
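As an illustration of the second step above, the following is a minimal sketch of building a weighted adjacency matrix for one connected graph. The text here does not give the weight formula, so the rule used (total transferred amount divided by the preset transaction amount, capped at 1.0) is purely an assumption, as are the names `weighted_adjacency`, `accounts`, `transfers` and `preset_amount`.

```python
from typing import List, Tuple


def weighted_adjacency(
    accounts: List[str],
    transfers: List[Tuple[str, str, float]],
    preset_amount: float,
) -> List[List[float]]:
    """Build a directed weighted adjacency matrix for one connected graph.

    Row i / column j holds the weight of the path from accounts[i] to accounts[j].
    """
    index = {account: i for i, account in enumerate(accounts)}
    n = len(accounts)
    totals = [[0.0] * n for _ in range(n)]
    # Sum the amounts of all transfers that share the same payer and payee.
    for payer, payee, amount in transfers:
        totals[index[payer]][index[payee]] += amount
    # ASSUMPTION (not from the source): weight = total / preset amount, capped at 1.0.
    return [[min(total / preset_amount, 1.0) for total in row] for row in totals]


accounts = ["A", "B", "C"]
transfers = [("A", "B", 20000.0), ("B", "C", 60000.0), ("A", "B", 5000.0)]
matrix = weighted_adjacency(accounts, transfers, preset_amount=50000.0)
```

With this hypothetical rule, a path whose total amount reaches the preset transaction amount receives the maximum weight, and a missing path keeps weight 0.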

In one possible design of the first aspect, the generating a sample training matrix set according to at least one sample weighted adjacency matrix includes:

For any sample user account in any sample weighted adjacency matrix, according to the transaction time between the sample user account and other sample user accounts and the sample weighted adjacency matrix, obtaining a sample risk vector of the sample user account, wherein the sample risk vector is used for representing an initial sample risk value of the sample user account;

And generating a sample training matrix set according to each sample weighted adjacency matrix and the sample risk vector of each sample user account.

Optionally, the obtaining, for any sample user account in any sample weighted adjacency matrix, a sample risk vector of the sample user account according to a transaction time between the sample user account and other sample user accounts and the sample weighted adjacency matrix includes:

For any sample user account in any sample weighted adjacency matrix, generating a first sample risk value according to the transaction frequency between the sample user account and other sample user accounts, the total transaction amount, and the number of accounts that have transacted with the sample user account;

generating a second sample risk value according to the weight corresponding to the path connecting the sample user account;

Generating a third sample risk value according to the account information of the sample user account and the transaction time between other sample user accounts;

generating a fourth sample risk value according to the transaction relation and the transaction amount between the sample user account and other sample user accounts;

And determining the first sample risk value, the second sample risk value, the third sample risk value and the fourth sample risk value as the sample risk vector.
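The inputs named for the first sample risk value (transaction frequency, total transaction amount, and number of counterpart accounts) can be gathered as sketched below. This is only a sketch of the raw statistics; how they are mapped to a risk value is not specified in this passage, so no scoring formula is shown. The function name `account_features` and the tuple layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def account_features(
    transfers: List[Tuple[str, str, float]],
) -> Dict[str, Tuple[int, float, int]]:
    """Return, per account: (transaction count, total amount, distinct counterparts)."""
    count: Dict[str, int] = defaultdict(int)
    total: Dict[str, float] = defaultdict(float)
    peers: Dict[str, set] = defaultdict(set)
    for payer, payee, amount in transfers:
        # Each transfer counts for both the paying and the receiving account.
        for account, other in ((payer, payee), (payee, payer)):
            count[account] += 1
            total[account] += amount
            peers[account].add(other)
    return {a: (count[a], total[a], len(peers[a])) for a in count}


features = account_features([("A", "B", 100.0), ("A", "C", 50.0), ("B", "A", 25.0)])
```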

In another possible design of the first aspect, the generating at least one sample connectivity graph according to transaction data of a plurality of sample user accounts includes:

acquiring a sample knowledge graph according to transaction data of a plurality of sample user accounts;

And deleting isolated points in the sample knowledge graph, and acquiring at least one sample connected graph from the sample knowledge graph.
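The two steps above can be sketched with a union-find over the transaction edges, treating edge direction as irrelevant for connectivity. This is an illustrative sketch only; the function name `connected_graphs` and the plain node/edge lists are assumptions standing in for the knowledge graph held in a graph database.

```python
from typing import Dict, List, Set, Tuple


def connected_graphs(nodes: List[str], edges: List[Tuple[str, str]]) -> List[Set[str]]:
    """Drop isolated nodes and return the node sets of the remaining connected graphs."""
    parent: Dict[str, str] = {node: node for node in nodes}

    def find(x: str) -> str:
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)  # merge the two components

    groups: Dict[str, Set[str]] = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    # A component with a single node is an isolated point and is deleted.
    return [group for group in groups.values() if len(group) > 1]


graphs = connected_graphs(
    ["A", "B", "C", "D", "E", "F"],
    [("A", "B"), ("B", "C"), ("D", "E")],
)
```

Here account `F` has no transactions with any other account, so it is removed, and two connected graphs remain.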

In yet another possible design of the first aspect, the training the account risk model using the sample training matrix set to obtain a trained account risk model includes:

and training the account risk model according to the sample training matrix set and the multi-head attention mechanism to obtain the trained account risk model.

In a second aspect, an embodiment of the present application provides a method for determining a risk user group, including:

generating at least one target connected graph according to transaction data of a plurality of user accounts to be detected, wherein the nodes in the target connected graph are account information of the user accounts to be detected, and each path in the target connected graph represents the transaction relation and transaction amount between the user accounts to be detected corresponding to the two nodes it connects;

For any target connected graph, generating a target weighted adjacency matrix according to a preset transaction amount and the transaction amount corresponding to each path in the target connected graph;

Inputting at least one target weighted adjacency matrix into a trained account risk model, and acquiring a target risk value of each user account to be detected, wherein the target risk value is the possibility of abnormal operation of the user account to be detected, and the trained account risk model is obtained by training the account risk model by using transaction data of a plurality of sample user accounts and the preset transaction amount;

determining at least one first target user account from the user accounts to be detected according to the target risk value of each user account to be detected;

and calculating the similarity of each first target user account and other user accounts to be detected, and determining a risk user group according to the users corresponding to the user accounts to be detected.

In one possible design of the second aspect, the calculating the similarity between each first target user account and other user accounts to be detected, and determining the risk user group according to the user corresponding to the user account to be detected includes:

Calculating the similarity of each first target user account and other user accounts to be detected, and determining at least one second target user account from the other user accounts to be detected according to the calculated at least one similarity;

and determining the user corresponding to the at least one first target user account and the user corresponding to the at least one second target user account as the risk user group.

Optionally, for any target connectivity graph, generating a target weighted adjacency matrix according to a preset transaction amount and a transaction amount corresponding to each path in the target connectivity graph, including:

for any target connected graph, generating a weighted adjacency matrix according to the preset transaction amount and the transaction amount corresponding to each path in the target connected graph;

setting every weight element smaller than a preset weight value in each weighted adjacency matrix to 0, to generate a target weighted adjacency matrix;

correspondingly, after the generating the target weighted adjacency matrix, the method further comprises:

Generating a non-zero element vector set according to at least one non-zero element in each target weighted adjacency matrix, wherein the non-zero element vector set comprises a first vector, a second vector and a third vector, the first vector stores all non-zero elements of the target weighted adjacency matrix, the second vector stores the row positions of the non-zero elements, and the third vector stores the column positions of the non-zero elements;

correspondingly, the inputting the at least one target weighted adjacency matrix into the trained account risk model comprises the following steps:

at least one non-zero element vector set is input into a trained account risk model.
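The thresholding step and the three-vector storage described above correspond to what is commonly called coordinate (COO) sparse storage. A minimal sketch follows; the function name `to_nonzero_vectors` and the parameter `min_weight` (standing for the preset weight value) are hypothetical.

```python
from typing import List, Tuple


def to_nonzero_vectors(
    matrix: List[List[float]], min_weight: float
) -> Tuple[List[float], List[int], List[int]]:
    """Zero out weights below min_weight, then return (values, rows, cols)."""
    values: List[float] = []
    rows: List[int] = []
    cols: List[int] = []
    for i, row in enumerate(matrix):
        for j, weight in enumerate(row):
            # Weights smaller than the preset weight value are treated as 0 and skipped.
            if weight != 0 and weight >= min_weight:
                values.append(weight)  # first vector: the non-zero elements
                rows.append(i)         # second vector: their row positions
                cols.append(j)         # third vector: their column positions
    return values, rows, cols


weights = [
    [0.0, 0.5, 0.02],
    [0.0, 0.0, 1.0],
    [0.3, 0.0, 0.0],
]
values, rows, cols = to_nonzero_vectors(weights, min_weight=0.1)
```

Only the retained weights and their positions are stored, so a large matrix with few meaningful paths takes far less memory than its dense form.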

Optionally, after the generating the target weighted adjacency matrix, the method further includes:

For any user account to be detected in any non-zero element vector set, acquiring a target risk vector of the user account to be detected according to the transaction time between the user account to be detected and other user accounts to be detected and the target weighted adjacency matrix, wherein the target risk vector represents an initial risk value of the user account to be detected;

And inputting at least one non-zero element vector set and the target risk vector of each user account to be detected into the trained account risk model.

Optionally, the obtaining, for any user account to be detected in any non-zero element vector set, a target risk vector of the user account to be detected according to the transaction time between the user account to be detected and other user accounts to be detected and the target weighted adjacency matrix includes:

for any user account to be detected in any non-zero element vector set, generating a first initial risk value according to the transaction frequency between the user account to be detected and other user accounts to be detected, the total transaction amount, and the number of accounts that have transacted with the user account to be detected;

Generating a second initial risk value according to the weight corresponding to the path connecting the user account to be detected;

generating a third initial risk value according to the account information of the user account to be detected and the transaction time between other user accounts to be detected;

generating a fourth initial risk value according to the transaction relation and the transaction amount between the user account to be detected and other user accounts to be detected;

And determining the first initial risk value, the second initial risk value, the third initial risk value and the fourth initial risk value as the target risk vector.

Optionally, after the target risk value of each user account to be detected is obtained, the method further includes:

acquiring an updated target risk vector;

Correspondingly, the calculating the similarity of each first target user account and other user accounts to be detected includes:

calculating first similarity of each first target user account and other user accounts to be detected according to the updated target risk vector;

Calculating the second similarity of each first target user account and other user accounts to be detected according to the transaction time between each first target user account and other user accounts to be detected;

Calculating third similarity of each first target user account and other user accounts to be detected according to the transaction amount between each first target user account and other user accounts to be detected;

and generating the similarity of each first target user account and other user accounts to be detected according to the first similarity, the second similarity and the third similarity.
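The combination of the three similarities can be sketched as below. The passage does not state how each similarity is computed or how they are combined, so both choices here are assumptions: cosine similarity for the risk-vector comparison and a weighted sum for the final score. The names `cosine_similarity`, `combined_similarity` and `weights` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """ASSUMPTION: cosine similarity stands in for the first similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def combined_similarity(
    first: float,
    second: float,
    third: float,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """ASSUMPTION: the overall similarity is a weighted sum of the three parts."""
    return weights[0] * first + weights[1] * second + weights[2] * third


same_direction = cosine_similarity([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
score = combined_similarity(1.0, 0.5, 0.0, weights=(0.5, 0.25, 0.25))
```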

In another possible design of the second aspect, the generating at least one target connectivity graph according to transaction data of a plurality of user accounts to be detected includes:

Constructing a target knowledge graph according to transaction data of a plurality of user accounts to be detected;

deleting isolated points in the target knowledge graph to obtain at least one initial connected graph;

discarding the initial connected graphs with the node number smaller than the preset node number, and determining the initial connected graphs with the node number larger than or equal to the preset node number as target connected graphs.
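The last step above is a simple size filter. A minimal sketch, assuming each initial connected graph is represented by its set of nodes (the name `select_target_graphs` and the parameter `min_nodes` are hypothetical):

```python
from typing import List, Set


def select_target_graphs(initial_graphs: List[Set[str]], min_nodes: int) -> List[Set[str]]:
    """Keep only the connected graphs with at least min_nodes nodes."""
    return [graph for graph in initial_graphs if len(graph) >= min_nodes]


targets = select_target_graphs([{"A", "B", "C"}, {"D", "E"}], min_nodes=3)
```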

In a third aspect, an embodiment of the present application provides a training apparatus for an account risk model, including:

The generation module is used for generating at least one sample connected graph according to transaction data of a plurality of sample user accounts, wherein the nodes in the sample connected graph are account information of the sample user accounts, each path in the sample connected graph represents the transaction relation and transaction amount between the sample user accounts corresponding to the two nodes it connects, the account information comprises sample risk values of the sample user accounts, and a sample risk value is the likelihood that the sample user account has abnormal operation;

The generation module is further used for generating, for any sample connected graph, a sample weighted adjacency matrix according to a preset transaction amount and the transaction amount corresponding to each path in the sample connected graph;

The generation module is further used for generating a sample training matrix set according to at least one sample weighted adjacency matrix;

the training module is used for training the account risk model by using the sample training matrix set to obtain a trained account risk model, and the trained account risk model is used for obtaining a target risk value of each user account to be detected according to the transaction data of the user accounts to be detected and the preset transaction amount.

In a fourth aspect, an embodiment of the present application provides a device for determining a risk user group, including:

The generation module is used for generating at least one target connected graph according to transaction data of a plurality of user accounts to be detected, wherein the nodes in the target connected graph are account information of the user accounts to be detected, and each path in the target connected graph represents the transaction relation and transaction amount between the user accounts to be detected corresponding to the two nodes it connects;

the generation module is further used for generating, for any target connected graph, a target weighted adjacency matrix according to a preset transaction amount and the transaction amount corresponding to each path in the target connected graph;

The input module is used for inputting at least one target weighted adjacency matrix into a trained account risk model, and obtaining a target risk value of each user account to be detected, wherein the target risk value is the possibility of abnormal operation of the user account to be detected, and the trained account risk model is obtained by training the account risk model by using transaction data of a plurality of sample user accounts and the preset transaction amount;

The determining module is used for determining at least one first target user account from the user accounts to be detected according to the target risk value of each user account to be detected;

The determining module is further configured to calculate similarity between each first target user account and other user accounts to be detected, and determine a risk user group according to users corresponding to the user accounts to be detected.

In a fifth aspect, an embodiment of the application provides an electronic device comprising a processor, a memory, and computer program instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer program instructions, implements the methods provided by the first aspect, the second aspect, and each possible design of the first aspect and the second aspect.

In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the methods provided by the first aspect, the second aspect, and each possible design of the first aspect and the second aspect.

The training method of the account risk model and the determining method of the risk user group provided by the embodiments of the application comprise: generating at least one sample connected graph according to transaction data of a plurality of sample user accounts; for any sample connected graph, generating a sample weighted adjacency matrix according to a preset transaction amount and the transaction amount corresponding to each path in the sample connected graph; generating a sample training matrix set according to at least one sample weighted adjacency matrix; and training the account risk model by using the sample training matrix set to obtain a trained account risk model. Because the account risk model is trained in a supervised manner with a sample training matrix set generated from the sample weighted adjacency matrix, and the sample weighted adjacency matrix contains the transaction relations among the sample user accounts, the recognition accuracy of the trained account risk model is improved, the false positive rate is reduced when the trained model is subsequently used, and the accuracy of identifying money laundering gangs is improved.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.

Fig. 1 is a schematic diagram of an application scenario of a training method of an account risk model according to an embodiment of the present application;

Fig. 2 is a schematic flow chart of a first embodiment of a training method for an account risk model according to an embodiment of the present application;

Fig. 3 is a schematic flow chart of a second embodiment of a training method for an account risk model according to an embodiment of the present application;

Fig. 4 is a schematic diagram of a first embodiment of a suspicious transaction according to an embodiment of the present application;

Fig. 5 is a schematic diagram of a second embodiment of a suspicious transaction according to an embodiment of the present application;

Fig. 6 is a schematic diagram of a third embodiment of a suspicious transaction according to an embodiment of the present application;

Fig. 7 is a schematic flow chart of a first embodiment of a method for determining a risk user group according to an embodiment of the present application;

Fig. 8 is a schematic diagram of a process for obtaining a target risk value of a user account to be detected according to an embodiment of the present application;

Fig. 9 is a schematic flow chart of a second embodiment of a method for determining a risk user group according to an embodiment of the present application;

Fig. 10 is a schematic flow chart of a third embodiment of a method for determining a risk user group according to the embodiment of the present application;

Fig. 11 is a schematic flow chart of a fourth embodiment of a method for determining a risk user group according to the embodiment of the present application;

Fig. 12 is a schematic diagram of a training device for an account risk model according to an embodiment of the present application;

Fig. 13 is a schematic structural diagram of a determining device for a risk user group according to an embodiment of the present application;

Fig. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Specific embodiments of the present disclosure have been shown by way of the above drawings and will be described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the disclosed concepts to those skilled in the art by reference to specific embodiments.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Before describing embodiments of the present application, the terms related to the embodiments of the present application will be explained first:

Money laundering generally refers to the activities and processes of disguising or concealing the nature and source of the proceeds of various crimes, and of converting, transferring and hiding illegally obtained proceeds and the gains generated from them by various means so that they appear legal. It also includes activities and processes that convert legal proceeds into illegal proceeds, and that convert the resulting funds to serve specific illegal purposes.

Deep learning refers to the process by which a computer program learns the internal rules and levels of representation of sample data in order to make predictions on unknown data; it is a learning model built on structured neural networks.

The graph neural network (Graph Neural Network, GNN) is a neural network that operates directly on a graph structure.

The graph attention neural network (Graph Attention Networks, GAT) is one of graph neural networks, and aggregation operation is carried out on neighbor nodes through an attention mechanism, so that the self-adaptive distribution of different neighbor weights is realized, and the expression capacity of a graph neural network model is improved.
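For reference, the attention-based neighbor aggregation described above is usually written as follows in the GAT literature. The symbols are notation assumed here and are not taken from this application: $\mathbf{h}_i$ is the feature vector of node $i$, $\mathbf{W}$ a learned weight matrix, $\mathbf{a}$ a learned attention vector, $\mathcal{N}_i$ the neighborhood of node $i$, $\|$ concatenation, and $\sigma$ a nonlinearity.

```latex
e_{ij} = \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\left[\mathbf{W}\mathbf{h}_i \,\|\, \mathbf{W}\mathbf{h}_j\right]\right), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}, \qquad
\mathbf{h}_i' = \sigma\!\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\,\mathbf{W}\mathbf{h}_j\right)
```

The coefficients $\alpha_{ij}$ are the adaptively assigned neighbor weights; with multiple attention heads, the outputs of several independent copies of this computation are concatenated or averaged.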

Inductive tasks refer to tasks in which training and testing are performed on different graph structures.

Neo4j graph database is an online database management system that supports create, read, update and delete operations on a graph data model. Neo4j graph database is the most widely used of all graph databases.

Similarity algorithm: a class of computer programs that use related data to measure the degree of similarity between objects.

Directed graph: a graph whose edges have directions.

Degree: in a directed graph, the degree is divided into out-degree and in-degree. The out-degree of a node is the number of edges pointing from the node to other nodes, and the in-degree is the number of edges pointing to the node.
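As a small illustration of these two quantities, the sketch below counts out-degree and in-degree from a list of directed edges; the function name `degrees` and the `(source, target)` edge format are assumptions made for the example.

```python
from collections import Counter
from typing import List, Tuple


def degrees(edges: List[Tuple[str, str]]) -> Tuple[Counter, Counter]:
    """Return (out_degree, in_degree) counters for a list of directed edges."""
    out_degree = Counter(source for source, _ in edges)  # edges leaving each node
    in_degree = Counter(target for _, target in edges)   # edges arriving at each node
    return out_degree, in_degree


out_degree, in_degree = degrees([("A", "B"), ("A", "C"), ("B", "C")])
```

An account with a high in-degree and low out-degree matches the "scattered transfer-in, concentrated transfer-out" pattern that the application later treats as suspicious.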

Connected graph: a graph in which any two vertices are connected, i.e., there is at least one path (made up of one or more edges) from any vertex to any other vertex. A vertex may also be referred to as a node.

Sparse matrix: the elements of a matrix are divided into two sets, zero and non-zero; the ratio of the number of non-zero elements to the total number of elements of the matrix is called the density, and a matrix whose density is less than 0.05 is a sparse matrix.
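The density test can be sketched as follows, assuming the common convention that density is the share of non-zero elements in the matrix; the function names `density` and `is_sparse` are hypothetical, and 0.05 is the threshold quoted in the definition.

```python
from typing import List


def density(matrix: List[List[float]]) -> float:
    """Share of non-zero elements among all elements of the matrix."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for value in row if value != 0)
    return nonzero / total if total else 0.0


def is_sparse(matrix: List[List[float]], threshold: float = 0.05) -> bool:
    """A matrix counts as sparse when its density is below the threshold."""
    return density(matrix) < threshold


example = [[0.0] * 5 for _ in range(5)]
example[0][1] = 0.5  # one non-zero element out of 25 gives a density of 0.04
```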

The specific application background of the application is as follows. Anti-money laundering is an important task for financial institutions in terms of regulatory compliance. In the anti-money laundering field, the transaction amount of the user to be identified is generally identified and analyzed by a preset anti-money-laundering model, and the model outputs whether each account involves money laundering activity. After an account with money laundering activity is obtained, the user corresponding to the account is determined, and other members of the same money laundering gang are determined by manually checking other information such as the user's transaction records.

However, in the prior art, only the transaction amount of the user to be identified is considered when a money laundering gang is identified with the anti-money-laundering model, so a normal user may be identified as a gang member, and the identification accuracy is low. Further, after it is determined whether each account has money laundering activity, the other members of the same money laundering gang still need to be determined manually, and the labor cost is high.

The application addresses the above problems as follows. Members of a money laundering gang have suspicious transaction relations and transaction structures among them; for example, a gang may show features such as funds transferred in from scattered sources and transferred out in a concentrated manner, funds transferred in in a concentrated manner and transferred out to scattered destinations, frequent receipt and payment of funds between the same parties within a short period, and transaction amounts close to the large-value transaction threshold. Therefore, if the transaction relations between the user accounts to be detected are taken into account, at least one target weighted adjacency matrix is obtained according to the transaction data of a plurality of user accounts to be detected, and the target risk value of each user account to be detected is obtained based on the trained account risk model, then at least one first target user account (i.e., a core member of the money laundering gang) can be determined, and the other members of the gang can be determined according to the suspicious transaction relations and transaction structure, thereby improving the accuracy of identifying money laundering gangs.

The training method of the account risk model provided by the embodiment of the application can be applied to the application scenario shown in Fig. 1, so as to solve the above technical problems. Fig. 1 is a schematic diagram of an application scenario of a training method of an account risk model according to an embodiment of the present application. As shown in Fig. 1, the application scenario may include a service terminal device and an electronic device, and may further include a graph database connected with the electronic device.

In the embodiment of the application, the service terminal equipment is mainly used for providing services for the user when the user transacts the service and storing transaction data generated when the user transacts the service. The electronic equipment acquires transaction data of a user account stored in the service terminal equipment within a preset time window, and determines the user account as a sample user account. Further, the electronic device imports transaction data of the sample user account into a graph database, and obtains a sample knowledge graph constructed by the graph database according to the transaction data.

Further, the electronic equipment processes the sample knowledge graph to generate a sample training matrix set, and trains the account risk model by using the sample training matrix set, so that a trained account risk model is obtained.

It may be understood that the execution body of the embodiment of the present application may be a terminal device, for example, a computer or a tablet computer, or may be a server, for example, a background processing platform. Thus, the present embodiment is explained with the terminal device and the server collectively referred to as an electronic device; whether the electronic device is specifically a terminal device or a server can be determined according to the actual situation.

The technical scheme of the application is described in detail through specific embodiments.

It should be noted that the following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.

Fig. 2 is a flowchart of a first embodiment of a training method for an account risk model according to an embodiment of the present application. As shown in fig. 2, the training method of the account risk model may include the following steps:

S21, generating at least one sample connectivity graph according to transaction data of a plurality of sample user accounts.

In the embodiment of the application, the electronic equipment can be connected with at least one service terminal equipment, and after transaction data of a sample user account is acquired from a certain service terminal equipment, the acquired data can be processed to generate at least one sample connectivity graph. The electronic device may also be a device that stores data itself, so that it may acquire transaction data of a sample user account stored by itself, and further analyze the transaction data to generate at least one sample connectivity graph.

The transaction data includes account information of the sample user accounts, the transaction relations among the sample user accounts, and the transaction amounts (which can also be understood as the transaction sums). The account information includes an entity (i.e., a sample user account), account attributes, and a sample risk value of the sample user account, where the sample risk value is the likelihood that the sample user account has abnormal operation. The account attributes include account age, account balance, and other information. The transaction relations include transfer, withdrawal, top-up, and the like.

For example, the abnormal operation may be the presence of money laundering behavior.

The sample user account may be a bank card account of the sample user, or an account of other payment software used by the user, such as an Alipay account or a WeChat account, or may be any other existing account having a transaction function, which is not particularly limited in the embodiment of the present application.

The sample user accounts may be accounts of different sample users, or may be different accounts of the same sample user.

In a specific embodiment, a sample knowledge graph is obtained from the transaction data of the plurality of sample user accounts, isolated points in the sample knowledge graph are deleted, and at least one sample connectivity graph is obtained from the sample knowledge graph. Specifically, the transaction data of the plurality of sample user accounts is imported into a graph database, which constructs the sample knowledge graph from that data. Isolated points in the sample knowledge graph are then deleted (abnormal data may also be removed by other existing means), and at least one sample connectivity graph is obtained from the sample knowledge graph. Deleting the isolated points removes nodes that are not involved in group money laundering, which reduces the data redundancy of the sample training matrix set and improves the accuracy and efficiency of subsequent calculation.

The graph database may be, for example, a Neo4j graph database.

The nodes (also called vertices) in the sample connectivity graph are account information of sample user accounts, and the paths in the sample connectivity graph are transaction relations and transaction amounts between sample user accounts corresponding to two nodes connected by the paths.
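The graph construction described above can be sketched without a graph database. The following is a minimal illustration only: the function name `sample_connectivity_graphs` and the `(payer, payee, amount)` tuple layout are assumptions made for this sketch, not part of the embodiment.

```python
def sample_connectivity_graphs(accounts, transactions):
    """Return the connected components of the transaction graph as sets of accounts.

    accounts: iterable of account identifiers (hypothetical layout).
    transactions: iterable of (payer, payee, amount) tuples (hypothetical layout).
    """
    neighbours = {account: set() for account in accounts}
    for payer, payee, _amount in transactions:
        if payer != payee:  # a self-transfer connects the account to nobody else
            neighbours.setdefault(payer, set()).add(payee)
            neighbours.setdefault(payee, set()).add(payer)

    # Delete isolated points: accounts with no transaction partner.
    neighbours = {a: n for a, n in neighbours.items() if n}

    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


graphs = sample_connectivity_graphs(
    ["A", "B", "C", "D", "E", "F"],
    [("A", "B", 100000), ("B", "C", 5000), ("D", "E", 20000)],
)
```

Here account F takes part in no transaction, so it is dropped, and the remaining accounts form two sample connectivity graphs.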

S22, for any sample connectivity graph, generating a sample weighted adjacency matrix according to the preset transaction amount and the transaction amount corresponding to each path in the sample connectivity graph.

For example, the preset transaction amount may be 10000, 11000, 12000, or the like, and may be set according to experience in practical applications, which is not particularly limited in the embodiment of the present application.

In a specific implementation, for any sample connectivity graph, the ratio of the transaction amount corresponding to each path in the sample connectivity graph to the preset transaction amount is determined as the weight corresponding to that path, and the sample weighted adjacency matrix is generated from these weights.

For example, assuming that the preset transaction amount is 10000 and the transaction amount between node 1 and node 2 is 100000, that is, the transaction amount corresponding to the path between node 1 and node 2 is 100000, the weight corresponding to the path is 100000 / 10000 = 10.
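The weighting rule can be sketched as follows. The function name, the directed payer-to-payee orientation of the matrix, and the summing of repeated transactions on the same path are assumptions of this sketch; the embodiment only specifies the ratio of the path's transaction amount to the preset transaction amount.

```python
def weighted_adjacency_matrix(nodes, edges, preset_amount):
    """Build a weighted adjacency matrix as nested lists.

    nodes: ordered list of account identifiers.
    edges: iterable of (payer, payee, amount) tuples (hypothetical layout).
    preset_amount: the preset transaction amount used as the divisor.
    """
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0.0] * len(nodes) for _ in nodes]
    for payer, payee, amount in edges:
        # Weight of a path = transaction amount / preset transaction amount.
        matrix[index[payer]][index[payee]] += amount / preset_amount
    return matrix


matrix = weighted_adjacency_matrix(
    ["node1", "node2"], [("node1", "node2", 100000)], preset_amount=10000
)
```

With the figures from the example above, the entry for the path from node 1 to node 2 is 10.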

S23, generating a sample training matrix set according to at least one sample weighted adjacency matrix.

In one possible implementation, all sample weighted adjacency matrices can be directly aggregated to generate a sample training matrix set.

In another possible implementation, a sample training matrix set may be generated from at least one sample weighted adjacency matrix and a sample risk vector for each sample user account.

Optionally, at least one sample weighted adjacency matrix may be determined as a first sub-sample training matrix set, a sample risk vector of each sample user account is determined as a second sub-sample training matrix set, and the first sub-sample training matrix set and the second sub-sample training matrix set are assembled to generate the sample training matrix set.

Optionally, the sample risk vector of each sample user account may be spliced to a corresponding position in at least one sample weighted adjacency matrix to generate a sample training matrix set.

In this method, the sample risk vector for each sample user account may be determined by:

1) The sample risk vector of each sample user account can be obtained by initializing each sample user account by a random method.

2) For any sample user account in any sample weighted adjacency matrix, the sample risk vector of the sample user account can be obtained according to the transaction time between the sample user account and other sample user accounts and the sample weighted adjacency matrix. The specific principle for obtaining the initial sample risk vector of each sample user account is described in the embodiment shown in fig. 3 below and is not repeated here.

The sample risk vector is used for representing an initial sample risk value of the sample user account. In view of the fact that the account risk model is sensitive to the initial values of the parameters, a sample risk vector of the sample user account is obtained through the transaction time between the sample user account and other sample user accounts and the sample weighted adjacency matrix, and the sample risk value is initialized, so that the initial sample risk value is obtained, and the accuracy of subsequent training of the account risk model is improved.

S24, training the account risk model by using the sample training matrix set to obtain a trained account risk model.

The account risk model may be a model stored in the electronic device in advance, or may be a model obtained by the electronic device from other data storage devices.

Alternatively, the account risk model may be a GAT model, or may be another graph neural network model, which is not specifically limited in the embodiment of the present application.

In a specific embodiment, the account risk model is trained according to the sample training matrix set and the multi-head attention mechanism, and a trained account risk model is obtained.

Illustratively, assume that the account risk model has three hidden layers, namely a first hidden layer, a second hidden layer and a third hidden layer. The multi-head attention mechanism is a K-head attention mechanism, that is, there are K attention heads in total, and the attention vector generated by each attention head is:

$$h'_i = \sigma\Big(\sum_{j \in N_i} \alpha_{ij} W h_j\Big)$$

where $h_j$ is the sample risk vector of node j, $h'_i$ is the updated sample risk value of node i, $\sigma$ is the softmax function, $\alpha_{ij}$ is the calculated attention coefficient, $N_i$ is the set of first-order neighbor nodes of node i, and W is a linear parameter matrix.

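Further, the calculated attention coefficient may be obtained by the following formula:

$$\alpha_{ij} = \frac{\exp\big(\mathrm{LeakyReLU}(a^{T} V_{ij}[Wh_i \,\|\, Wh_j])\big)}{\sum_{k \in N_i} \exp\big(\mathrm{LeakyReLU}(a^{T} V_{ik}[Wh_i \,\|\, Wh_k])\big)}$$

where LeakyReLU is a nonlinear activation function, a is a preset mapping vector used to map $V_{ij}[Wh_i \,\|\, Wh_j]$ to a real number, $V_{ij}$ is the element of the sample weighted adjacency matrix for nodes i and j, $\|$ denotes concatenation, and W is a linear parameter matrix.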

Alternatively, the calculated attention coefficient may also be obtained by the following formula:

Where c¹ is the vector storing all non-zero elements of the sample weighted adjacency matrix, c² is the vector storing the row positions of all non-zero elements, and c³ is the vector storing the column positions of all non-zero elements.

For the first hidden layer and the second hidden layer, the attention vector used in training the account risk model is obtained by concatenating the attention vectors generated by the K attention heads:

$$h'_i = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in N_i} \alpha_{ij}^{k} W^{k} h_j\Big)$$

where the superscript k indexes the attention head and $\Vert$ denotes concatenation.

For the third hidden layer, the attention vector used in training the account risk model is obtained by averaging the attention vectors generated by the K attention heads:

$$h'_i = \sigma\Big(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in N_i} \alpha_{ij}^{k} W^{k} h_j\Big)$$

where the superscript k indexes the attention head.

In this embodiment, the account risk model is trained using the sample training matrix set and W is continuously optimized, so that a trained account risk model is obtained. Introducing the multi-head attention mechanism amplifies the role of core members in a money laundering group, which improves both accuracy and computation speed.
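A minimal sketch of one attention head and of the two ways of combining heads, following the standard graph attention (GAT) formulation. It assumes that the weighted adjacency entry `V[i][j]` scales the concatenated features before the mapping vector `a` is applied, and it omits the output nonlinearity σ for brevity. All function names and the list-based data layout are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def attention_coefficients(i, neighbours, h, W, a, V):
    """Softmax-normalised attention of node i over its first-order neighbours."""
    wh_i = matvec(W, h[i])
    scores = []
    for j in neighbours:
        concat = wh_i + matvec(W, h[j])  # [Wh_i || Wh_j]
        projected = sum(x * y for x, y in zip(a, concat))
        scores.append(leaky_relu(V[i][j] * projected))  # assumed use of V_ij
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_output(i, neighbours, h, W, a, V):
    """Attention-weighted sum of the transformed neighbour vectors."""
    alphas = attention_coefficients(i, neighbours, h, W, a, V)
    out = [0.0] * len(W)
    for alpha, j in zip(alphas, neighbours):
        for d, value in enumerate(matvec(W, h[j])):
            out[d] += alpha * value
    return out


def combine_heads(head_outputs, final_layer):
    """Concatenate head outputs (earlier layers) or average them (last layer)."""
    if final_layer:
        k = len(head_outputs)
        return [sum(values) / k for values in zip(*head_outputs)]
    return [value for out in head_outputs for value in out]
```

Because each node only attends to its own first-order neighbours, the per-node computations are independent of one another.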

The trained account risk model is used for acquiring a target risk value of each user account to be detected according to the transaction amounts of the user accounts to be detected and the preset transaction amounts.

Optionally, dropout is not applied to the attention coefficient when training the account risk model. Dropout refers to the temporary discarding of neural network elements from the network with a certain probability during training of the deep learning network.

According to the training method for the account risk model provided by the embodiment of the application, at least one sample connectivity graph is generated according to transaction data of a plurality of sample user accounts; for any sample connectivity graph, a sample weighted adjacency matrix is generated according to a preset transaction amount and the transaction amount corresponding to each path in the sample connectivity graph; a sample training matrix set is generated according to at least one sample weighted adjacency matrix; and the account risk model is trained using the sample training matrix set to obtain a trained account risk model. Because the account risk model is trained in a supervised manner on a sample training matrix set generated from the sample weighted adjacency matrix, and the sample weighted adjacency matrix contains the transaction relations among the sample user accounts, the recognition accuracy of the trained account risk model is improved, and the false positive rate is reduced when the trained account risk model is subsequently used.

Compared with other deep learning methods, the GAT algorithm does not apply the same kernel to all nodes and can assign different attention according to the weights between different nodes. This helps to improve the accuracy of identifying a money laundering group, because in money laundering cases the core members of the group are at the center of the transactions and cannot be treated as ordinary nodes. To make the trained account risk model better fit the characteristics of money laundering activities, the sample weighted adjacency matrix is introduced when the attention coefficient is calculated, which speeds up convergence of the training process. Other deep learning methods, such as the convolutional neural network (Convolutional Neural Network, CNN), the recurrent neural network (Recurrent Neural Network, RNN) and Long Short-Term Memory (LSTM), cannot process graph data, so the transaction relations between users cannot be considered when identifying money laundering groups, and the accuracy is poor.

Furthermore, the diversity of money laundering activities makes the graph structure of the data variable, so the problem is an inductive task. Other deep learning methods can target only a certain class of money laundering activity, so their application range is limited. Retraining the model for each other type of money laundering activity adds significant cost; the prior art therefore either cannot scale for technical reasons or, because of model cost, cannot handle real transaction data volumes, which makes it inefficient in practice. The GAT model handles inductive tasks well, which broadens the application range of the trained account risk model: the risk of all types of money laundering activity can be identified, and the labor cost is low.

Fig. 3 is a schematic flow chart of a second embodiment of a training method for an account risk model according to an embodiment of the present application. As shown in fig. 3, based on any of the above embodiments, for any sample user account in any sample weighted adjacency matrix, according to the transaction time between the sample user account and other sample user accounts and the sample weighted adjacency matrix, the sample risk vector of the sample user account may be obtained by:

S31, aiming at any sample user account in any sample weighted adjacency matrix, generating a first sample risk value according to the transaction frequency between the sample user account and other sample user accounts, the transaction total and the number of accounts transacted with the sample user account.

In the embodiment of the application, the actual money laundering activity scene is combined, and the money laundering activity comprises the following suspicious transaction structures according to the characteristics of suspicious transaction cases:

1) Funds are transferred out in a concentrated manner. Exemplarily, fig. 4 is a schematic structural diagram of suspicious transaction embodiment one provided by an embodiment of the present application. As shown in fig. 4, there are 5 sample user accounts, namely account 1, account 2, account 3, account 4 and account 5, and account 1 transfers funds into account 2, account 3, account 4 and account 5 respectively, i.e. a transaction structure in which funds are transferred out in a concentrated manner.

It will be appreciated that this structure is often accompanied by a transaction structure in which funds are transferred in a dispersed manner.

2) Funds are transferred in in a concentrated manner. Exemplarily, fig. 5 is a schematic structural diagram of suspicious transaction embodiment two according to an embodiment of the present application. As shown in fig. 5, there are 5 sample user accounts, namely account 6, account 7, account 8, account 9 and account 10, and account 7, account 8, account 9 and account 10 each transfer funds into account 6, i.e. a transaction structure in which funds are transferred in in a concentrated manner.

3) Funds are frequently received and paid between the same payer and payee within a short period, and the transaction amounts approach the large-value transaction standard.

Fig. 6 is a schematic structural diagram of suspicious transaction embodiment three according to an embodiment of the present application. As shown in fig. 6, there are 2 sample user accounts, account A and account B, between which a total of 5 transactions occur: account A transfers funds to account B 2 times, and account B transfers funds to account A 3 times.

4) An account that has been idle for a long time is suddenly activated for unknown reasons, or an account that normally has little fund flow suddenly receives an abnormal inflow of funds, and a large amount of funds is received and paid within a short time.

Therefore, the risk vectors of all sample user accounts in the sample weighted adjacency matrix can be obtained according to the suspicious transaction structure.

In a specific implementation, a preset transaction frequency, a preset transaction total and a preset number of accounts are set in advance. For any sample user account in any sample weighted adjacency matrix, three quantities are determined: the transaction frequency between the sample user account and other sample user accounts, the transaction total, and the number of accounts transacting with the sample user account. The number of these quantities that exceed their corresponding preset values is counted, and this count is determined as the first sample risk value.

For example, for sample user account A, assume that the preset transaction frequency is 10 times/day, the preset transaction total is 10000 yuan, and the preset number of accounts is 10. The transaction frequency between sample user account A and other sample user accounts is 15 times/day, the transaction total is 1000000 yuan, and sample user account A transacts with 5 sample user accounts. The transaction frequency is greater than the preset transaction frequency and the transaction total is greater than the preset transaction total, while the number of accounts is less than the preset number of accounts. Two quantities (the transaction frequency and the transaction total) therefore exceed their preset values, and the first sample risk value is 2.

The first sample risk value may be denoted risk₁.

The preset transaction frequency, the preset transaction total and the preset number of accounts can be set according to experience and are not limited by the embodiment of the application.
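The counting rule for the first sample risk value can be sketched as below. The function and parameter names are hypothetical, and the strict "greater than" comparison is an assumption about how "exceed" is meant.

```python
def first_sample_risk_value(frequency, total, counterparties,
                            preset_frequency, preset_total, preset_counterparties):
    """Count how many of the three observed quantities exceed their preset values."""
    observed = (frequency, total, counterparties)
    presets = (preset_frequency, preset_total, preset_counterparties)
    return sum(1 for value, limit in zip(observed, presets) if value > limit)


# Figures from the example for sample user account A.
risk_1 = first_sample_risk_value(15, 1000000, 5, 10, 10000, 10)
```

For account A the frequency and the total exceed their presets while the number of accounts does not, so `risk_1` is 2.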

S32, generating a second sample risk value according to the weight corresponding to the path for connecting the sample user account.

In a specific implementation manner, weights corresponding to paths connecting the sample user accounts may be summed, and the sum of the weights obtained by the summation is used as the second sample risk value.

Further, the second sample risk value may be obtained from the weight sum by a formula in which S is the weight sum and risk₂ is the second sample risk value.

S33, generating a third sample risk value according to the account information of the sample user account and the transaction time between the sample user account and other sample user accounts.

In a specific implementation, it is judged whether the transaction total of the sample user account is greater than the preset transaction total. When the preset transaction total is not exceeded, the third sample risk value is determined to be 0. When it is exceeded, the total time difference T from the opening of the sample user account to its first transaction is calculated, and the third sample risk value is obtained from T by a formula, where risk₃ is the third sample risk value.

S34, generating a fourth sample risk value according to the transaction relation and the transaction amount between the sample user account and other sample user accounts.

In one possible implementation, considering that the closer the ratio between the amounts transferred out of and into an account is to 1, the greater the risk of money laundering, the fourth sample risk value is obtained from that ratio by a formula, where P is the ratio between the transfer-out amount and the transfer-in amount of the sample user account and risk₄ is the fourth sample risk value.

S35, determining the first sample risk value, the second sample risk value, the third sample risk value and the fourth sample risk value as sample risk vectors.

In one implementation, the sample risk vector may be determined by a formula that combines the four sample risk values, where risk is the initial sample risk value of the sample user account.

In the above embodiment, since the account risk model using the GAT algorithm is sensitive to the initial values of the parameters, that is, different initial values may cause different accuracy rates, the sample risk vector is determined according to the characteristics of the suspicious transaction structure, so as to complete the initialization processing of the sample risk value, and obtain the initial sample risk value, so that the trained account risk model becomes more interpretable.

After the trained account risk model is obtained, the trained account risk model can be used to obtain a target risk value of each user account to be detected, so that a risk user group is determined. The process of using the trained account risk model to determine a population of risk users is described in detail below in connection with specific embodiments. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.

In specific implementation, the execution subject of the method for determining the risk user group is an electronic device. It should be understood that the electronic device that performs the method for determining the risk user group and the electronic device that performs the method for training the account risk model may be the same device or may be different devices.

Fig. 7 is a flowchart of an embodiment one of a method for determining a risk user group according to an embodiment of the present application. As shown in fig. 7, the method for determining the risk user group may include the following steps:

S71, generating at least one target connectivity graph according to transaction data of a plurality of user accounts to be detected.

In the embodiment of the application, the electronic device may be connected with at least one service terminal device, so as to acquire in real time the transaction data of a plurality of user accounts to be detected from a service terminal device and process the acquired transaction data, thereby generating at least one target connectivity graph. The electronic device may also store the data itself, in which case it acquires the locally stored transaction data of the plurality of user accounts to be detected and analyzes it to generate at least one target connectivity graph.

The transaction data includes account information of the user accounts to be detected, the transaction relations among the user accounts to be detected, and the transaction amounts. The account information includes the entity (i.e., the user account to be detected) and the account attributes. The account attributes include account age, account balance, and other information. The transaction relations include transfer, withdrawal, top-up, and the like.

The user account to be detected may be a bank card account of the user to be detected, or an account of other payment software used by the user, such as an Alipay account or a WeChat account, or may be any other existing account used to represent the identity of the user, which is not particularly limited in the embodiment of the present application.

The user accounts to be detected may be accounts of different users to be detected, or may be different accounts of the same user to be detected.

The nodes in the target connectivity graph are account information of user accounts to be detected, and the paths in the target connectivity graph are transaction relations and transaction amounts between the user accounts to be detected that correspond to the two nodes connected by the paths.

In one embodiment, a target knowledge graph is constructed according to transaction data of a plurality of user accounts to be detected. Isolated points in the target knowledge graph are deleted to obtain at least one initial connectivity graph. Finally, each initial connectivity graph whose number of nodes is smaller than the preset number of nodes is discarded, and each initial connectivity graph whose number of nodes is greater than or equal to the preset number of nodes is determined as a target connectivity graph. Deleting the isolated points removes nodes that are not involved in group money laundering and reduces the data redundancy of the target weighted adjacency matrix, which improves the accuracy and efficiency of subsequent calculation.

For example, assuming that the preset number of nodes is 50, the electronic device imports transaction data of a plurality of user accounts to be detected into a graph database, which constructs the target knowledge graph from that data. The electronic device then deletes the isolated points in the target knowledge graph (abnormal data may also be removed by other existing means) and obtains at least one initial connectivity graph from the target knowledge graph. Finally, the electronic device determines each initial connectivity graph with at least 50 nodes as a target connectivity graph.
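The size filter can be sketched as follows, using the "greater than or equal to" rule of the general description. The function name and the representation of each initial connectivity graph as a set of account identifiers are assumptions of this sketch.

```python
def select_target_graphs(initial_graphs, preset_node_count):
    """Keep only the initial connectivity graphs with enough nodes.

    initial_graphs: list of node sets, one per connected component (hypothetical layout).
    """
    return [graph for graph in initial_graphs if len(graph) >= preset_node_count]


small = {f"acct{i}" for i in range(49)}
exact = {f"user{i}" for i in range(50)}
targets = select_target_graphs([small, exact], preset_node_count=50)
```

Only the 50-node component survives; the 49-node component is discarded.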

The graph database may be, for example, a Neo4j graph database.

S72, for any target connectivity graph, generating a target weighted adjacency matrix according to a preset transaction amount and the transaction amount corresponding to each path in the target connectivity graph.

For specific implementation of this step, reference may be made to the description of generating the sample weighted adjacency matrix according to the sample connectivity graph in S22, and its implementation principle and technical effect are similar, and will not be described herein again.

S73, inputting at least one target weighted adjacency matrix into the trained account risk model, and obtaining a target risk value of each user account to be detected.

The target risk value is the possibility of abnormal operation of the user account to be detected, and the trained account risk model is obtained by training the account risk model by using transaction data of a plurality of sample user accounts and preset transaction amounts.

Fig. 8 is a schematic diagram of a process for obtaining a target risk value of a user account to be detected according to an embodiment of the present application. As shown in fig. 8, the trained account risk model has three hidden layers (the first hidden layer, the second hidden layer and the third hidden layer in fig. 8), the target weighted adjacency matrix is input into the trained account risk model, and the three hidden layers in the trained account risk model sequentially process the target weighted adjacency matrix, so as to obtain the target risk value of each user account to be detected in the target weighted adjacency matrix.

Alternatively, the target risk value may be the likelihood that the user to be detected is a member of a money laundering group.

S74, determining at least one first target user account from the user accounts to be detected according to the target risk value of each user account to be detected.

In a specific implementation, a preset number of users may be set in advance; all user accounts to be detected are sorted in descending order of target risk value, and the top-ranked user accounts, up to the preset number of users, are determined as first target user accounts.

For example, assuming that the preset number of users is 10, all user accounts to be detected are sorted in descending order of target risk value, and the top 10 user accounts are determined as first target user accounts.
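A minimal sketch of this selection step. The function name and the dictionary mapping each account to its target risk value are hypothetical; ties keep their input order, which is a property of Python's stable sort rather than something the embodiment specifies.

```python
def first_target_accounts(risk_values, preset_user_count):
    """Return the accounts with the highest target risk values.

    risk_values: dict mapping account identifier -> target risk value.
    """
    ranked = sorted(risk_values.items(), key=lambda item: item[1], reverse=True)
    return [account for account, _risk in ranked[:preset_user_count]]


top = first_target_accounts({"u1": 0.2, "u2": 0.9, "u3": 0.5}, preset_user_count=2)
```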

S75, calculating the similarity between each first target user account and other user accounts to be detected, and determining a risk user group according to the users corresponding to the user accounts to be detected.

It should be appreciated that, in the context of money laundering activities, the first target user account may be a core member of the detected money laundering group, and the other members of the same group must have direct or indirect transaction connections with the core member, so the other members that belong to the same money laundering group as the core member can be determined based on the similarity between nodes.

In a specific implementation manner, the similarity of each first target user account and other user accounts to be detected is calculated, at least one second target user account is determined from the other user accounts to be detected according to the calculated at least one similarity, and the user corresponding to the at least one first target user account and the user corresponding to the at least one second target user account are determined to be a risk user group.

Further, a preset similarity is set in the electronic device in advance. The similarity between each first target user account and each of its neighbor nodes is calculated, and the neighbor nodes whose similarity is greater than the preset similarity are selected as first nodes. The similarity between each first node and its neighbor nodes is then calculated, and the neighbor nodes whose similarity is greater than the preset similarity are selected as second nodes. Next, the similarity between each second node and its neighbor nodes is calculated, and the neighbor nodes whose similarity is greater than the preset similarity are selected as third nodes. These steps are repeated until the similarity between the N-th nodes and their neighbor nodes is smaller than the preset similarity, and the first through N-th nodes are determined as second target user accounts, where N is a positive integer greater than or equal to 2. Since the other members of the same money laundering group must have direct or indirect transaction connections with the core member, the other members that belong to the same group as the core member can be identified more accurately and quickly according to their similarity to the core member.
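The layer-by-layer expansion can be sketched as a breadth-first traversal. The function name, the adjacency dictionary, and passing the similarity measure in as a callable are assumptions of this sketch; skipping accounts that are already in the group is an added safeguard against revisiting nodes, not something the embodiment states.

```python
def expand_risk_group(first_targets, neighbours, similarity, preset_similarity):
    """Return the second target user accounts reached from the first targets.

    neighbours: dict mapping account -> iterable of neighbouring accounts.
    similarity: callable (account_a, account_b) -> float.
    """
    group = set(first_targets)
    frontier = list(first_targets)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbour in neighbours.get(node, ()):
                if neighbour in group:
                    continue  # already identified, do not revisit
                if similarity(node, neighbour) > preset_similarity:
                    group.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier  # the newly found nodes form the next layer
    return group - set(first_targets)


graph = {"core": ["a", "b"], "a": ["core", "c"], "b": ["core"],
         "c": ["a", "d"], "d": ["c"]}
scores = {("core", "a"): 0.9, ("core", "b"): 0.2, ("a", "c"): 0.8, ("c", "d"): 0.1}


def lookup(x, y):
    return scores.get((x, y), scores.get((y, x), 0.0))


second_targets = expand_risk_group(["core"], graph, lookup, preset_similarity=0.5)
```

In this toy graph the expansion reaches a and then c, and stops at d because its similarity to c is below the preset similarity.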

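The similarity between two nodes can be calculated according to the following formula:

$$\mathrm{sim}(x_1, x_2) = \frac{\sum_{Q}(x_{1Q} - \bar{x}_1)(x_{2Q} - \bar{x}_2)}{\sqrt{\sum_{Q}(x_{1Q} - \bar{x}_1)^2}\,\sqrt{\sum_{Q}(x_{2Q} - \bar{x}_2)^2}}$$

where $x_1$ is node 1, $x_2$ is node 2, $x_{1Q}$ is the Q-th element of node 1, $x_{2Q}$ is the Q-th element of node 2, $\bar{x}_1$ is the average value of the elements in node 1, and $\bar{x}_2$ is the average value of the elements in node 2.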
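The symbol definitions above (the elements of each node and their averages) match the Pearson correlation coefficient. Assuming that this is the intended measure, a minimal sketch is given below; the function name and the choice to return 0.0 for a constant vector are assumptions.

```python
import math


def pearson_similarity(x1, x2):
    """Pearson correlation between two equal-length node vectors."""
    mean1 = sum(x1) / len(x1)
    mean2 = sum(x2) / len(x2)
    covariance = sum((p - mean1) * (q - mean2) for p, q in zip(x1, x2))
    norm1 = math.sqrt(sum((p - mean1) ** 2 for p in x1))
    norm2 = math.sqrt(sum((q - mean2) ** 2 for q in x2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return covariance / (norm1 * norm2)
```

Under this measure, two nodes whose vectors rise and fall together score close to 1, and nodes with opposite patterns score close to -1.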

According to the method for determining the risk user group provided by the embodiment of the application, at least one target connectivity graph is generated according to transaction data of a plurality of user accounts to be detected; for any target connectivity graph, a target weighted adjacency matrix is generated according to a preset transaction amount and the transaction amount corresponding to each path in the target connectivity graph; at least one target weighted adjacency matrix is input into a trained account risk model to obtain a target risk value of each user account to be detected; at least one first target user account is determined from the user accounts to be detected according to the target risk values; the similarity between each first target user account and other user accounts to be detected is calculated; and the risk user group is determined according to the users corresponding to the user accounts to be detected. Risk detection is performed on the plurality of user accounts to be detected using the trained account risk model, and all members of the money laundering group are then determined from the similarity between the first target user accounts and the other user accounts to be detected. No manual intervention is needed, the accuracy of the determined money laundering group is improved, the false positive rate is reduced, and labor cost is saved.

Fig. 9 is a schematic flow chart of a second embodiment of a method for determining a risk user group according to an embodiment of the present application. As shown in fig. 9, based on any of the above embodiments, S72 may be implemented by:

S91, for any target connectivity graph, generating a weighted adjacency matrix according to a preset transaction amount and the transaction amount corresponding to each path in the target connectivity graph.

S92, setting to 0 the value of each weight element smaller than the preset weight value in each weighted adjacency matrix, and generating a target weighted adjacency matrix.

The preset weight value may be preset according to an empirical value, for example, 1, 0.9, 0.8, etc., which is not particularly limited in the embodiment of the present application.

Correspondingly, after the target weighted adjacency matrix is generated, the method for determining the risk user group further comprises the following steps:

A set of non-zero element vectors is generated from the at least one non-zero element in each target weighted adjacency matrix. The set comprises a first vector (c¹), a second vector (c²) and a third vector (c³): the first vector stores all non-zero elements of the target weighted adjacency matrix, the second vector stores the row positions of all non-zero elements, and the third vector stores the column positions of all non-zero elements.

Alternatively, the non-zero element vector set may be determined by a coordinate storage algorithm (coordinate storage, coo) in a sparse matrix algorithm.
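As an illustration of the three vectors, the following sketch converts a small dense matrix into coordinate form using only the standard library. The function name `to_coo` and the sample matrix are assumptions for the example; a production system would more likely rely on a sparse-matrix library.

```python
from typing import List, Tuple


def to_coo(matrix: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Return (values, row positions, column positions) of the non-zero elements."""
    values: List[float] = []
    rows: List[int] = []
    cols: List[int] = []
    for r, row in enumerate(matrix):
        for c, weight in enumerate(row):
            if weight != 0:
                values.append(weight)  # first vector: the non-zero elements
                rows.append(r)         # second vector: their row positions
                cols.append(c)         # third vector: their column positions
    return values, rows, cols


target = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.9, 0.0, 0.0],
]
values, rows, cols = to_coo(target)
print(values, rows, cols)
```

Only two of the nine entries are stored here, which is the memory saving the embodiment refers to.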

Correspondingly, inputting at least one target weighted adjacency matrix into the trained account risk model, comprising:

In this embodiment, the target weighted adjacency matrix is converted into at least one non-zero element vector set, which saves memory and speeds up computation.

Optionally, the trained account risk model in this embodiment may be a graph attention network (GAT) model. Such a model only needs to know the neighboring nodes of each node during calculation, so for a large-scale target connectivity graph the node-neighbor computations can be performed in parallel, which improves calculation efficiency.
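To make the neighbor-only property concrete, here is a minimal single-head graph attention layer in plain Python. It follows the standard GAT formulation (shared linear projection, LeakyReLU attention scores, softmax over each node's neighborhood) and is not taken from the patent. The parameters `W` and `a`, the self-loop, and the omission of an output activation are assumptions, and edge weights are used only to decide which nodes are neighbors.

```python
import math
from typing import List


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(
    features: List[List[float]],
    neighbors: List[List[int]],
    W: List[List[float]],
    a: List[float],
) -> List[List[float]]:
    """One attention head: each node aggregates only its own neighborhood."""
    # Shared linear projection z_i = W h_i.
    z = [[sum(w * x for w, x in zip(row, h)) for row in W] for h in features]
    out = []
    for i, z_i in enumerate(z):
        nbrs = neighbors[i] + [i]  # assumption: include a self-loop
        # Attention score e_ij = LeakyReLU(a . [z_i || z_j]).
        scores = [
            leaky_relu(sum(p * q for p, q in zip(a, z_i + z[j]))) for j in nbrs
        ]
        top = max(scores)  # subtract the maximum for a stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # New representation: attention-weighted sum of neighbor projections.
        out.append(
            [sum(al * z[j][d] for al, j in zip(alphas, nbrs)) for d in range(len(z_i))]
        )
    return out


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[1], [0, 2], [1]]       # adjacency taken from the non-zero weights
W = [[1.0, 0.0], [0.0, 1.0]]         # hypothetical learned projection
a = [0.5, -0.5, 0.25, 0.25]          # hypothetical learned attention vector
updated = gat_layer(features, neighbors, W, a)
print(updated)
```

Each iteration of the outer loop reads only `z` and the node's own neighbor list, so the per-node updates are independent and could be distributed across workers.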

Optionally, in some embodiments, after generating the target weighted adjacency matrix, the method for determining the risk user group may further include the steps of:

For any user account to be detected in any non-zero element vector set, acquiring a target risk vector of the user account to be detected according to the target weighted adjacency matrix and the transaction time between the user account to be detected and other user accounts to be detected.

The target risk vector represents an initial risk value of the user account to be detected. Determining the target risk vector initializes the risk value, which reduces the sensitivity of the trained account risk model to the initial values of its parameters.

The target risk vector may be determined from a first initial risk value, a second initial risk value, a third initial risk value and a fourth initial risk value, which are determined as follows:

1) For any user account to be detected in any non-zero element vector set, generating a first initial risk value according to the transaction frequency and the total transaction amount between the user account to be detected and other user accounts to be detected, and the number of accounts that transact with the user account to be detected.

2) Generating a second initial risk value according to the weight corresponding to the path connecting the user account to be detected.

3) Generating a third initial risk value according to the account information of the user account to be detected and the transaction time between the user account to be detected and other user accounts to be detected.

4) Generating a fourth initial risk value according to the transaction relationship and the transaction amount between the user account to be detected and other user accounts to be detected.

For the specific manner of determining the first, second, third and fourth initial risk values, refer to the descriptions of S31, S32, S33 and S34; the implementation principle and technical effect are similar and are not repeated here. Because the target risk vector is determined from the characteristics of suspicious transaction structures, the trained account risk model becomes more interpretable, and the accuracy of subsequent calculation is further improved.

Accordingly, in this embodiment, inputting at least one target weighted adjacency matrix into the trained account risk model may be accomplished by:

Inputting the at least one non-zero element vector set and the target risk vector of each user account to be detected into the trained account risk model.

Fig. 10 is a flowchart of a third embodiment of a method for determining a risk user group according to the embodiment of the present application. As shown in fig. 10, after obtaining the target risk value of each user account to be detected based on any of the above embodiments, the method for determining the risk user group further includes the following steps:

And acquiring an updated target risk vector.

Correspondingly, calculating the similarity of each first target user account and other user accounts to be detected can be achieved through the following steps:

S101, calculating a first similarity between each first target user account and other user accounts to be detected according to the updated target risk vector.

The first similarity can be obtained by the formula $\mathrm{sim}_1 = \mathrm{sim}(\mathrm{risk}_i, \mathrm{risk}_j)$, where $\mathrm{sim}_1$ is the first similarity, $\mathrm{risk}_i$ is the updated target risk vector of node i, and $\mathrm{risk}_j$ is the updated target risk vector of node j.
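The text does not say which function sim(·,·) denotes. As one possible choice, the sketch below uses cosine similarity; this choice, the function name and the sample risk vectors are assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical updated target risk vectors (four components each).
risk_i = [0.9, 0.8, 0.7, 0.6]
risk_j = [0.45, 0.4, 0.35, 0.3]
sim_1 = cosine_similarity(risk_i, risk_j)
print(round(sim_1, 6))
```

Cosine similarity compares the direction of the two risk vectors rather than their magnitude, so two accounts with proportional risk profiles score close to 1.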

S102, calculating a second similarity between each first target user account and other user accounts to be detected according to the transaction time between each first target user account and other user accounts to be detected.

In the application scenario of money laundering, transaction time is an important dimension, and different types of money laundering groups differ in their choice of transaction time. For example, one type of money laundering group typically trades in the middle of the night, while another type spreads its transactions relatively evenly over the day.

The second similarity $\mathrm{sim}_2$ can be obtained by a formula over two 24-dimensional vectors and a transaction time interval D. With one hour as a time period, the first vector maps the number of transactions between node i and other user accounts to be detected in each time period of a day; the second vector maps, in the same way, the number of transactions between node j and other user accounts to be detected. D is the transaction time interval: assuming that S71 extracts the transaction data of the plurality of user accounts to be detected from the data of 7 days, D is 7.

S103, calculating a third similarity between each first target user account and other user accounts to be detected according to the transaction amount between each first target user account and other user accounts to be detected.

The third similarity $\mathrm{sim}_3$ can be obtained by a formula over two 24-dimensional vectors. With one hour as a time period, the first vector maps the total transaction amount between node i and other user accounts to be detected in each time period of a day; the second vector maps, in the same way, the total transaction amount between node j and other user accounts to be detected.
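The 24-dimensional vectors used for the second and third similarities can be built as in the following sketch. It covers only the construction of the hourly vectors; how they are combined with the interval D into a similarity is not reproduced here. The function name and the sample transactions are assumptions.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def hourly_profiles(
    transactions: Iterable[Tuple[datetime, float]],
) -> Tuple[List[int], List[float]]:
    """Map one account's transactions to 24 hourly counts and 24 hourly totals."""
    counts = [0] * 24
    totals = [0.0] * 24
    for timestamp, amount in transactions:
        counts[timestamp.hour] += 1       # number of transactions in this hour
        totals[timestamp.hour] += amount  # total amount transacted in this hour
    return counts, totals


# Hypothetical transactions of one account over several days.
txs = [
    (datetime(2021, 12, 1, 2, 15), 500.0),
    (datetime(2021, 12, 2, 2, 40), 300.0),
    (datetime(2021, 12, 3, 14, 5), 120.0),
]
counts, totals = hourly_profiles(txs)
print(counts[2], totals[2], counts[14], totals[14])
```

Transactions from different days that fall in the same hour are accumulated into the same slot, so an account that trades mostly at night shows a peak in the early-morning positions of both vectors.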

S104, generating the similarity of each first target user account and other user accounts to be detected according to the first similarity, the second similarity and the third similarity.

The similarity between the first target user account and other user accounts to be detected can be obtained by a formula that combines the first similarity, the second similarity and the third similarity.

Further, a fourth similarity between each first target user account and other user accounts to be detected may also be calculated by the formula $\mathrm{sim}_4 = \mathrm{sim}(\mathrm{other}_i, \mathrm{other}_j)$, where $\mathrm{other}_i$ is a multidimensional vector mapped from the account information of node i and $\mathrm{other}_j$ is a multidimensional vector mapped from the account information of node j. The account information includes account age (accounts participating in money laundering tend to be newly opened and are closed after the money laundering activity), the difference in account balance within a week, the region where the account is located, purchase level, purchase frequency and the like.

Accordingly, after the fourth similarity is obtained, the similarity between the first target user account and other user accounts to be detected can be obtained by a formula that also takes the fourth similarity into account.

In this embodiment, the similarity between other user accounts to be detected and the core money laundering members is considered from multiple dimensions, such as risk similarity, transaction count similarity and transaction amount similarity, so that the group is found more accurately and the false positive rate is greatly reduced.

Fig. 11 is a flowchart of a fourth embodiment of a method for determining a risk user group according to an embodiment of the present application. As shown in fig. 11, the method for determining the risk user group includes:

S111, constructing a target knowledge graph according to transaction data of a plurality of user accounts to be detected.

S112, deleting isolated points in the target knowledge graph to obtain at least one target connectivity graph.

S113, for any target connectivity graph, generating a target weighted adjacency matrix according to a preset transaction amount and the transaction amount corresponding to each path in the target connectivity graph.

S114, inputting at least one target weighted adjacency matrix into the trained account risk model, and obtaining a target risk value of each user account to be detected.

S115, determining a risk user group from users corresponding to the user accounts to be detected according to the target risk value of each user account to be detected.

S116, determining the transaction data of the user account to be detected as the transaction data of the sample user account of the next period.
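Steps S111 and S112 can be sketched as follows: accounts become nodes, transactions become edges, accounts without any transaction partner are dropped, and the remaining nodes are split into connected components. Treating edges as undirected for this purpose, ignoring self-transfers, and the function and account names are assumptions of the example.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def connected_graphs(
    accounts: Iterable[str],
    transactions: Iterable[Tuple[str, str, float]],
) -> List[List[str]]:
    """Drop isolated accounts and return the remaining connected components."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for payer, payee, _amount in transactions:
        if payer == payee:
            continue  # assumption: self-transfers do not connect anything
        adjacency[payer].add(payee)
        adjacency[payee].add(payer)

    seen: Set[str] = set()
    components: List[List[str]] = []
    for start in accounts:
        if start not in adjacency or start in seen:
            continue  # isolated point, or already assigned to a component
        queue = deque([start])
        seen.add(start)
        component = []
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


accounts = ["A", "B", "C", "D", "E", "F"]
transactions = [("A", "B", 100.0), ("B", "C", 50.0), ("D", "E", 70.0)]
graphs = connected_graphs(accounts, transactions)
print(graphs)
```

Account F has no transactions and is removed as an isolated point; each remaining component would then be turned into its own weighted adjacency matrix in S113.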

In this embodiment, application and training are combined to drive iterative updating of the account risk model, which avoids overfitting caused by an overly large sample training matrix set and improves the accuracy and generalization of the account risk model.

For example, in the current month, the target risk value of each user account to be detected may be obtained using the account risk model trained in the previous month. In parallel, the transaction data of the user accounts to be detected in the current month serve as transaction data of sample user accounts for the next month.

Meanwhile, the transaction data of the plurality of sample user accounts may be extracted from the data of the last three months to train the account risk model.
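The rolling schedule in this example can be illustrated with a small helper that lists which months feed the training set for a given detection month. Interpreting "the last three months" as the three months before the detection month, the `YYYY-MM` key format and the function name are assumptions.

```python
from typing import List


def training_months(current: str, window: int = 3) -> List[str]:
    """Return the `window` months preceding `current`, oldest first ('YYYY-MM')."""
    year, month = map(int, current.split("-"))
    months = []
    for _ in range(window):
        month -= 1
        if month == 0:  # roll back over a year boundary
            year, month = year - 1, 12
        months.append(f"{year:04d}-{month:02d}")
    return list(reversed(months))


window = training_months("2022-02")
print(window)
```

Each month the window slides forward by one, so the oldest month is dropped and the most recently detected month is added, which keeps the sample training matrix set bounded in size.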

The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.

Fig. 12 is a schematic structural diagram of an account risk model training device according to an embodiment of the present application. As shown in fig. 12, the training apparatus for an account risk model includes:

The generating module 121 is configured to generate at least one sample connectivity graph according to transaction data of a plurality of sample user accounts, where nodes in the sample connectivity graph are account information of the sample user accounts, paths in the sample connectivity graph are transaction relationships and transaction amounts between sample user accounts corresponding to two nodes connected by a path, the account information includes sample risk values of the sample user accounts, and the sample risk values are likelihood that abnormal operations exist in the sample user accounts;

the generating module 121 is further configured to generate, for any sample connectivity graph, a sample weighted adjacency matrix according to a preset transaction amount and a transaction amount corresponding to each path in the sample connectivity graph;

the generating module 121 is further configured to generate a sample training matrix set according to at least one sample weighted adjacency matrix;

The training module 122 is configured to train the account risk model by using the sample training matrix set to obtain a trained account risk model, where the trained account risk model is configured to obtain a target risk value of each user account to be detected according to the transaction data and the preset transaction amounts of the plurality of user accounts to be detected.

In one possible design of the embodiment of the present application, the generating module 121 is specifically configured to:

Optionally, the generating module 121 is specifically configured to:

Generating a third sample risk value according to account information of the sample user account and transaction time between other sample user accounts;

the first sample risk value, the second sample risk value, the third sample risk value, and the fourth sample risk value are determined as sample risk vectors.

In another possible design of the embodiment of the present application, the generating module 121 is specifically configured to:

In yet another possible design of the embodiment of the present application, the training module 122 is specifically configured to:

And training the account risk model according to the sample training matrix set and the multi-head attention mechanism to obtain a trained account risk model.

The training device for the account risk model provided by the embodiment of the application can be used for executing the training method for the account risk model in any embodiment, and the implementation principle and the technical effect are similar and are not repeated here.

Fig. 13 is a schematic structural diagram of a determining device for a risk user group according to an embodiment of the present application. As shown in fig. 13, the determining device of the risk user group includes:

The generating module 131 is configured to generate at least one target connectivity graph according to transaction data of a plurality of user accounts to be detected, where nodes in the target connectivity graph are account information of the user accounts to be detected, and paths in the target connectivity graph are transaction relationships and transaction amounts between the user accounts to be detected corresponding to two nodes connected by a path;

The generating module 131 is further configured to generate, for any target connectivity graph, a target weighted adjacency matrix according to a preset transaction amount and a transaction amount corresponding to each path in the target connectivity graph;

The input module 132 is configured to input at least one target weighted adjacency matrix into a trained account risk model, obtain a target risk value of each user account to be detected, where the target risk value is a possibility that the user account to be detected has abnormal operation, and the trained account risk model is obtained by training the account risk model using transaction data of a plurality of sample user accounts and preset transaction amounts;

the determining module 133 is configured to determine at least one first target user account from the user accounts to be detected according to the target risk value of each user account to be detected;

The determining module 133 is further configured to calculate a similarity between each first target user account and other user accounts to be detected, and determine a risk user group according to the user corresponding to the user account to be detected.

In one possible design of the embodiment of the present application, the determining module 133 is specifically configured to:

and determining the user corresponding to the at least one first target user account and the user corresponding to the at least one second target user account as a risk user group.

Optionally, the generating module 131 is specifically configured to:

Aiming at any target communication graph, generating a weighted adjacency matrix according to a preset transaction amount and a transaction amount corresponding to each path in the target communication graph;

Accordingly, after generating the target weighted adjacency matrix, the generating module 131 is further configured to:

Generating a non-zero element vector set according to at least one non-zero element in each target weighted adjacency matrix, wherein the non-zero element vector set comprises a first vector, a second vector and a third vector, the first vector is used for storing all non-zero elements in the target weighted adjacency matrix, the second vector is used for storing row positions of all non-zero elements, and the third vector is used for storing column positions of all non-zero elements;

Accordingly, the input module 132 is specifically configured to:

Optionally, after generating the target weighted adjacency matrix, the generating module 131 is further configured to:

aiming at any user account to be detected in any non-zero element vector set, acquiring a target risk vector of the user account to be detected according to the transaction time between the user account to be detected and other user accounts to be detected and a target weighted adjacency matrix, wherein the target risk vector is used for representing an initial risk value of the user account to be detected;

Accordingly, the input module 132 is specifically configured to:

Optionally, the generating module 131 is specifically configured to:

aiming at any user account to be detected in any non-zero element vector set, generating a first initial risk value according to the transaction frequency between the user account to be detected and other user accounts to be detected, the transaction total and the number of the accounts transacted with the user accounts to be detected;

Generating a second initial risk value according to the weight corresponding to the path for connecting the user account to be detected;

generating a third initial risk value according to account information of the user account to be detected and transaction time between other user accounts to be detected;

and determining the first initial risk value, the second initial risk value, the third initial risk value and the fourth initial risk value as target risk vectors.

Optionally, after the target risk value of each user account to be detected is obtained, the acquiring module is further configured to perform:

acquiring an updated target risk vector;

accordingly, the determining module 133 is specifically configured to:

according to the updated target risk vector, calculating first similarity of each first target user account and other user accounts to be detected;

In another possible design of the embodiment of the present application, the generating module 131 is configured to:

The determining device for the risk user group provided by the embodiment of the application can be used for executing the determining method for the risk user group in any embodiment, and the implementation principle and the technical effect are similar and are not repeated here.

It should be noted that the division of the above apparatus into modules is merely a division of logical functions; in an actual implementation the modules may be fully or partially integrated into one physical entity, or may be physically separated. The modules may all be implemented in the form of software called by a processing element, or all in the form of hardware; alternatively, some modules may be implemented in the form of software called by a processing element and others in the form of hardware. In addition, all or part of the modules may be integrated together or may be implemented independently. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each of the above modules may be implemented by an integrated logic circuit of hardware in a processor element or by instructions in the form of software.

Fig. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 14, the electronic device may include a processor 141, a memory 142, and computer program instructions stored in the memory 142 and executable on the processor 141, where the processor 141 implements the training method of the account risk model or the determining method of the risk user group provided in any of the foregoing embodiments when executing the computer program instructions.

Alternatively, the above devices of the electronic apparatus may be connected by a system bus.

The memory 142 may be a separate memory unit or may be a memory unit integrated into the processor. The number of processors is one or more.

Optionally, the electronic device may also include interfaces to interact with other devices.

It is to be appreciated that the processor 141 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in a processor.

The system bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The system bus may be classified into an address bus, a data bus, a control bus, and the like. For ease of illustration, the bus is represented by only one bold line in the figure, but this does not mean that there is only one bus or only one type of bus. The memory may include random access memory (RAM) and may also include non-volatile memory (NVM), such as at least one disk memory.

All or part of the steps for implementing the method embodiments described above may be performed by hardware associated with program instructions. The foregoing program may be stored in a readable memory. When executed, the program performs the steps of the method embodiments described above. The aforementioned memory (storage medium) includes read-only memory (ROM), RAM, flash memory, hard disk, solid state disk, magnetic tape, floppy disk, optical disk, and any combination thereof.

The electronic device provided by the embodiment of the application can be used for executing the training method of the account risk model or the determining method of the risk user group provided by any of the method embodiments, and the implementation principle and the technical effect are similar, and are not repeated here.

The embodiment of the application provides a computer readable storage medium, wherein computer instructions are stored in the computer readable storage medium, and when the computer instructions run on a computer, the computer is caused to execute the training method of the account risk model or the determining method of the risk user group.

The computer readable storage medium described above may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as static random access memory, electrically erasable programmable read-only memory, magnetic memory, flash memory, magnetic disk or optical disk. A readable storage medium can be any available medium that can be accessed by a general purpose or special purpose computer.

Illustratively, a readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Alternatively, the readable storage medium may be integral to the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC). Alternatively, the processor and the readable storage medium may reside as discrete components in a device.

Embodiments of the present application also provide a computer program product, where the computer program product includes a computer program, where the computer program is stored in a computer readable storage medium, and where at least one processor may read the computer program from the computer readable storage medium, where the at least one processor may implement the method for training an account risk model or the method for determining a risk user group when executing the computer program.

It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for training an account risk model, comprising:

Generating at least one sample connectivity graph according to transaction data of a plurality of sample user accounts, wherein nodes in the sample connectivity graph are account information of the sample user accounts, and paths in the sample connectivity graph are transaction relations and transaction amounts between the sample user accounts corresponding to two nodes connected by the paths;

Generating a first sample risk value according to the transaction frequency between the sample user account and other sample user accounts, the total transaction amount and the number of accounts transacted with the sample user account, generating a second sample risk value according to the weight corresponding to the path connecting the sample user accounts, generating a third sample risk value according to the account information of the sample user account and the transaction time between the other sample user accounts, generating a fourth sample risk value according to the transaction relation and the transaction amount between the sample user account and the other sample user accounts, determining the first sample risk value, the second sample risk value, the third sample risk value and the fourth sample risk value as sample risk vectors, wherein the sample risk vectors are used for representing initial sample risk values of the sample user accounts;

Training an account risk model by using the sample training matrix set to obtain a trained account risk model, wherein the trained account risk model is used for obtaining a target risk value of each user account to be detected according to transaction data of a plurality of user accounts to be detected and preset transaction amounts, and the account risk model comprises a GAT model or a graph neural network model.

2. The method of claim 1, wherein generating at least one sample connectivity graph from transaction data for a plurality of sample user accounts comprises:

3. The method according to claim 1 or 2, wherein training the account risk model using the sample training matrix set to obtain a trained account risk model comprises:

4. A method for determining a population of risk users, comprising:

For any target connectivity graph, generating a weighted adjacency matrix according to a preset transaction amount and the transaction amount corresponding to each path in the target connectivity graph, and setting the value of each weight element smaller than a preset weight value in each weighted adjacency matrix to 0 to generate a target weighted adjacency matrix;

Inputting at least one target weighted adjacency matrix into a trained account risk model, and obtaining a target risk value of each user account to be detected, wherein the trained account risk model is obtained by training the account risk model by using transaction data of a plurality of sample user accounts and the preset transaction amount, and the account risk model comprises a GAT model or a graph neural network model;

5. The method of claim 4, wherein the calculating the similarity between each first target user account and other user accounts to be detected, and determining the risk user group according to the user corresponding to the user account to be detected, includes:

6. The method of claim 4, wherein:

7. The method of claim 6, wherein after the generating the target weighted adjacency matrix, the method further comprises:

Generating a first initial risk value according to the transaction frequency and the total transaction amount between the user account to be detected and other user accounts to be detected, and the number of accounts transacting with the user account to be detected; generating a second initial risk value according to the weight corresponding to the path connecting the user account to be detected; generating a third initial risk value according to the account information of the user account to be detected and the transaction time between the user account to be detected and other user accounts to be detected; generating a fourth initial risk value according to the transaction relation and the transaction amount between the user account to be detected and other user accounts to be detected; and determining the first initial risk value, the second initial risk value, the third initial risk value and the fourth initial risk value as a target risk vector, wherein the target risk vector is used for representing the initial risk value of the user account to be detected;

8. The method of claim 7, wherein after the obtaining the target risk value for each user account to be detected, the method further comprises:

acquiring an updated target risk vector;

9. The method according to any one of claims 4-8, wherein generating at least one target connectivity graph from transaction data of a plurality of user accounts to be detected comprises:

10. An account risk model training device, comprising:

The generation module is further used for generating a first sample risk value according to the transaction frequency and the total transaction amount between the sample user account and other sample user accounts, and the number of accounts transacting with the sample user account; generating a second sample risk value according to the weight corresponding to the path connecting the sample user account; generating a third sample risk value according to the account information of the sample user account and the transaction time between the sample user account and other sample user accounts; generating a fourth sample risk value according to the transaction relation and the transaction amount between the sample user account and other sample user accounts; and determining the first sample risk value, the second sample risk value, the third sample risk value and the fourth sample risk value as a sample risk vector, wherein the sample risk vector is used for representing the initial sample risk value of the sample user account;

The training module is used for training the account risk model by using the sample training matrix set to obtain a trained account risk model, wherein the trained account risk model is used for obtaining a target risk value of each user account to be detected according to transaction data of a plurality of user accounts to be detected and preset transaction amounts, and the account risk model comprises a GAT model or a graph neural network model.

11. A device for determining a population of risk users, comprising:

The generation module is further used for, for any target connectivity graph, generating a weighted adjacency matrix according to the preset transaction amount and the transaction amount corresponding to each path in the target connectivity graph, and setting the value of each weight element smaller than the preset weight value in each weighted adjacency matrix to 0 to generate a target weighted adjacency matrix;

The input module is used for inputting at least one target weighted adjacency matrix into a trained account risk model and obtaining a target risk value of each user account to be detected, wherein the target risk value is the possibility of abnormal operation of the user account to be detected, the trained account risk model is obtained by training the account risk model by using transaction data of a plurality of sample user accounts and the preset transaction amount, and the account risk model comprises a GAT model or a graph neural network model;

12. An electronic device comprising a processor, a memory and computer program instructions stored on the memory and executable on the processor, wherein the processor is configured to implement the method of any one of claims 1 to 9 when executing the computer program instructions.

13. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1 to 9.