Disclosure of Invention
Object of the invention: current financial risk control big data systems rely mainly on financial data for risk management, so their supervision of users with few financial activities is relatively weak and their coverage is incomplete. To solve this problem, the invention provides a financial risk control method based on big data.
To achieve this object, the invention adopts the following technical solution:
A financial risk control method based on big data comprises the following steps:
S1, collecting data, namely collecting user data from a social network platform;
S2, data preprocessing, namely cleaning the collected data, removing irrelevant information, and processing missing values and abnormal values. Then, mapping the user data into nodes and edges in the graph database, and ensuring that the data is converted into a format required by the graph database;
S3, establishing a graph database, namely selecting a graph database system and importing the preprocessed data into it, wherein each user is represented as a node in the graph database and the relationships between users are represented as edges between the nodes;
S4, risk analysis, namely analyzing a network structure in a graph database by using a complex network analysis technology, calculating a centrality index of the network, and identifying key nodes and potential community structures;
S5, establishing a risk prediction model, namely establishing a financial risk prediction model by using a multiple linear regression algorithm, based on the results of the complex network analysis combined with financial risk indexes (such as credit scores, transaction behaviors and historical default records), wherein the financial risk indexes are used as dependent variables and the key node and community structure indexes are used as independent variables;
S6, establishing an early warning mechanism, namely establishing the early warning mechanism according to the risk prediction model: when a change in the network structure or in a user behavior pattern is detected, the changed network structure or user behavior pattern is evaluated with the risk prediction model, which outputs the corresponding predicted financial risk index; when the predicted financial risk index falls within the risk range, the early warning system promptly issues a risk warning;
and S7, early warning implementation, namely integrating an early warning mechanism into a financial risk management platform, monitoring social network data and user behaviors in real time, and automatically notifying related personnel to take measures by the system once an early warning condition is triggered.
Further, the user data comprises user basic information and user behavior data; the user behavior data comprises friend relationships and interaction behaviors between users; and the interaction behaviors comprise commenting on, forwarding, liking, sharing and publishing content.
Further, the graph database system is ArangoDB, a multi-model database that can handle distributed graph data.
Further, the centrality indexes comprise degree centrality, closeness centrality and betweenness centrality. Calculating the centrality indexes of the network comprises: calculating degree centrality by counting the number of connections (i.e., the number of edges) of each node; calculating closeness centrality by measuring the average shortest path length from a node to all other nodes; and calculating betweenness centrality by measuring how often a node appears on the shortest paths between other nodes. The centrality indexes are calculated using the NetworkX graph analysis library, applied to the graph held in the graph database.
Further, the identifying key nodes and potential community structures includes the following steps:
A1, identifying key nodes according to the ranking of the centrality indexes (degree centrality, closeness centrality and betweenness centrality);
A2, community structure discovery, namely identifying the community structure in the network by using the Louvain community detection algorithm, and finding densely connected groups of nodes in the network.
Further, the establishing the financial risk prediction model by using the multiple linear regression algorithm comprises the following steps:
B1, collecting data, namely collecting user historical data including credit scores, transaction behaviors, historical violation records and key node and community structure data obtained through complex network analysis;
B2, data preprocessing: cleaning data, processing missing values and abnormal values, and carrying out data standardization or normalization to ensure data quality;
B3, feature selection: selecting one or more features related to financial risk from the collected data, including the key node indexes and community structure indexes from the network analysis;
B4, constructing a model, namely constructing the model by using a multiple linear regression algorithm, wherein credit scores, transaction behaviors and historical violation records are used as dependent variables, and the key node and community structure indexes are used as independent variables;
B5, training a model, namely training the model by using historical data, and estimating model parameters, namely regression coefficients;
B6, evaluating the model, namely evaluating the performance of the model through cross validation and the statistical indexes adjusted R-squared, mean squared error (MSE) and root mean squared error (RMSE);
B7, model optimization, namely adjusting model parameters according to the evaluation results, adding or deleting features, and preventing overfitting by using LASSO regularization;
and B8, deploying the model, namely deploying the optimized model into the financial risk control system.
Further, the LASSO function expression is as follows:
$$J(\theta)=\frac{1}{2n}(X\theta-Y)^{T}(X\theta-Y)+\alpha\lVert\theta\rVert_{1}$$
where n is the number of samples, α is a constant coefficient, and $\lVert\theta\rVert_{1}$ is the L1 norm of θ.
The beneficial effects of the invention are as follows:
By combining social network data with financial data, the invention makes risk identification more comprehensive, and by introducing complex network analysis it makes risk control more intelligent. It also enables real-time risk monitoring and early warning, reduces operating cost, and strengthens decision support. Overall, the invention provides financial institutions with a more efficient and accurate risk control solution.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Referring to Fig. 1, the invention provides a financial risk control method based on big data, which comprises the following steps:
S1, collecting data, namely collecting user data from a social network platform;
S2, data preprocessing, namely cleaning the collected data, removing irrelevant information, and processing missing values and abnormal values. Then, mapping the user data into nodes and edges in the graph database, and ensuring that the data is converted into a format required by the graph database;
S3, establishing a graph database, namely selecting a graph database system and importing the preprocessed data into it, wherein each user is represented as a node in the graph database and the relationships between users are represented as edges between the nodes;
S4, risk analysis, namely analyzing a network structure in a graph database by using a complex network analysis technology, calculating a centrality index of the network, and identifying key nodes and potential community structures;
S5, establishing a risk prediction model, namely acquiring data from multiple channels, such as internal systems of the financial institution, public data sets and third-party service providers, and establishing a financial risk prediction model by using a multiple linear regression algorithm, based on the results of the complex network analysis combined with financial risk indexes (such as credit scores, transaction behaviors and historical default records), wherein the financial risk indexes are used as dependent variables and the key node and community structure indexes are used as independent variables;
S6, establishing an early warning mechanism, namely establishing the early warning mechanism according to the risk prediction model: when a change in the network structure or in a user behavior pattern is detected, the changed network structure or user behavior pattern is evaluated with the risk prediction model, which outputs the corresponding predicted financial risk index; when the predicted financial risk index falls within the risk range, the early warning system promptly issues a risk warning; the risk range includes, for example, a credit rating dropping to a poor level and transaction violations;
and S7, early warning implementation, namely integrating an early warning mechanism into a financial risk management platform, monitoring social network data and user behaviors in real time, and automatically notifying related personnel to take measures by the system once an early warning condition is triggered.
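The check performed in S6 can be illustrated with a minimal sketch. It assumes a linear model whose coefficients have already been estimated; the function names `predict_risk_index` and `check_warning`, the coefficient values and the bounds of the risk range are hypothetical and are not specified by the invention.

```python
from typing import Sequence, Tuple


def predict_risk_index(intercept: float, coefs: Sequence[float],
                       features: Sequence[float]) -> float:
    """Linear prediction: intercept + sum(coef_i * feature_i)."""
    if len(coefs) != len(features):
        raise ValueError("coefs and features must have the same length")
    return intercept + sum(c * x for c, x in zip(coefs, features))


def check_warning(predicted: float, risk_range: Tuple[float, float]) -> bool:
    """Return True when the predicted index lies inside the risk range."""
    low, high = risk_range
    return low <= predicted <= high


# Hypothetical coefficients for (degree, closeness, betweenness) features.
INTERCEPT = 700.0
COEFS = [-50.0, -100.0, -200.0]
# Assumption: a predicted credit score of 600 or less counts as risky.
RISK_RANGE = (0.0, 600.0)

# Features of the same user before and after a detected network change.
before = predict_risk_index(INTERCEPT, COEFS, [0.2, 0.3, 0.1])
after = predict_risk_index(INTERCEPT, COEFS, [0.6, 0.5, 0.3])

if check_warning(after, RISK_RANGE):
    print("Risk warning: predicted index {:.1f}".format(after))
```

In a deployment along the lines of S7, the `print` call would presumably be replaced by whatever notification channel the risk management platform uses.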
In use, the invention provides the following advantages:
1. Improving the comprehensiveness of risk identification: by combining social network data with financial data, the invention can assess users with few financial activities more comprehensively, making up for the shortcomings of traditional financial risk control systems.
2. Improving the intelligence of risk control: based on complex network analysis, the method can identify key nodes and potential community structures, so that financial risk can be estimated more accurately and risk control becomes more intelligent.
3. Real-time risk monitoring and early warning: the invention integrates the early warning mechanism into the financial risk management platform, can monitor social network data and user behaviors in real time, discover and early warn potential risks in time, and effectively reduce the occurrence probability of financial risk events.
4. Reducing operating cost: through the automatic early warning mechanism, financial institutions can reduce the need for manual monitoring, lowering operating cost while improving the efficiency of risk control.
5. Enhancing decision support: the risk prediction model can provide more scientific and accurate risk assessment results for financial institutions, and help decision makers to formulate more reasonable credit policies and risk control strategies.
In summary, the financial risk control method based on big data provided by the invention not only makes risk identification more comprehensive and more intelligent, but also enables real-time risk monitoring and early warning, reduces operating cost and strengthens decision support, thereby providing financial institutions with a more efficient and accurate risk control solution.
In this embodiment, preferably, the user data includes user basic information and user behavior data, where the user behavior data includes friend relationships and interaction behaviors between users, and the interaction behaviors include commenting on, forwarding, liking, sharing and publishing content.
In this embodiment, preferably, the graph database system is ArangoDB, a multi-model database that can handle distributed graph data. ArangoDB allows a single database to be used for all operations and also allows ad hoc queries to run against data stored in different models; it can minimize network hops and run queries efficiently even on distributed graph data.
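As an illustration of how such user data might be cleaned and mapped into nodes and edges (steps S2 and S3), the following sketch uses plain Python dictionaries. The record layout, the field names (`id`, `src`, `dst`, `type`) and the cleaning rules are assumptions made for this example only; the invention does not prescribe them.

```python
from collections import defaultdict

# Hypothetical raw records as they might arrive from a social platform.
users = [
    {"id": "u1", "name": "Alice", "age": 31},
    {"id": "u2", "name": "Bob", "age": None},    # missing value
    {"id": "u3", "name": "Carol", "age": 27},
]
interactions = [
    {"src": "u1", "dst": "u2", "type": "friend"},
    {"src": "u1", "dst": "u2", "type": "comment"},
    {"src": "u2", "dst": "u3", "type": "like"},
    {"src": "u3", "dst": "u9", "type": "share"},  # u9 is not a known user
    {"src": "u1", "dst": "u1", "type": "like"},   # self-loop
]


def build_graph(users, interactions):
    """Return (nodes, edges): nodes keyed by user id, edges keyed by user pair."""
    # Node attributes: drop the id key and any missing (None) values.
    nodes = {u["id"]: {k: v for k, v in u.items() if k != "id" and v is not None}
             for u in users}
    edges = defaultdict(lambda: defaultdict(int))
    for rec in interactions:
        a, b = rec["src"], rec["dst"]
        if a == b or a not in nodes or b not in nodes:
            continue  # cleaning rule (assumed): skip self-loops and unknown users
        key = tuple(sorted((a, b)))   # treat the relationship as undirected
        edges[key][rec["type"]] += 1  # count interactions per type
    return nodes, {k: dict(v) for k, v in edges.items()}


nodes, edges = build_graph(users, interactions)
```

The resulting `nodes` and `edges` dictionaries correspond to the vertex and edge documents that would then be imported into the graph database; dropping missing attributes is only one possible way of handling missing values.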
In this embodiment, preferably, the centrality indexes include degree centrality, closeness centrality and betweenness centrality. Calculating the centrality indexes of the network includes: calculating degree centrality by counting the number of connections (i.e., the number of edges) of each node; calculating closeness centrality by measuring the average shortest path length from a node to all other nodes; and calculating betweenness centrality by measuring how often a node appears on the shortest paths between other nodes. The centrality indexes are calculated using the NetworkX graph analysis library, applied to the graph held in the graph database.
Referring to Fig. 2, in this embodiment, preferably, identifying key nodes and potential community structures includes the following steps:
A1, identifying key nodes according to the ranking of the centrality indexes (degree centrality, closeness centrality and betweenness centrality);
A2, community structure discovery, namely identifying the community structure in the network by using the Louvain community detection algorithm, and finding densely connected groups of nodes in the network.
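To make the three definitions concrete, the following dependency-free sketch computes them on a small hypothetical graph. It is an illustration only: the embodiment uses NetworkX, whose `degree_centrality`, `closeness_centrality` and `betweenness_centrality` functions apply normalization by default, so their values may differ from the raw counts below by a scale factor. The example graph and helper names are assumptions.

```python
from collections import deque

# Small undirected example graph as an adjacency list (hypothetical data).
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}


def bfs_distances(graph, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def degree_centrality(graph):
    """Number of edges attached to each node."""
    return {v: len(nbrs) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    """Inverse of the average shortest-path length to the reachable nodes."""
    result = {}
    for v in graph:
        dist = bfs_distances(graph, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm: shortest paths between other nodes through each node."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Undirected graph: every node pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
```

In this example node `c` bridges the triangle `a`, `b`, `c` and the chain `d`, `e`, so it scores highest on all three indexes.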
Python code example for identifying potential community structures:
```python
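import networkx as nx
import community as community_louvain  # provided by the python-louvain package

G = nx.karate_club_graph()  # placeholder graph; replace with the user graph
partition = community_louvain.best_partition(G)  # maps node -> community id
for node, community_id in partition.items():
    print("Node {} belongs to community {}".format(node, community_id))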
```
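Step A1 leaves open how the three rankings are combined. One possible approach, shown here as an assumption rather than as the method of the invention, is to rank the nodes under each index separately and select the nodes with the best mean rank. The function names and the centrality values are hypothetical.

```python
def rank_positions(scores):
    """Map each node to its rank (1 = highest score) under one index."""
    # Ties are broken by node id, which is an arbitrary choice.
    ordered = sorted(scores, key=lambda node: (-scores[node], node))
    return {node: pos for pos, node in enumerate(ordered, start=1)}


def key_nodes(indices, top_k):
    """Return the top_k nodes by mean rank across all centrality indexes."""
    ranks = [rank_positions(scores) for scores in indices.values()]
    nodes = list(ranks[0])
    mean_rank = {n: sum(r[n] for r in ranks) / len(ranks) for n in nodes}
    return sorted(nodes, key=lambda n: (mean_rank[n], n))[:top_k]


# Hypothetical centrality values for five users.
indices = {
    "degree": {"a": 2, "b": 2, "c": 4, "d": 3, "e": 1},
    "closeness": {"a": 0.57, "b": 0.57, "c": 0.80, "d": 0.67, "e": 0.44},
    "betweenness": {"a": 0.0, "b": 0.0, "c": 4.0, "d": 3.0, "e": 0.0},
}
top = key_nodes(indices, top_k=2)
print(top)
```

Averaging ranks rather than raw values avoids having to put the three indexes on a common scale; a weighted combination would be an equally plausible design.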
Referring to Fig. 3, in this embodiment, preferably, establishing the financial risk prediction model by using the multiple linear regression algorithm includes the following steps:
B1, collecting data, namely collecting user historical data including credit scores, transaction behaviors, historical violation records and key node and community structure data obtained through complex network analysis;
B2, data preprocessing: cleaning data, processing missing values and abnormal values, and carrying out data standardization or normalization to ensure data quality;
B3, feature selection: selecting one or more features related to financial risk from the collected data, including the key node indexes and community structure indexes from the network analysis;
B4, constructing a model, namely constructing the model by using a multiple linear regression algorithm, wherein credit scores, transaction behaviors and historical violation records are used as dependent variables, and the key node and community structure indexes are used as independent variables;
B5, training a model, namely training the model by using historical data, and estimating model parameters, namely regression coefficients;
B6, evaluating the model, namely evaluating the performance of the model through cross validation and the statistical indexes adjusted R-squared, mean squared error (MSE) and root mean squared error (RMSE);
B7, model optimization, namely adjusting model parameters according to the evaluation results, adding or deleting features, and preventing overfitting by using LASSO regularization;
and B8, deploying the model, namely deploying the optimized model into the financial risk control system.
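The statistical indexes named in B6 can be computed as in the following sketch. The function name `regression_metrics` and the sample values are hypothetical; the formulas are the standard definitions of MSE, RMSE, R-squared and adjusted R-squared, applied to the predictions of one validation fold.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Return MSE, RMSE, R^2 and adjusted R^2 for one set of predictions."""
    n = len(y_true)
    if n != len(y_pred) or n - n_features - 1 <= 0:
        raise ValueError("need matching lengths and n > n_features + 1")
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true is constant, so R^2 is undefined")
    mse = ss_res / n
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalizes the number of independent variables.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": r2, "adj_r2": adj_r2}


# Hypothetical credit scores: observed vs. predicted by a 2-feature model.
y_true = [600.0, 650.0, 700.0, 750.0, 800.0]
y_pred = [610.0, 640.0, 700.0, 760.0, 790.0]
metrics = regression_metrics(y_true, y_pred, n_features=2)
```

Under cross validation, these values would be computed on each held-out fold and then averaged.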
In this embodiment, preferably, the expression of the LASSO function is as follows:
$$J(\theta)=\frac{1}{2n}(X\theta-Y)^{T}(X\theta-Y)+\alpha\lVert\theta\rVert_{1}$$
where n is the number of samples, α is a constant coefficient, and $\lVert\theta\rVert_{1}$ is the L1 norm of θ.
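The following sketch shows one common way to minimize J(θ), cyclic coordinate descent with soft thresholding. The invention does not specify a solver, so the choice of algorithm, the function names, the omission of an intercept term and the sample data are all assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for the L1 term)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_objective(X, y, theta, alpha):
    """J(theta) = 1/(2n) * ||X theta - y||^2 + alpha * ||theta||_1."""
    n = len(y)
    residuals = [sum(x_j * t_j for x_j, t_j in zip(row, theta)) - y_i
                 for row, y_i in zip(X, y)]
    return sum(r * r for r in residuals) / (2 * n) + alpha * sum(abs(t) for t in theta)


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize J(theta) by cyclic coordinate descent (no intercept term)."""
    n, p = len(X), len(X[0])
    theta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # feature j dotted with the residual that excludes feature j
            z = 0.0    # squared norm of feature j
            for row, y_i in zip(X, y):
                pred_without_j = sum(row[k] * theta[k] for k in range(p) if k != j)
                rho += row[j] * (y_i - pred_without_j)
                z += row[j] ** 2
            theta[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return theta


# Hypothetical features: y depends only on column 0 (y = 2 * x0).
X = [[-1.0, 1.0], [-0.5, -1.0], [0.0, 0.0], [0.5, 1.0], [1.0, -1.0]]
y = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta_ols = lasso_coordinate_descent(X, y, alpha=0.0)    # plain least squares
theta_lasso = lasso_coordinate_descent(X, y, alpha=0.1)  # L1-regularized
```

With α = 0 the procedure reduces to ordinary least squares; with α > 0 the coefficient of the first feature is shrunk and that of the irrelevant second feature is set exactly to zero, which is how the L1 penalty discourages overfitting.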
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.