Disclosure of Invention
The invention aims to provide a network security management system based on big data, so as to solve the problems in the background technology.
1. Traditional account risk assessment relies on fixed threshold values and cannot adapt to differences in the fund circulation of different network crimes, causing missed judgments of telecommunication fraud and misjudgments of financial fraud. The present invention therefore builds, through a dynamic threshold setting unit, dynamic threshold calculation models for different types of network crimes, combining technologies such as the hidden Markov model, kernel density estimation and reinforcement learning, and adjusts the thresholds according to transaction frequency, case-related amount scale and transaction time span; a fuzzy comprehensive evaluation unit introduces a fuzzy mathematical algorithm to determine membership functions, distribute weights and calculate comprehensive account risk scores, so that account risks can be accurately evaluated, missed judgments and misjudgments are reduced, the effectiveness of network security management is improved, and the normal development of financial business is ensured.
2. Conventional methods struggle to process the multi-source heterogeneous data involved in network crime, which impairs the accuracy of risk assessment. The data acquisition and integration unit therefore applies DATAX multi-source heterogeneous data cleaning technology, comprising preliminary cleaning, outlier detection and data restoration modules; the acquisition process has intelligent perception and self-adaptive scheduling characteristics, using a deep neural network to analyze webpage structures, an ant colony algorithm to schedule tasks and blockchain technology to cache data, so that high-quality data can be obtained, a reliable basis is provided for subsequent risk assessment, and the reliability and stability of the network security management system are enhanced.
In order to achieve the above object, there is provided a big data based network security management system comprising the following elements:
the data acquisition and integration unit is used for acquiring network crime data from multiple data sources, and processing the network crime data by utilizing DATAX multi-source heterogeneous data cleaning technology to acquire the fund circulation characteristics of the network crime data;
The dynamic threshold setting unit builds a dynamic threshold calculation model according to the fund circulation characteristics, and dynamically adjusts the threshold according to the transaction frequency, the case-related amount and the transaction time of the network crime data;
the fuzzy comprehensive evaluation unit introduces a fuzzy mathematical algorithm to determine membership functions of the network crime data, respectively distributes weights to transaction frequency, amount concentration and account relevance, calculates comprehensive risk scores of the network crime data accounts through fuzzy transformation, compares the comprehensive risk scores with a set risk threshold, and determines risk accounts according to comparison results;
The machine learning optimization unit utilizes historical case data to construct a network crime training data set, selects a neural network algorithm to train on that data set, and builds a fund penetration threshold judgment model; new case data is used to update the training set at regular intervals, and finally the case-related accounts among the risk accounts are judged through the control unit.
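Taken together, the four elements form a pipeline from raw records to flagged risk accounts. The sketch below is a structural illustration only: every score, weight and threshold in it is an invented stand-in for the models the following sections elaborate.

```python
# Minimal sketch of the four-unit pipeline. All thresholds, weights and
# feature names here are illustrative assumptions, not the patented models.

def data_acquisition(raw_records):
    # Stand-in for the DATAX-style cleaning: drop records violating a
    # service rule (here, a negative transaction amount).
    return [r for r in raw_records if r["amount"] >= 0]

def dynamic_threshold(records, base=0.5):
    # Stand-in for the dynamic threshold model: nudge the threshold up
    # as transaction frequency (record count) grows.
    return base + 0.01 * min(len(records), 20)

def fuzzy_score(record):
    # Stand-in composite risk score from frequency, concentration, relevance.
    weights = (0.4, 0.3, 0.3)  # assumed weights
    feats = (record["freq"], record["concentration"], record["relevance"])
    return sum(w * f for w, f in zip(weights, feats))

def flag_risk_accounts(raw_records):
    records = data_acquisition(raw_records)
    thr = dynamic_threshold(records)
    return [r["account"] for r in records if fuzzy_score(r) > thr]
```

A record whose composite score exceeds the dynamically set threshold is reported as a risk account; the later sections replace each stand-in with the actual technique.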
As a further improvement of the technical scheme, when the data acquisition and integration unit processes network crime data by utilizing DATAX multi-source heterogeneous data cleaning technology:
Screening out the problem data by using a preliminary cleaning module based on pattern matching and rule constraint, then starting an outlier detection module based on an improved isolated forest algorithm to conduct random subspace division on the feature dimensions of the residual data and constructing an isolated tree, and identifying suspected outlier data by taking the average path length of the data points in the isolated tree as an outlier score;
and finally, repairing the abnormal data by using a data repairing module based on the generation countermeasure network aiming at the abnormal data.
As a further improvement of the technical scheme, the acquisition flow of the data acquisition integration unit has intelligent perception and self-adaptive scheduling characteristics:
in the acquisition source detection stage, a webpage structure analyzer based on a deep neural network is utilized to learn and identify data storage structures and update frequency modes of different types of websites, and an acquisition path planning template is constructed;
When distributed data sources are collected, a task scheduling mechanism based on an ant colony algorithm is adopted that models each collection task as an ant foraging path, and the priorities of the collection tasks are dynamically assigned;
And a trusted data caching mechanism based on a blockchain is built in and is used for coping with network fluctuation and data source limitation in the acquisition process, and acquired data fragments are stored in a blockchain block mode.
As a further improvement of the technical scheme, when the dynamic threshold setting unit constructs a dynamic threshold calculation model:
A hidden Markov model is introduced to mine the hidden states of fund circulation in the network crime data, a hidden state model of fund circulation is constructed, and the parameters of the hidden state model are then trained through an iterative expectation-maximization algorithm;
Calculating the probability density distribution of fund transaction frequency, case-related amount and transaction time in different hidden states from network crime historical transaction data by combining a non-parameter statistical method based on kernel density estimation, and setting an initial threshold range of a dynamic threshold calculation model based on the probability density distribution;
And meanwhile, the threshold adjustment strategy of the dynamic threshold calculation model is optimized by utilizing the Q-learning algorithm from reinforcement learning, with the accurate early-warning rate serving as the reward signal for dynamically adjusting the threshold.
As a further improvement of the technical scheme, the dynamic threshold setting unit dynamically adjusts the threshold according to the real-time data:
The method comprises the steps of performing multi-scale decomposition on funds transaction flow data of network crimes by adopting a time-frequency analysis technology based on wavelet transformation, obtaining transaction frequency and case-related amount, decomposing the transaction frequency and the case-related amount into different frequency sub-bands, capturing short-term mutation and long-term trend in the transaction data, and designing a differential threshold adjustment rule aiming at the different frequency sub-bands.
As a further improvement of the technical scheme, when the fuzzy comprehensive evaluation unit determines the membership function:
the membership function is determined by the transaction frequency, the amount concentration and the account relevance of the network crime data;
Initializing, on the transaction frequency, amount concentration and account relevance, 50 particles by using a fuzzy C-means clustering method driven by a particle swarm optimization algorithm and constructing a fitness function for each particle, wherein the initialized particles represent different membership function parameter combinations, and then iteratively finding out the optimal membership function for identifying risk accounts;
Meanwhile, an uncertainty correction mechanism of the D-S evidence theory is introduced to fuse multiple groups of evidence information about the same risk account acquired from different data sources, and trust weights are dynamically distributed according to the evidence conflict degree and used for evaluating the risk account result.
As a further improvement of the technical scheme, in the weight assignment link of the fuzzy comprehensive evaluation unit:
Firstly, a judgment matrix is constructed by using an interval-number-based analytic hierarchy process: pairwise comparison judgments are made in interval-number form for different network crime scenes, and subjective weights are determined through consistency testing and interval-number operations; finally, the subjective weights are fused with objective weights through a combination weighting model based on the least squares method, solving for the optimal combination coefficient with the sum of squared deviations from the subjective and objective weights as the objective, for use in judging comprehensive risk scores.
As a further improvement of the technical scheme, when the fuzzy comprehensive evaluation unit executes fuzzy transformation to calculate the comprehensive risk score of the account:
and constructing a rule base by using a fuzzy model based on intuitionistic fuzzy reasoning, and obtaining an intuitionistic fuzzy evaluation result of the account risk through fuzzy implication operations and synthetic reasoning according to the intuitionistic fuzzy membership degrees of the transaction frequency, amount concentration and account relevance.
Compared with the prior art, the invention has the beneficial effects that:
In the big-data-based network security management system, the dynamic threshold setting unit builds a dynamic threshold calculation model and combines multiple algorithms to adjust thresholds according to the fund circulation characteristics of network crime, accurately adapting to the differences between crime types; the fuzzy comprehensive evaluation unit uses a fuzzy mathematical algorithm to accurately determine membership functions and weights and calculates comprehensive account risk scores; and the machine learning optimization unit trains a model on historical data and updates and optimizes it regularly. Together these improve the accuracy with which case-related accounts are judged and the risk of accounts involved in network crime is assessed.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a network security management system based on big data, please refer to FIG. 1, which comprises the following units:
The data acquisition and integration unit 1 is used for acquiring network crime data from multiple data sources, and processing the network crime data by utilizing DATAX multi-source heterogeneous data cleaning technology to acquire the fund circulation characteristics of the network crime data;
when the data acquisition and integration unit 1 processes network crime data by utilizing DATAX multi-source heterogeneous data cleaning technology:
the method comprises screening out, from the multi-source heterogeneous data, a large amount of data with wrong formats, unmatched data types and data obviously violating service logic. A preliminary cleaning module based on pattern matching and rule constraints screens out the problem data: a pre-defined data format template library containing more than 50 common data formats is established and 20 key service rules are formulated, such as that a transaction amount cannot be a negative number and that a user's age should lie in a reasonable range; the acquired multi-source heterogeneous data is scanned progressively, each record is matched against the data format templates and service rules, and data that does not meet the requirements is screened out, reducing the burden on the subsequent outlier detection and data restoration.
An outlier detection module based on an improved isolation forest algorithm is then started: random subspace division is performed on the feature dimensions of the remaining data, and 100 isolation trees with a maximum depth of 15 are constructed, each split randomly selecting a feature and a split point between that feature's minimum and maximum values. The average path length of a data point across the isolation trees is taken as its outlier score; points isolated after only a few splits receive high scores, and data whose score exceeds a set threshold is identified as suspected outlier data.
Finally, a data restoration module based on a generative adversarial network repairs the abnormal data. A generative adversarial network is constructed in which the generator adopts a 5-layer convolutional neural network architecture: transaction data over a period of time is arranged into an image-like matrix, with the different attributes of a transaction as the different channels or dimensions of the matrix and the transaction time sequence as its rows or columns, so that the convolutional neural network can perform feature extraction and processing on the abnormal data as it would on image data, fully exploiting its advantages in processing multidimensional data; the generator takes random noise as input and outputs repair candidate data similar to the original data. The discriminator adopts a 4-layer fully connected network, takes as input the original data and the repair candidate data generated by the generator, and judges their authenticity. The discriminator and generator are trained adversarially: the aim of the generator is to produce repair candidate data capable of deceiving the discriminator, while the aim of the discriminator is to accurately distinguish the original data from the generated repair candidates. After training is completed, the generator is used to repair the suspected abnormal data; the accuracy of the repaired data is greatly improved, providing a reliable foundation for subsequent analysis and data mining.
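The isolation-tree scoring described above can be sketched as follows. This is a minimal pure-Python illustration: the tree count (100) and depth limit (15) come from the text, while the data and everything else are assumptions, not the patented implementation.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c(n):
    # Average path length of an unsuccessful BST search over n points,
    # the standard isolation-forest normalization constant.
    if n <= 1:
        return 0.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))      # random feature (subspace)
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)              # random split point
    return {
        "dim": dim, "split": split,
        "left": build_tree([p for p in points if p[dim] < split],
                           depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[dim] >= split],
                            depth + 1, max_depth, rng),
    }

def path_length(tree, p, depth=0):
    if "size" in tree:
        return depth + c(tree["size"])
    branch = tree["left"] if p[tree["dim"]] < tree["split"] else tree["right"]
    return path_length(branch, p, depth + 1)

def anomaly_scores(points, n_trees=100, max_depth=15, seed=0):
    rng = random.Random(seed)
    trees = [build_tree(points, 0, max_depth, rng) for _ in range(n_trees)]
    norm = c(len(points))
    # Score s = 2 ** (-E(h)/c(n)); short average paths give scores near 1.
    return [2.0 ** (-(sum(path_length(t, p) for t in trees) / n_trees) / norm)
            for p in points]
```

A point far from the cluster is isolated after very few random splits, so its average path length is short and its score is high.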
The acquisition flow of the data acquisition integration unit 1 has intelligent perception and self-adaptive scheduling characteristics:
In the acquisition source detection stage, a webpage structure analyzer based on a deep neural network learns and identifies the data storage structures and update frequency modes of different types of websites, and acquisition path planning templates are constructed. A large number of webpage samples are collected from 5 general website types, namely social platforms, e-commerce, official websites of financial institutions, news media and government websites, and the webpages are annotated; the annotated content includes information such as data storage locations and data update times. A deep neural network is constructed comprising an input layer, several hidden layers and an output layer: the input layer receives the HTML code or DOM tree information of a webpage; the hidden layers combine convolutional layers, which extract local features of the webpage, with recurrent layers, which process its sequence information; and the output layer outputs a prediction of the website's data storage structure and update frequency mode. The prepared annotated data is used to train the deep neural network, which through repeated iterative training accurately learns the data storage structures and update frequency modes of the different types of websites. According to the trained model's prediction results, an acquisition path planning template is then constructed for each website type, specifying the acquisition start positions and acquisition rules, so that acquisition sources can be detected and acquisition paths planned rapidly;
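As a toy illustration of the analyzer's input stage, a webpage can be reduced to a tag-frequency feature vector before being fed to a model. The tag vocabulary below is an arbitrary assumption; this sketch stands in for the HTML/DOM encoding described above, not for the convolutional and recurrent layers themselves.

```python
from collections import Counter
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Count opening tags in an HTML document."""
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1

def page_features(html, vocab=("table", "div", "span", "a", "script", "time")):
    # Relative frequency of each vocabulary tag; a crude fixed-length
    # feature vector a downstream network could consume.
    parser = TagCounter()
    parser.feed(html)
    total = sum(parser.counts.values()) or 1
    return [parser.counts[t] / total for t in vocab]
```

For example, `page_features("<div><table></table><a href='x'>y</a></div>")` yields a 6-element vector in which `table`, `div` and `a` each account for one third of the tags.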
When the distributed data sources are collected, a task scheduling mechanism based on an ant colony algorithm is adopted that models each collection task as an ant foraging path and dynamically assigns collection task priorities. Parameters such as the number of ants, the pheromone volatilization coefficient and heuristic factors are defined, and 8 key indexes, such as the data update activity of each data source, network bandwidth availability and server response delay, are comprehensively considered to calculate a composite score for each data source. Each ant selects a data source as its next collection target according to the current pheromone concentration and the source's composite score, and after a collection task is completed, pheromone is released on the path; pheromone accumulates on paths that successfully collected valuable data, while on paths that did not yield valuable data it volatilizes and decreases. As the pheromone concentrations and the composite scores of the data sources are continuously updated, the collection task priorities are dynamically adjusted, reducing waiting time and data redundancy in the collection process;
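The pheromone bookkeeping described above can be sketched as follows. The composite scores stand in for the 8 key indexes, and every parameter value (ant count, evaporation rate, exponents) is an illustrative assumption.

```python
import random

def ant_colony_priorities(scores, n_ants=30, n_rounds=20,
                          rho=0.1, alpha=1.0, beta=2.0, seed=1):
    """Rank data sources by pheromone. `scores` are composite scores
    built from indexes such as update activity, bandwidth and latency."""
    rng = random.Random(seed)
    tau = [1.0] * len(scores)                 # pheromone per data source
    for _ in range(n_rounds):
        deposits = [0.0] * len(scores)
        for _ in range(n_ants):
            # Choose a source with probability ∝ tau^alpha * score^beta.
            weights = [(tau[i] ** alpha) * (scores[i] ** beta)
                       for i in range(len(scores))]
            r = rng.uniform(0.0, sum(weights))
            chosen, acc = len(scores) - 1, 0.0
            for i, w in enumerate(weights):
                acc += w
                if r <= acc:
                    chosen = i
                    break
            # Deposit pheromone proportional to the value collected.
            deposits[chosen] += scores[chosen]
        # Evaporation plus the new deposits.
        tau = [(1.0 - rho) * tau[i] + deposits[i] for i in range(len(scores))]
    # Higher pheromone means higher collection priority.
    return sorted(range(len(scores)), key=lambda i: -tau[i])
```

High-scoring sources attract more ants, accumulate more pheromone, and so rise to the top of the collection priority list.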
A trusted data caching mechanism based on a blockchain is built in to cope with network fluctuation and data source limitation during acquisition: acquired high-value data fragments are stored in the form of blockchain blocks, each block containing information such as the data content, a timestamp and a hash pointer, the hash pointer linking to the previous block to form the chain structure of the blockchain and ensuring that the data cannot be tampered with and can be traced. A cache area is set up for each data source; newly acquired data is added to the corresponding cache area, which is cleaned periodically to delete expired or useless data. When acquisition is interrupted by network fluctuation or data source limitation, the stored data is read from the cache area and acquisition or analysis continues, ensuring the continuity of the data.
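A minimal hash-chained cache illustrating the tamper-evidence property. In this sketch only the fragment content and the previous hash are covered by the digest; a real deployment would also cover the timestamp and rely on a consensus layer.

```python
import hashlib
import json
import time

class ChainCache:
    """Toy blockchain-style cache for acquired data fragments."""

    def __init__(self):
        self.blocks = []

    @staticmethod
    def _digest(data, prev):
        payload = json.dumps({"data": data, "prev": prev}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def add(self, fragment):
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        self.blocks.append({
            "data": fragment,
            "ts": time.time(),   # timestamp, used for periodic cache cleanup
            "prev": prev,        # hash pointer to the previous block
            "hash": self._digest(fragment, prev),
        })

    def verify(self):
        # Walk the chain and recompute every hash pointer.
        prev = "0" * 64
        for block in self.blocks:
            if (block["prev"] != prev
                    or block["hash"] != self._digest(block["data"], block["prev"])):
                return False
            prev = block["hash"]
        return True
```

Any edit to a cached fragment breaks its recomputed digest and every hash pointer after it, so `verify()` exposes the tampering.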
The dynamic threshold setting unit 2 constructs a dynamic threshold calculation model according to the fund circulation characteristics, and dynamically adjusts the threshold according to the transaction frequency, the case-related amount and the transaction time of the network crime data;
when the dynamic threshold setting unit 2 constructs the dynamic threshold calculation model:
The hidden Markov model is introduced to mine the hidden states of fund circulation in the network crime data, and a hidden state model of fund circulation is constructed. For each network crime type a hidden Markov model containing 5-8 hidden states is built, each model described by an initial state probability distribution π, a state transition matrix A and an observation probability matrix B. A large amount of fund circulation data related to each network crime type is collected, information such as the frequency, amount and time of fund transactions is taken as the observation sequence, and the hidden Markov model is trained with an iterative expectation-maximization algorithm, whose specific steps are as follows:
the model parameters π, A and B are initialized; the posterior probability of each hidden state sequence under the observation sequence is calculated according to the current model parameters; and π, A and B are updated according to the obtained posterior probabilities, providing a more accurate basis for the subsequent threshold setting;
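The expectation step of the iterative training above rests on the forward algorithm, which computes the likelihood of an observation sequence given π, A and B. A minimal sketch (the state and symbol counts are illustrative):

```python
def forward_likelihood(pi, A, B, obs):
    """P(obs | model) via the forward algorithm.
    pi[i]  : initial probability of hidden state i
    A[i][j]: transition probability from state i to state j
    B[i][k]: probability that state i emits observation symbol k
    obs    : sequence of observation symbol indices"""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[j] * A[j][i] for j in range(n)) * B[i][o]
                 for i in range(n)]
    return sum(alpha)
```

Summed over every possible observation sequence of a fixed length, these likelihoods total 1, which is a convenient sanity check on the parameter matrices.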
The method comprises the steps of combining a non-parametric statistical method based on kernel density estimation to estimate, from historical transaction data, the probability density distributions of fund transaction frequency, case-related amount scale and transaction time span under the different hidden states, and setting the initial threshold range based on these distributions. Data on the fund transaction frequency, case-related amount scale and transaction time span under different network crime types is extracted from massive historical transaction data, and a Gaussian kernel function and a bandwidth parameter are selected to perform kernel density estimation on the data of each feature (transaction frequency, case-related amount scale and transaction time span). The kernel density estimate is f(x) = (1/(n·h)) · Σ_{i=1..n} K((x − x_i)/h), where f(x) is the estimated probability density, n is the number of data points, h is the bandwidth parameter, K is the kernel function, and x_i is the i-th data point. The initial threshold range of each feature is then determined from the probability density distribution obtained by kernel density estimation, for example by selecting a quantile of the distribution (such as the 90% or 95% quantile) as the threshold; because no specific parametric distribution is assumed, complex real data can be better accommodated, improving the rationality and accuracy of the initial threshold setting;
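The kernel density estimate and the quantile-based initial threshold can be sketched directly. The bandwidth and quantile values below are the text's examples; the data is made up.

```python
import math

def kde(x, data, h):
    # Gaussian-kernel density estimate:
    # f(x) = (1/(n*h)) * sum_i K((x - x_i)/h), K the standard normal density.
    n = len(data)
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return sum(k((x - xi) / h) for xi in data) / (n * h)

def quantile_threshold(data, q=0.95):
    # Initial threshold: the q-quantile of the observed feature values.
    s = sorted(data)
    return s[min(len(s) - 1, int(q * len(s)))]
```

The density is high near where the historical feature values cluster and low far from them, and the 95% quantile gives an initial threshold that only the most extreme observations exceed.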
Meanwhile, the Q-learning algorithm from reinforcement learning is utilized to optimize the threshold adjustment strategy. The current fund transaction data characteristics (transaction frequency, case-related amount scale and transaction time span) together with the model's prediction result (suspicious transaction or not) are taken as the state; a series of threshold adjustment actions is defined, such as increasing the threshold, reducing the threshold and keeping the threshold unchanged; and the accurate early-warning rate is taken as the reward signal, with a reward of +10 given for each successful identification of potential network criminal behaviour and a penalty of −5 given for each misjudgment. A Q table is created to store the Q value of each state-action pair, with all Q values set to 0 at the beginning. At each time step, an action a is selected in the current state s using an ε-greedy strategy: with probability ε an action is selected at random, otherwise the action with the largest Q value is selected. The action a is executed, the next state s' and the reward r are obtained, and the Q value in the Q table is updated. These steps are repeated until the Q table converges, so that the threshold is dynamically adjusted according to the real-time network crime situation, improving the adaptability and accuracy of the system and reducing the misjudgment rate.
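The tabular Q-learning loop can be sketched with a toy reward model. The +10/−5 values come from the text; the two-state environment and all other parameters are assumptions.

```python
import random

ACTIONS = ("raise", "lower", "keep")
STATES = ("normal", "suspicious")

def choose_action(Q, s, eps, rng):
    # ε-greedy: explore with probability eps, otherwise exploit the best Q.
    if rng.random() < eps:
        return rng.choice(ACTIONS)
    return max(Q[s], key=Q[s].get)

def train(steps=3000, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in ACTIONS} for s in STATES}
    s = "normal"
    for _ in range(steps):
        a = choose_action(Q, s, eps, rng)
        # Toy reward standing in for the accurate early-warning rate:
        # +10 for the correct threshold move, -5 for a misjudgment.
        correct = "raise" if s == "suspicious" else "keep"
        r = 10.0 if a == correct else -5.0
        s_next = rng.choice(STATES)
        # Standard Q-learning update: Q(s,a) += α(r + γ·max_a' Q(s',a') − Q(s,a))
        Q[s][a] += alpha * (r + gamma * max(Q[s_next].values()) - Q[s][a])
        s = s_next
    return Q
```

After training, the greedy policy raises the threshold in the suspicious state and leaves it unchanged otherwise, which is the intended adjustment behaviour.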
The dynamic threshold setting unit 2 adjusts the threshold according to the real-time data:
Performing multi-scale decomposition on the fund transaction stream data by adopting a time-frequency analysis technique based on wavelet transformation, decomposing the transaction frequency and case-related amount into sub-bands of different frequencies so as to capture short-term mutations and long-term trends in the transaction data, and designing differentiated threshold adjustment rules for the different frequency components. The fund transaction stream data acquired in real time is preprocessed, and a wavelet basis function is selected according to the characteristics of the fund transaction data; different wavelet basis functions have different characteristics, and the Daubechies wavelet basis, with its compact support and orthogonality, is suited to analysing signals with abrupt changes. The preprocessed transaction frequency and case-related amount data is decomposed at multiple scales with the selected wavelet basis into the approximation and detail components of different frequency sub-bands; for example, a 3-layer decomposition yields a low-frequency approximation component a3 and high-frequency detail components d1, d2 and d3, where the approximation component reflects the long-term trend of the data and the detail components reflect its short-term changes. Through the multi-scale decomposition of the wavelet transform, abnormal change points in the data can be identified more accurately, providing a more precise basis for subsequent threshold adjustment.
The high-frequency detail components obtained by the analysis are examined, and short-term mutations are detected by setting a threshold: for example, the standard deviation of each detail component is calculated, and when the detail component value at some moment exceeds 3 times that standard deviation, a short-term mutation point is judged. Trend analysis is performed on the low-frequency approximation component, fitting its variation with methods such as linear regression to obtain the long-term direction and rate of change of the data, so that the threshold adjustment better fits the actual transaction conditions and the early-warning accuracy of the system is improved.
For short-term mutations detected in the high-frequency detail components, if a normal transaction peak is judged (such as a transaction increase caused by sales promotions or holidays), a gradual-coefficient strategy is adopted, for example raising the threshold by only 5%-10% within 1 hour, to avoid misjudgments caused by normal short-term fluctuation; if abnormal fluctuation is judged (such as abnormally large transactions or frequent small transactions), the threshold is instantaneously increased by 50%-80% to enhance the system's sensitivity to abnormal behaviour. For the long-term trend reflected by the low-frequency approximation component, the threshold adjustment step is dynamically set according to the trend slope: when the slope is greater than 0.3, the transaction data shows a clearly rising trend and the threshold is increased by 15%-25% every 6 hours; when the slope is less than −0.3, the data shows a clearly falling trend and the threshold is reduced by 10%-20% every 6 hours; and when the slope lies between −0.3 and 0.3, the transaction data is considered relatively stable and the threshold is kept unchanged. Through these differentiated threshold adjustment rules the system can better cope with changes in different types of transaction data, effectively reducing the misjudgment rate.
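A minimal sketch of the decomposition and the mutation test, using the Haar wavelet in place of the Daubechies basis for brevity. The 3-level depth and the 3-standard-deviation rule are the text's figures; the signal is made up.

```python
import math
import statistics

def haar_step(x):
    # One level of the Haar wavelet transform (x must have even length):
    # pairwise averages give the approximation, pairwise differences the detail.
    approx = [(x[2 * i] + x[2 * i + 1]) / math.sqrt(2.0) for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / math.sqrt(2.0) for i in range(len(x) // 2)]
    return approx, detail

def decompose(x, levels=3):
    details = []
    for _ in range(levels):
        x, d = haar_step(x)
        details.append(d)   # details[0] is the highest-frequency band d1
    return x, details       # x is the low-frequency approximation a3

def short_term_bursts(detail):
    # Flag detail coefficients whose magnitude exceeds 3 standard deviations.
    sd = statistics.pstdev(detail)
    return [i for i, v in enumerate(detail) if abs(v) > 3.0 * sd]
```

A single abrupt spike in an otherwise flat transaction stream appears as one large coefficient in the highest-frequency detail band and is flagged as a short-term mutation point.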
The fuzzy comprehensive evaluation unit 3 introduces a fuzzy mathematical algorithm to determine membership functions of the network crime data, respectively distributes weights to transaction frequency, amount concentration and account relevance, calculates comprehensive risk scores of the network crime data accounts through fuzzy transformation, compares the comprehensive risk scores with a set risk threshold, and determines risk accounts according to comparison results;
When the fuzzy comprehensive evaluation unit 3 determines the membership function:
Initializing 50 particles representing different membership function parameter combinations according to transaction frequency, amount concentration and account relevance by using a fuzzy C-means clustering method driven by a particle swarm optimization algorithm, constructing an adaptability function of each particle, and then carrying out iteration to find out an optimal membership function for identifying account risks;
collecting data such as transaction frequency, amount concentration and account relevance as input data, and initializing 50 particles, each representing a different membership function parameter combination; these parameters describe the cluster centres and membership matrix in fuzzy C-means clustering. A fitness function is constructed for each particle with intra-class compactness and inter-class separation as indexes, intra-class compactness measuring the tightness between data points of the same class and inter-class separation measuring the distance between data points of different classes; the fitness function may be expressed as F = ω·C − (1 − ω)·S, where ω is a weight coefficient balancing the importance of intra-class compactness and inter-class separation, C is the intra-class compactness, S is the inter-class separation, and F is the fitness. The particle swarm algorithm then starts iterating, each particle searching the solution space according to its own velocity and position information and updating both in every iteration; after 200 iterations the algorithm terminates, and the membership function parameter combination corresponding to the global optimal position obtained at that point is the optimal membership function, so that potential risk accounts can be identified more accurately;
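One way to realize the fitness function sketched above. The form F = ω·C − (1 − ω)·S and both index definitions are assumptions consistent with the description (the original formula was lost in translation); lower F is better.

```python
import math

def fitness(data, centers, omega=0.5):
    """F = omega*C - (1-omega)*S  (lower is better).
    C: mean distance of each point to its nearest centre (intra-class compactness).
    S: minimum distance between distinct centres (inter-class separation)."""
    C = sum(min(math.dist(p, c) for c in centers) for p in data) / len(data)
    S = min(math.dist(a, b)
            for i, a in enumerate(centers) for b in centers[i + 1:])
    return omega * C - (1.0 - omega) * S
```

Particle positions encoding well-placed, well-separated centres score lower than positions whose centres sit between the clusters, which is what drives the swarm toward useful membership functions.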
Meanwhile, an uncertainty correction mechanism of a D-S evidence theory is introduced to fuse multiple groups of evidence information about the same account acquired from different data sources, and trust weights are dynamically distributed according to the evidence conflict degree and used for evaluating account risk results;
acquiring multiple groups of evidence information about the same account from different data sources (such as transaction records, account behaviour logs and social network information), performing a basic probability assignment for each group of evidence, and determining each piece of evidence's degree of support for the different risk levels (such as low risk, medium risk and high risk). The degree of conflict between different pieces of evidence is calculated as K = Σ_{A_i ∩ B_j = ∅} m1(A_i)·m2(B_j), where K is the conflict degree, m1 and m2 are the basic probability assignment functions of two different pieces of evidence, and A_i and B_j are different propositions. Trust weights are dynamically assigned according to the evidence conflict degree: when the conflict degree is high, the trust weights of the conflicting evidence are lowered, and when it is low, the trust weights of the evidence are raised. The multiple groups of evidence are then fused using the synthesis rule of D-S evidence theory to obtain the final account risk assessment result, effectively handling the uncertainty and conflict in the multi-source evidence information, comprehensively considering the information of different data sources, and improving the credibility of the account risk assessment result.
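For singleton risk-level propositions the conflict degree and Dempster's combination rule reduce to a few lines; frames of discernment containing composite propositions would need the full rule.

```python
def ds_combine(m1, m2):
    """Dempster's combination of two basic probability assignments over
    the same frame of singleton risk levels.
    Returns (fused masses, conflict degree K)."""
    # K sums mass products over proposition pairs with empty intersection,
    # which for singletons means every pair of distinct levels.
    K = sum(m1[a] * m2[b] for a in m1 for b in m2 if a != b)
    if K >= 1.0:
        raise ValueError("evidence in total conflict")
    # Agreeing mass is renormalized by 1 - K.
    fused = {a: m1[a] * m2[a] / (1.0 - K) for a in m1}
    return fused, K
```

Two sources that both lean toward "low risk" reinforce each other after fusion even when they disagree on the remaining mass, and K quantifies how much of their joint mass was in conflict.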
In the weight assignment link of the fuzzy comprehensive evaluation unit 3:
Firstly, a judgment matrix is constructed by the analytic hierarchy process based on interval numbers. For different network crime scenarios, pairwise comparisons between factors are made in the form of interval numbers: for example, the importance of factor i relative to factor j lies between a_ij^L and a_ij^U and is denoted a_ij = [a_ij^L, a_ij^U], thereby constructing an interval-number judgment matrix A = (a_ij)_{n×n}, wherein a_ji = [1/a_ij^U, 1/a_ij^L]. A consistency test is carried out on the interval-number judgment matrix to ensure the rationality of the judgments: the consistency index CI and the random consistency ratio CR = CI/RI are calculated, and when CR < 0.1 the matrix passes the consistency test. An interval-number ranking method, such as the probability ranking method, is then adopted to determine the subjective weight interval of each factor, [u_j^L, u_j^U], wherein u_j^L is the minimum and u_j^U the maximum of the subjective weight of factor j. The intervals are then processed, for example by taking the midpoint, to obtain the final subjective weight u_j. The subjective weight better reflects the expert's judgment, based on experience and knowledge, of the importance of each factor, improving the reliability of weight distribution. Finally, the subjective weight and the objective weight are fused by a combination weighting model based on the least squares method, which solves for the optimal combination coefficient by minimizing the sum of squared deviations from the subjective and objective weights; the resulting combined weight is used for the comprehensive risk score.
To obtain the objective weights, the original data is standardized to eliminate the influence of the different dimensions of the factors, and the information granularity concept is introduced to improve the traditional entropy weight calculation. For the j-th factor, its information entropy e_j is calculated (the calculation formula is adjusted after information granularity is considered and can be determined according to the actual improvement method). The objective weight of each factor is then calculated from the information entropy as v_j = (1 − e_j) / Σ_{k=1}^{n} (1 − e_k), wherein n is the total number of factors, v_j is the objective weight of each factor, and e_j is the information entropy of the j-th factor. The objective weight provides a data-driven basis for determining the comprehensive weight and enhances the objectivity of weight distribution. The combined weight is set as w_j = α·u_j + (1 − α)·v_j, wherein α is the combination coefficient and u_j is the final subjective weight. An objective function F(α) = Σ_j [(w_j − u_j)² + (w_j − v_j)²] is constructed, and the goal is to make F(α) minimum; the derivative of F(α) with respect to α is taken and set to 0, and solving yields the optimal combination coefficient α*. Substituting α* into the combined-weight formula w_j = α*·u_j + (1 − α*)·v_j gives the comprehensive weight of each factor, which comprehensively considers subjective experience and objective data; by optimizing the combination coefficient, the comprehensive weight reflects the actual condition of the data.
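The weighting steps above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the subjective weight intervals and decision matrix are hypothetical values, the intervals are collapsed by simple midpoints, and the plain (non-granularity-adjusted) entropy formula is used; for this symmetric least-squares objective the optimal coefficient works out to α = 0.5.

```python
import numpy as np

# Hypothetical subjective weight intervals [u_L, u_U] from interval-number AHP
# for three factors (illustrative values, not from the patent).
u_interval = np.array([[0.20, 0.30], [0.35, 0.45], [0.30, 0.40]])
u = u_interval.mean(axis=1)           # midpoint of each interval
u /= u.sum()                          # normalized subjective weights u_j

# Objective weights via the entropy weight method on a standardized
# decision matrix X (rows: samples, columns: the three factors).
X = np.array([[0.2, 0.7, 0.5],
              [0.6, 0.4, 0.9],
              [0.8, 0.3, 0.6],
              [0.4, 0.9, 0.2]])
P = X / X.sum(axis=0)                 # column-wise proportions p_ij
m = X.shape[0]
e = -(P * np.log(P)).sum(axis=0) / np.log(m)   # information entropy e_j
v = (1 - e) / (1 - e).sum()           # objective weight v_j

# Least-squares combination: w = a*u + (1-a)*v minimizing
# F(a) = sum_j [(w_j - u_j)^2 + (w_j - v_j)^2]; dF/da = 0 gives a = 0.5 here.
a = 0.5
w = a * u + (1 - a) * v               # comprehensive weight w_j
```

Since both u and v are normalized, the combined weights w also sum to 1 and can be used directly in the comprehensive risk score.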
When the fuzzy comprehensive evaluation unit 3 performs fuzzy transformation to calculate an account comprehensive risk score:
A rule base is constructed by using a fuzzy model based on intuitionistic fuzzy reasoning. Transaction frequency, amount concentration and account relevance are used as input variables, and the account risk level is used as the output variable; the account risk level can be divided into several grades such as low risk, medium risk and high risk. A corresponding intuitionistic fuzzy set is defined for each input and output variable; for example, for transaction frequency, intuitionistic fuzzy sets such as "low frequency", "medium frequency" and "high frequency" can be defined, and each intuitionistic fuzzy set is described by a membership degree, a non-membership degree and a hesitation degree. A rule base comprising 10 fuzzy rules is constructed according to historical data; the general form of a rule is "if (transaction frequency is A) and (amount concentration is B) and (account relevance is C), then (account risk is D)", wherein A, B, C and D are the intuitionistic fuzzy sets of the corresponding variables. Compared with a conventional rule base, the characteristics of account risk assessment can be captured more accurately, providing a more comprehensive basis for the subsequent inference. For the account to be evaluated, the membership, non-membership and hesitation degrees of its transaction frequency, amount concentration and account relevance with respect to each intuitionistic fuzzy set are calculated, the matching degree with each rule in the rule base is determined by intuitionistic fuzzy inference, and the conclusions of the activated rules are aggregated to obtain the intuitionistic fuzzy evaluation result of the account risk. The result comprises the membership, non-membership and hesitation degrees of the account risk under each risk level, so as to reflect the risk condition of the account more truly and improve the accuracy of risk assessment. The intuitionistic fuzzy evaluation result of the account risk is expressed in the form of an intuitionistic fuzzy set; for example, the membership, non-membership and hesitation degrees of the account risk are given for the three levels of low risk, medium risk and high risk respectively. According to the intuitionistic fuzzy evaluation result, the possibility of the account being at each risk level and the associated uncertainty can be known intuitively; for example, if the membership degree of the account at the high-risk level is high and the hesitation degree is low, the account is likely to be in a high-risk state. This makes the account risk assessment result more intuitive and easy to understand, helps a decision maker formulate a corresponding risk coping strategy according to actual conditions, and improves the practicability of risk assessment.
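The rule-firing step above can be sketched as follows. This is a minimal, hypothetical illustration: the input evaluations and the single sample rule are invented for demonstration and use the common intuitionistic fuzzy "and" (minimum of memberships, maximum of non-memberships), not the patent's actual 10-rule base.

```python
# Each intuitionistic fuzzy value is (membership mu, non-membership nu);
# the hesitation degree is pi = 1 - mu - nu.
# Hypothetical evaluations of one account's inputs against one fuzzy set each:
inputs = {
    "freq_high":      (0.7, 0.2),   # transaction frequency is "high"
    "amount_conc":    (0.6, 0.3),   # amount concentration is "high"
    "acct_relevance": (0.8, 0.1),   # account relevance is "strong"
}

def ifs_and(*vals):
    """Intuitionistic fuzzy 'and': min of memberships, max of non-memberships."""
    mu = min(v[0] for v in vals)
    nu = max(v[1] for v in vals)
    return mu, nu

# Sample rule: if (frequency is high) and (concentration is high)
#              and (relevance is strong), then (account risk is high).
fire_mu, fire_nu = ifs_and(inputs["freq_high"],
                           inputs["amount_conc"],
                           inputs["acct_relevance"])
pi = 1.0 - fire_mu - fire_nu          # hesitation degree of the conclusion
print(f"high risk: mu={fire_mu}, nu={fire_nu}, pi={pi:.2f}")
# prints: high risk: mu=0.6, nu=0.3, pi=0.10
```

A high membership with low hesitation, as here, corresponds to the "likely high-risk" reading described above; a full system would aggregate such firing strengths over all activated rules per risk level.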
The machine learning optimization unit 4 uses historical case data to construct a network crime training data set, selects a neural network algorithm to train on it, and establishes a funds-penetration threshold judgment model; it then periodically updates the training set with new case data, and finally case-involved accounts among the risk accounts are judged through the control unit 5;
Data related to funds penetration is collected from historical case files, databases and other channels, covering information such as transaction amount, transaction time, transaction counterparties and account association relations. The data is cleaned, and features that have an important influence on funds-penetration judgment, such as transaction frequency, concentration of funds flow direction and account activity, are extracted from the cleaned data. Each piece of data is labeled with whether it involves funds penetration according to the final judgment result of the historical case, where 0 represents a non-involved case and 1 represents an involved case. A neural network architecture is selected, the parameters of the network, including weights and biases, are initialized, the data set is divided into a training set and a validation set, and the model is trained with the training set. In the training process, the model output is computed by forward propagation and compared with the real label, the error is calculated with a loss function, and the model parameters are then updated by the back-propagation algorithm to continuously optimize performance. The trained model is evaluated with the validation set, and its parameters and hyperparameters are adjusted according to the evaluation result until the model achieves good performance. New case data is regularly collected and integrated with the original training data set; preprocessing operations such as cleaning, feature extraction and label marking are performed on the newly added data to ensure that its format and features are consistent with those of the original training data set. The retrained model is evaluated with a new validation set, and according to the evaluation result its parameters and hyperparameters are adjusted to ensure that the performance of the model is improved.
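The train/validate loop described above can be sketched as follows. This is a minimal illustration with synthetic data, not real case data, and it uses logistic regression (a single sigmoid layer) as a stand-in for the patent's neural network; the forward pass, loss gradient and parameter update mirror the forward-propagation / back-propagation steps described.

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 synthetic accounts: [transaction frequency, funds-flow concentration,
# account activity], label 1 = involved in funds penetration, 0 = not
# (labels come from an invented linear rule, purely for demonstration).
X = rng.random((200, 3))
y = (X @ np.array([2.0, 3.0, 1.0]) > 3.0).astype(float)

# split into training set and validation set
X_tr, X_val, y_tr, y_val = X[:160], X[160:], y[:160], y[160:]

w = np.zeros(3); b = 0.0
for _ in range(2000):                      # training loop
    p = 1 / (1 + np.exp(-(X_tr @ w + b)))  # forward propagation (sigmoid)
    grad = p - y_tr                        # gradient of cross-entropy loss
    w -= 0.1 * X_tr.T @ grad / len(y_tr)   # back-propagation update
    b -= 0.1 * grad.mean()

p_val = 1 / (1 + np.exp(-(X_val @ w + b)))
acc = ((p_val > 0.5) == y_val).mean()      # validation-set evaluation
```

In the described system, the periodic update step would concatenate newly labeled case data onto `X_tr`/`y_tr` and rerun this loop before re-evaluating on a fresh validation set.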
the relevant data of the account to be judged, including transaction records and account-related information, is input into the trained funds-penetration threshold judgment model, and the model outputs a prediction of whether the account is a case-involved account according to the input data. The control unit 5 judges whether the account is case-involved according to the model output and a preset judgment rule: for example, if the case-involved probability output by the model exceeds 80%, the account is judged to be a case-involved account; if the probability is lower than 20%, the account is judged not to be involved; and an account with a probability between 20% and 80% can be further subjected to manual auditing or judged by other auxiliary means. The judgment result is output and corresponding processing is performed, such as freezing or monitoring the case-involved account; at the same time, the judgment result and related information are recorded to facilitate subsequent audit and analysis, so that case-involved accounts can be judged timely and accurately and the funds-penetration risk is effectively prevented.
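The control unit's three-band decision rule above can be sketched as follows; the 80% and 20% thresholds are the example values from the text, and the action names are illustrative.

```python
def judge_account(prob_involved: float) -> str:
    """Map the model's case-involved probability to a control-unit action."""
    if prob_involved > 0.80:
        return "involved"        # e.g. freeze and monitor the account
    if prob_involved < 0.20:
        return "not_involved"
    return "manual_review"       # 20%-80%: escalate to manual auditing

print(judge_account(0.9))   # prints: involved
```

Each judgment, together with the input probability, would also be logged to support the subsequent audit and analysis mentioned above.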
In the invention, the data acquisition and integration unit 1 is used for acquiring and cleaning multi-source data, with an acquisition process that has intelligent characteristics and caches data via blockchain; the dynamic threshold setting unit 2 builds a dynamic threshold calculation model for different network crime types and adjusts thresholds by combining a hidden Markov model, kernel density estimation and reinforcement learning algorithms; the fuzzy comprehensive evaluation unit 3 determines membership functions and assigns weights by using a fuzzy mathematical algorithm and calculates the comprehensive risk score of each account; and the machine learning optimization unit 4 trains and updates the model with historical data to help judge case-involved accounts. The system thereby effectively solves the problem of missed judgment and misjudgment caused by fixed thresholds in traditional account risk assessment and improves the accuracy of network crime account risk assessment.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the above-described embodiments, and that the above-described embodiments and descriptions are only preferred embodiments of the present invention, and are not intended to limit the invention, and that various changes and modifications may be made therein without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.