Disclosure of Invention
The invention aims to provide an online power grid fault identification system based on big data, so as to solve the problems identified in the background art.
To achieve this aim, the invention provides the following technical solution: an online power grid fault identification system based on big data, comprising a data collection module, a feature extraction module and a feature analysis module, wherein:
the data collection module collects real-time power grid operation data;
the feature extraction module obtains the topological structure of the power grid from the real-time power grid operation data by using a graph theory algorithm, and applies a wavelet transform to the real-time power grid operation data to generate power time-frequency features;
the feature analysis module predicts whether a fault occurs according to the power time-frequency features by using a random forest algorithm, wherein:
If the predicted result is a fault, determining the overall fault occurrence position according to the power time-frequency characteristic and the power grid topological structure, analyzing the importance of each characteristic to fault judgment according to the characteristic importance evaluation in the random forest, selecting the characteristic with the highest importance as a key fault characteristic, determining the key fault occurrence position according to the key fault characteristic and the power grid topological structure, and judging the inclusion relation between the overall fault occurrence position and the key fault occurrence position, wherein the method comprises the following steps:
if the overall fault occurrence position comprises a key fault occurrence position, determining the key fault occurrence position and the key fault characteristics as final fault identification results;
if the overall fault occurrence position does not contain the key fault occurrence position, selecting the feature with the next highest importance as the key fault feature according to the importance, and carrying out the determination of the key fault occurrence position and the determination of the inclusion relation again until the key fault occurrence position is determined to be within the overall fault occurrence position;
If the predicted result is normal, the system continues to operate normally.
As a further improvement of the present technical solution, the data collection module collects real-time grid operation data, including but not limited to current, voltage and frequency, and static structure data of the grid.
As a further improvement of the technical scheme, the feature extraction module includes a topology analysis unit, and the topology analysis unit obtains a topology structure of the power grid according to real-time power grid operation data by using a graph theory algorithm, and specifically includes:
According to the static structure data in the real-time power grid operation data, the power grid is represented as a graph, denoted G, which comprises a node set V and an edge set E, wherein each node in V represents a device in the power grid and each edge in E represents a connecting line between devices;
for each edge in E, a weight is assigned based on electrical parameters including, but not limited to, the resistance and the voltage class of the line;
Traversing the whole graph G by using a graph-theoretic search algorithm (for example, depth-first or breadth-first search) and obtaining a connection matrix C through the traversal, wherein C represents the connection relation between nodes: C[i, j] = 1 if node i is directly connected with node j, and C[i, j] = 0 otherwise;
calculating characteristics of the graph to generate the topological structure of the power grid, wherein the degree distribution D(k) represents the proportion of nodes in the whole graph that have a specific number of connections: D(k) = (number of nodes with degree k) / (total number of nodes).
As a further improvement of the technical scheme, the feature extraction module includes a time-frequency analysis unit, and the time-frequency analysis unit performs wavelet transformation on real-time power grid operation data by using wavelet transformation technology to generate power time-frequency features, and specifically includes:
For each real-time power grid operation signal x(t), a continuous wavelet transform is carried out: the wavelet basis function ψ(t) is adjusted by a scale parameter a and a translation parameter b and then convolved with x(t) to obtain the wavelet coefficients W(a, b), and energy features at different scales are extracted from the wavelet coefficients as the power time-frequency features.
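For reference, one common way of writing the continuous wavelet transform described above is given below. The 1/√a normalization, the complex conjugate on ψ, and the definition of the per-scale energy as a sum of squared coefficient magnitudes are conventional choices assumed here for illustration; the text itself does not fix them. The energy is written with a calligraphic symbol to keep it distinct from the threshold E used elsewhere in this document.

```latex
\[
W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\,
          \psi^{*}\!\left(\frac{t - b}{a}\right) dt, \qquad a > 0,
\]
\[
\mathcal{E}(a) = \sum_{b} \left| W(a, b) \right|^{2}.
\]
```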
As a further improvement of the technical scheme, the feature analysis module comprises a prediction unit, wherein the prediction unit predicts whether a fault occurs according to the time-frequency feature of the electric power by using a random forest algorithm, and specifically comprises the following steps:
Training a random forest model: a historical data set containing labeled data for known faults and normal operation is used; features are extracted from the historical data set by the wavelet transform and used to train the random forest model, wherein the random forest consists of a plurality of decision trees; during training, a subsample is randomly drawn from the training data set to construct each decision tree, and, for each decision tree, a subset of the features is randomly selected for node splitting during construction;
Performing fault prediction: the power time-frequency features are input into the trained random forest model; each decision tree in the random forest independently predicts on the input feature vector and outputs a prediction result; the prediction results of all the decision trees are aggregated, and the final fault prediction result is obtained by a majority voting mechanism.
As a further improvement of the technical scheme, the construction process of each decision tree is as follows:
A subset is randomly drawn from the historical data set and used to construct a single decision tree; for the split at each node, a feature subset is randomly selected, and a feature and a split point are chosen at the node such that the split minimizes the impurity after splitting relative to the impurity before splitting, wherein the criteria for measuring impurity include, but are not limited to, the Gini coefficient and information entropy.
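The split criterion above can be illustrated with a minimal sketch. It computes the two impurity measures named in the text (Gini and information entropy) and picks, for a single feature, the threshold with the largest impurity reduction. The function names and the midpoint-threshold search are assumptions made for illustration, not details given in this document.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Information entropy in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def impurity_reduction(parent, left, right, measure=gini):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * measure(left) + (len(right) / n) * measure(right)
    return measure(parent) - weighted


def best_split(values, labels, measure=gini):
    """Return (threshold, reduction) for one feature; (None, 0.0) if no split helps.

    Candidate thresholds are midpoints between consecutive distinct values
    (an assumed, commonly used search strategy).
    """
    best = (None, 0.0)
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= thr]
        right = [y for x, y in zip(values, labels) if x > thr]
        reduction = impurity_reduction(labels, left, right, measure)
        if reduction > best[1]:
            best = (thr, reduction)
    return best
```

For example, with feature values `[0.1, 0.2, 0.8, 0.9]` and labels `["normal", "normal", "fault", "fault"]`, the split at 0.5 separates the two classes completely and removes all of the parent's Gini impurity of 0.5.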
As a further improvement of the present technical solution, the feature analysis module includes a decision unit, where the decision unit makes decisions of different paths according to a failure prediction result, and specifically includes:
If the prediction result is normal, the system continues to operate normally;
If the predicted result is a fault, determining the overall fault occurrence position according to the power time-frequency characteristic and the power grid topological structure, analyzing the importance of each characteristic to fault judgment according to the characteristic importance evaluation in the random forest, selecting the characteristic with the highest importance as a key fault characteristic, determining the key fault occurrence position according to the key fault characteristic and the power grid topological structure, and judging the inclusion relation between the overall fault occurrence position and the key fault occurrence position, wherein the method comprises the following steps:
if the overall fault occurrence position comprises a key fault occurrence position, determining the key fault occurrence position and the key fault characteristics as final fault identification results;
If the overall fault occurrence position does not contain the key fault occurrence position, selecting the feature with the next highest importance as the key fault feature according to the importance, and carrying out the determination of the key fault occurrence position and the determination of the inclusion relation again until the key fault occurrence position is determined to be within the overall fault occurrence position.
As a further improvement of the technical scheme, the determining the overall fault occurrence position according to the power time-frequency characteristic and the power grid topological structure specifically includes:
Collecting the power time-frequency features in normal operation from the prediction unit, calculating their mean μ and standard deviation σ, and determining the threshold of the power time-frequency features according to the formula E = μ + kσ, wherein E is the power time-frequency feature threshold and k is a multiplier chosen to set the sensitivity to anomalies;
Comparing the power time-frequency feature corresponding to each node in the power grid topological structure with the power time-frequency feature threshold, and taking each node whose feature is greater than the threshold as an abnormal node, wherein for each pair of abnormal nodes vm and vn, if C[m, n] = 1, the two abnormal nodes are directly connected;
Calculating the average shortest path length Li from each abnormal node vi to all other abnormal nodes, wherein the fault source candidate is vs = argmin(Li), vi ∈ A, where A is the set of abnormal nodes and argmin returns the argument at which the function attains its minimum value;
And constructing a minimum connected subgraph Gs by taking vs as a center, so that the Gs contains all abnormal nodes, and the node set in the Gs is the integral fault occurrence position.
As a further improvement of the technical scheme, according to the feature importance assessment in the random forest, the importance of each feature to fault judgment is analyzed, and the feature with the highest importance is selected as the key fault feature, which specifically comprises:
Obtaining, for each split, the impurity reduction produced by the split, the impurity reduction being taken as the importance of the splitting feature at that node;
For each tree, accumulating the importance of each feature over all nodes to obtain the total importance of each feature in a single tree;
calculating the average importance of each feature over all decision trees: for each feature, its total importance in each single tree is summed over all N trees and the sum is divided by the total number of decision trees N;
and sorting all the features according to the average importance, generating an importance sorting list, and selecting the feature with the largest average importance as the key fault feature.
As a further improvement of the technical scheme, the determination process of the occurrence position of the key fault is carried out again as follows:
According to the importance, selecting the feature with the next highest importance as the key fault feature, and particularly selecting the next feature of the current key fault feature as the key fault feature according to the importance sorting list.
Compared with the prior art, the invention has the beneficial effects that:
1. The random forest algorithm in the online power grid fault identification system based on big data is used not only for fault prediction but also for determining the key fault feature according to the feature importance evaluation. During fault prediction, the random forest model integrates a plurality of decision trees that independently predict on the power time-frequency features, and obtains the final fault prediction result by a majority voting mechanism, which improves the reliability and accuracy of the prediction. Furthermore, the random forest model evaluates the importance of each feature and selects the feature most critical to the fault judgment as the key fault feature. If the overall fault occurrence position does not contain the key fault occurrence position, the system selects the feature with the next highest importance and repeats the fault positioning and the judgment of the inclusion relation until the key fault occurrence position is determined to be within the overall fault occurrence position. This multiple application of the random forest algorithm not only improves the precision of fault identification but also gives the system greater flexibility and robustness when facing complex power grid faults.
2. The online power grid fault identification system based on big data determines the overall fault occurrence position according to the power time-frequency features and the power grid topological structure, and determines the key fault occurrence position according to the key fault feature and the power grid topological structure, so that faults can be located and analyzed more accurately through multiple applications of the power grid topological structure. The power grid topological structure is obtained by using a graph theory algorithm to represent the devices and connecting lines in the power grid as a graph and by generating a connection matrix and a degree distribution through traversal and analysis. These topological features not only help to identify key nodes and connection relations in the power grid, but also provide data support for fault path analysis and shortest path calculation; combined with the threshold judgment on the power time-frequency features, they allow abnormal nodes to be effectively distinguished from normal nodes. On this basis, the key nodes are compared with the overall fault occurrence position, so that the key fault occurrence position can be accurately determined and the inclusion relation between the two can be further analyzed. This multiple application of the power grid topological structure makes fault positioning and analysis more comprehensive and accurate, and effectively improves the efficiency and reliability of power grid fault handling.
3. The power grid fault online identification system based on big data carries out fault prediction on the power time-frequency characteristics, determines the overall fault occurrence position according to the power time-frequency characteristics and the power grid topological structure, and can remarkably improve the accuracy and response speed of power grid fault detection through multiple applications of the power time-frequency characteristics. The power time-frequency characteristics are extracted through a wavelet transformation technology, and the change condition of power grid operation data at different time and frequency can be carefully reflected. This enables the system to quickly determine the occurrence of a fault based on subtle frequency and time domain characteristics when a grid anomaly is captured. By calculating the threshold value of the power time-frequency characteristic and comparing the actual operation data with the threshold value, the abnormal node can be effectively identified, so that accurate data support is provided for subsequent fault positioning and processing. The application of the power time-frequency characteristics not only enhances the precision of fault detection, but also improves the comprehensive sensing capability of the system on the running state of the power grid, and ensures the safe and stable running of the power grid.
Detailed Description
The embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, but not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are intended to fall within the scope of the invention.
Referring to fig. 1-2, the present invention provides a technical solution, namely, an online power grid fault identification system based on big data, which includes a data collection module 100, a feature extraction module 200 and a feature analysis module 300.
The data collection module 100 collects real-time power grid operation data by installing various sensors, such as current transformers and voltage transformers, on key nodes and lines of the power grid; the data include parameters such as current, voltage and frequency, as well as static structure data of the power grid, such as the positions and connection relations of substations, generators and loads;
The topology analysis unit 201 in the feature extraction module 200 obtains the topology structure of the power grid according to the real-time power grid operation data by using a graph theory algorithm, and specifically includes:
According to the static structure data in the real-time power grid operation data, the power grid is represented as a graph, denoted G, which comprises a node set V and an edge set E, wherein each node in V represents a device in the power grid, such as a substation, a generator or a load, and each edge in E represents a connecting line between devices;
for each edge in E, a weight is assigned based on electrical parameters including, but not limited to, the resistance and the voltage class of the line;
Traversing the whole graph G by using a graph-theoretic search algorithm (for example, depth-first or breadth-first search) and obtaining a connection matrix C through the traversal, wherein C represents the connection relation between nodes: C[i, j] = 1 if node i is directly connected with node j, and C[i, j] = 0 otherwise;
Calculating characteristics of the graph to generate the topological structure of the power grid, wherein the degree distribution D(k) represents the proportion of nodes in the whole graph that have a specific number of connections (degree): D(k) = (number of nodes with degree k) / (total number of nodes); for example, D(3) is the proportion of nodes in the graph that have three connections.
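The topology analysis steps above can be sketched as follows. This is a minimal illustration under stated assumptions: devices are numbered 0 to n-1, the static structure data are supplied as a hypothetical `adjacency` dictionary, breadth-first search is used as the traversal, and edge weights are omitted for brevity.

```python
from collections import Counter, deque


def connection_matrix(adjacency):
    """Fill the connection matrix C while traversing the graph breadth-first.

    adjacency: dict mapping node index (0..n-1) to an iterable of neighbours.
    The traversal restarts from every unvisited node so that electrically
    isolated islands are also covered.
    """
    n = len(adjacency)
    C = [[0] * n for _ in range(n)]
    visited = set()
    for root in range(n):
        if root in visited:
            continue
        visited.add(root)
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                C[u][v] = C[v][u] = 1  # lines are treated as undirected
                if v not in visited:
                    visited.add(v)
                    queue.append(v)
    return C


def degree_distribution(C):
    """D(k) = (number of nodes with degree k) / (total number of nodes)."""
    n = len(C)
    degrees = [sum(row) for row in C]
    return {k: count / n for k, count in sorted(Counter(degrees).items())}


# Hypothetical five-device grid: node 0 feeds nodes 1, 2 and 3; node 3 feeds node 4.
grid = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3]}
C = connection_matrix(grid)
D = degree_distribution(C)  # three nodes of degree 1, one of degree 2, one of degree 3
```

In this example D(3) is 0.2, because exactly one of the five nodes has three connections.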
The time-frequency analysis unit 202 in the feature extraction module 200 performs wavelet transformation on the real-time power grid operation data by using a wavelet transformation technology, and generates power time-frequency features, which specifically includes:
For each real-time power grid operation signal x(t), a continuous wavelet transform is carried out: the wavelet basis function ψ(t) is adjusted by a scale parameter a and a translation parameter b and then convolved with x(t) to obtain the wavelet coefficients W(a, b), and energy features at different scales are extracted from the wavelet coefficients as the power time-frequency features, wherein the energy features reflect how the signal varies over time and frequency.
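A discrete approximation of this step can be sketched as below. The choice of the Mexican-hat (Ricker) wavelet as ψ(t), the 1/√a normalization, and the use of the sum of squared coefficients as the per-scale energy are all assumptions made for illustration; the document does not specify them. The direct summation is quadratic in the signal length and is meant for clarity rather than speed.

```python
import math


def ricker(t):
    """Mexican-hat (Ricker) mother wavelet; an assumed choice of psi(t)."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)


def cwt(x, scales, dt=1.0):
    """Approximate W(a, b) = (1 / sqrt(a)) * sum_k x[k] * psi((k - b) * dt / a) * dt.

    x: list of samples; scales: iterable of positive scale parameters a;
    b runs over the sample indices. Returns one row of coefficients per scale.
    """
    n = len(x)
    W = []
    for a in scales:
        row = []
        for b in range(n):
            acc = 0.0
            for k in range(n):
                acc += x[k] * ricker((k - b) * dt / a)
            row.append(acc * dt / math.sqrt(a))
        W.append(row)
    return W


def scale_energies(W):
    """Energy feature per scale: sum of squared wavelet coefficients."""
    return [sum(w * w for w in row) for row in W]


# Hypothetical measurement: a sinusoid with a short transient added at sample 40.
signal = [math.sin(2 * math.pi * k / 20) for k in range(80)]
signal[40] += 3.0
features = scale_energies(cwt(signal, scales=[1.0, 2.0, 4.0, 8.0]))
```

The resulting list has one energy value per scale and can serve as the feature vector passed to the prediction unit.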
The prediction unit 301 in the feature analysis module 300 predicts whether a fault occurs according to the power time-frequency feature by using a random forest algorithm, and specifically includes:
Training a random forest model: a historical data set containing labeled data for known faults and normal operation is used; features are extracted from the historical data set by the wavelet transform, so as to keep the type of input data consistent, and are used to train the random forest model, wherein the random forest consists of a plurality of decision trees; during training, a subsample is randomly drawn from the training data set to construct each decision tree, and, for each decision tree, a subset of the features is randomly selected for node splitting during construction, which reduces overfitting, wherein the construction process of each decision tree is as follows:
randomly drawing a subset from the historical data set (sampling with replacement), the subset being used to construct a single decision tree; for the split at each node, randomly selecting a feature subset rather than using all features, which increases the diversity and robustness of the model; and choosing a feature and a split point at each node such that the split minimizes the impurity after splitting relative to the impurity before splitting, wherein the criteria for measuring impurity include, but are not limited to, the Gini coefficient and information entropy;
Performing fault prediction: the power time-frequency features are input into the trained random forest model; each decision tree in the random forest independently predicts on the input feature vector and outputs a prediction result (such as fault or normal); the prediction results of all the decision trees are aggregated, and the final fault prediction result is obtained by a majority voting mechanism.
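The training and voting procedure described above can be sketched with the standard library alone. This is a simplified illustration, not a production implementation: the depth limit, the number of trees, the square-root rule for the size of the random feature subset, and the Gini criterion are assumed choices, and the tiny data set is invented for the example.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, feature_ids):
    """Return (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, thr), score
    return best


def build_tree(X, y, n_sub, rng, depth=0, max_depth=3):
    """Internal nodes are (feature, threshold, left, right); leaves are labels."""
    split = None
    if depth < max_depth and len(set(y)) > 1:
        # A fresh random feature subset is drawn for every node split.
        split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return Counter(y).most_common(1)[0][0]
    f, thr = split
    left = [i for i, row in enumerate(X) if row[f] <= thr]
    right = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       n_sub, rng, depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       n_sub, rng, depth + 1, max_depth))


def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree


def train_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(X) rows with replacement.
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_sub, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the independent predictions of all trees."""
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]


# Invented training data: two per-scale energy features per sample.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.3, 0.2],
     [2.0, 1.8], [2.5, 2.2], [1.9, 2.4], [2.2, 2.0]]
y = ["normal"] * 4 + ["fault"] * 4
forest = train_forest(X, y)
```

Calling `predict_forest(forest, [2.1, 2.1])` then returns the label that most of the 25 trees vote for.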
The decision unit 302 in the feature analysis module 300 will make decisions of different paths according to the failure prediction result, specifically including:
If the prediction result is normal, the system continues to operate normally;
if the predicted result is a fault, determining the occurrence position of the overall fault according to the power time-frequency characteristic and the power grid topological structure, wherein the method specifically comprises the following steps:
Collecting the power time-frequency features in normal operation from the prediction unit 301, calculating their mean μ and standard deviation σ, and determining the threshold of the power time-frequency features according to the formula E = μ + kσ, wherein E is the power time-frequency feature threshold and k is a multiplier chosen to set the sensitivity to anomalies;
Comparing the power time-frequency feature corresponding to each node in the power grid topological structure with the power time-frequency feature threshold, and taking each node whose feature is greater than the threshold as an abnormal node, wherein for each pair of abnormal nodes vm and vn, if C[m, n] = 1, the two abnormal nodes are directly connected;
Calculating the average shortest path length Li from each abnormal node vi to all other abnormal nodes, wherein the fault source candidate is vs = argmin(Li), vi ∈ A, where A is the set of abnormal nodes and argmin returns the argument at which the function attains its minimum value;
And constructing a minimum connected subgraph Gs by taking vs as a center, so that the Gs contains all abnormal nodes, and the node set in the Gs is the integral fault occurrence position.
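The four steps above can be sketched as follows. Several points are assumptions for illustration: each node carries a single scalar feature value, the population standard deviation is used for σ, path lengths are counted in hops on a connected grid, and the subgraph Gs is approximated by the union of shortest paths from vs to every abnormal node, which connects all abnormal nodes but is not guaranteed to be the smallest possible connected subgraph.

```python
import statistics
from collections import deque


def threshold(normal_values, k=3.0):
    """E = mu + k * sigma, computed from features observed in normal operation."""
    mu = statistics.mean(normal_values)
    sigma = statistics.pstdev(normal_values)  # population std dev (assumed)
    return mu + k * sigma


def bfs_parents(C, source):
    """Parent map of hop-count shortest paths from source over matrix C."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, connected in enumerate(C[u]):
            if connected and v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def path_to(parent, node):
    """Nodes on the shortest path from node back to the BFS source."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def overall_fault_location(C, node_feature, E):
    """Return (vs, node set of Gs); assumes the grid graph is connected."""
    abnormal = [i for i, f in enumerate(node_feature) if f > E]
    if not abnormal:
        return None, set()
    parents = {i: bfs_parents(C, i) for i in abnormal}

    def average_length(i):
        others = [j for j in abnormal if j != i]
        if not others:
            return 0.0
        return sum(len(path_to(parents[i], j)) - 1 for j in others) / len(others)

    vs = min(abnormal, key=average_length)  # argmin of L_i over the set A
    region = set()
    for j in abnormal:
        region.update(path_to(parents[vs], j))
    return vs, region
```

On a five-node line 0-1-2-3-4 with abnormal nodes 1, 3 and 4, node 3 has the smallest average distance to the other abnormal nodes (1.5 hops), and the resulting region is {1, 2, 3, 4}; node 2 is included only because it connects node 1 to the rest.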
The decision unit 302 analyzes the importance of each feature to fault judgment according to the feature importance evaluation in the random forest, and selects the feature with the highest importance as the key fault feature, which specifically includes:
Obtaining, for each split, the impurity reduction produced by the split, the impurity reduction being taken as the importance of the splitting feature at that node;
For each tree, accumulating the importance of each feature over all nodes, which gives the total importance of each feature in a single tree;
calculating the average importance of each feature over all decision trees: for each feature, its total importance in each single tree is summed over all N trees and the sum is divided by the total number of decision trees N;
and sorting all the features according to the average importance, generating an importance sorting list, and selecting the feature with the largest average importance as the key fault feature.
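The aggregation above reduces to a few lines once each tree is summarized as a list of its splits. In this sketch the per-tree input format, a list of (feature name, impurity reduction) pairs, and the feature names themselves are hypothetical.

```python
from collections import defaultdict


def tree_importance(splits):
    """Total importance of each feature within one tree."""
    total = defaultdict(float)
    for feature, reduction in splits:
        total[feature] += reduction
    return dict(total)


def ranked_importance(forest_splits):
    """Average importance over all N trees, sorted from most to least important."""
    n_trees = len(forest_splits)
    summed = defaultdict(float)
    for splits in forest_splits:
        for feature, value in tree_importance(splits).items():
            summed[feature] += value
    mean = {feature: value / n_trees for feature, value in summed.items()}
    return sorted(mean.items(), key=lambda item: item[1], reverse=True)


# Two hypothetical trees; a feature absent from a tree contributes 0 for that tree.
forest_splits = [
    [("energy_scale_1", 0.5), ("energy_scale_2", 0.25)],
    [("energy_scale_1", 0.25), ("energy_scale_1", 0.25), ("energy_scale_3", 0.125)],
]
ranking = ranked_importance(forest_splits)
key_fault_feature = ranking[0][0]  # "energy_scale_1", average importance 0.5
```

The sorted list doubles as the importance ranking list from which later candidates are taken if the first key fault feature is rejected.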
The decision unit 302 determines the key fault occurrence position according to the key fault feature and the power grid topological structure, and judges whether the key fault occurrence position belongs to the overall fault occurrence position, which specifically includes:
Comparing the key fault characteristics with the threshold values of the power time-frequency characteristics, finding key nodes smaller than the threshold values of the power time-frequency characteristics, and comparing the key nodes with nodes in the overall fault occurrence position one by one, wherein:
If the key node is among the nodes in the overall fault occurrence position, indicating that the overall fault occurrence position comprises the key fault occurrence position, determining the key fault occurrence position and the key fault characteristics as a final fault identification result;
if the key node is not among the nodes in the overall fault occurrence position, indicating that the overall fault occurrence position does not contain the key fault occurrence position, the feature with the next highest importance, namely the feature that follows the current key fault feature in the importance ranking list, is selected as the key fault feature, and the determination of the key fault occurrence position and the judgment of the inclusion relation are carried out again until the key fault occurrence position is determined to be within the overall fault occurrence position.
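The decision loop above can be sketched as a walk down the importance ranking list. The callable `key_nodes_for`, which maps a feature to its set of key nodes, is a hypothetical stand-in for the threshold comparison described in the text; treating "contained" as a subset test and returning `None` when the list is exhausted are likewise assumptions, since the text does not say what happens in that case.

```python
def identify_fault(ranking, key_nodes_for, overall_location):
    """Return (key fault feature, key fault nodes) or None if no feature qualifies.

    ranking: feature names sorted by decreasing importance.
    key_nodes_for: hypothetical callable mapping a feature to a set of key nodes.
    overall_location: set of nodes forming the overall fault occurrence position.
    """
    for feature in ranking:
        key_nodes = key_nodes_for(feature)
        # Accept the first feature whose key nodes all lie inside the overall region.
        if key_nodes and key_nodes <= overall_location:
            return feature, key_nodes
    return None


# Invented example: the top-ranked feature points outside the overall region,
# so the loop falls through to the second-ranked feature.
locations = {"energy_scale_1": {0}, "energy_scale_2": {3}, "energy_scale_3": {4}}
result = identify_fault(
    ["energy_scale_1", "energy_scale_2", "energy_scale_3"],
    lambda feature: locations[feature],
    overall_location={1, 2, 3, 4},
)
# result == ("energy_scale_2", {3})
```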
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the above-described embodiments, and that the above-described embodiments and descriptions are only preferred embodiments of the present invention, and are not intended to limit the invention, and that various changes and modifications may be made therein without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.