CN119004278B

CN119004278B - Online identification system of power grid faults based on big data

Info

Publication number: CN119004278B
Application number: CN202410969625.2A
Authority: CN
Inventors: 景中炤; 田凤兰; 杨亚男; 靳巍; 陈昕; 刘真; 何天骥; 齐超亮; 李沐峰
Original assignee: Zhengzhou Power Supply Co of Henan Electric Power Co
Current assignee: Zhengzhou Power Supply Co of State Grid Henan Electric Power Co Ltd
Priority date: 2024-07-19
Filing date: 2024-07-19
Publication date: 2025-02-25
Anticipated expiration: 2044-07-19
Also published as: CN119004278A

Abstract

The present invention relates to the technical field of power grid fault identification, and specifically to an online identification system for power grid faults based on big data, comprising a data collection module, a feature extraction module and a feature analysis module, wherein: the data collection module collects real-time power grid operation data; the feature extraction module obtains the topological structure of the power grid from the real-time power grid operation data, and applies a wavelet transform to the real-time power grid operation data to generate power time-frequency features; and the feature analysis module uses a random forest algorithm to predict, from the power time-frequency features, whether a fault is present, wherein: if the prediction result is normal, the system continues normal operation; if the prediction result is a fault, the key fault location and key fault features are determined from the power time-frequency features, the power grid topological structure and the feature importance evaluation of the random forest algorithm.

Description

Online identification system for power grid faults based on big data

Technical Field

The invention relates to the technical field of power grid fault identification, and in particular to an online power grid fault identification system based on big data.

Background

Conventional grid fault detection systems have a number of significant technical problems. First, conventional systems typically rely on a single fault signature and a fixed decision threshold, which makes it difficult to handle multiple fault types in complex grid environments and results in insufficient detection accuracy and response speed.

Second, conventional systems lack a comprehensive analysis of multidimensional features during fault prediction and cannot fully exploit the time-frequency information in power grid operation data, which limits their sensitivity and accuracy in capturing power grid anomalies.

In addition, conventional fault location methods rely on simple algorithms and cannot effectively exploit the complex relationships in the power grid topology, so fault location is inefficient and inaccurate. Conventional systems also suffer from overfitting: shortcomings in feature selection and model training make them insufficiently robust and flexible, so they adapt poorly to dynamic changes in the operating state of the power grid. Because of these technical problems, conventional power grid fault detection systems show high false alarm and missed detection rates in practical applications and cannot effectively ensure the stable and safe operation of the power grid.

In summary, conventional power grid fault detection systems face technical bottlenecks in feature extraction, fault prediction, fault location and model robustness, and their performance needs to be improved through more advanced methods and algorithms.

Disclosure of Invention

The invention aims to provide an online power grid fault identification system based on big data to solve the problems described in the background above.

To achieve this purpose, the invention provides the following technical solution: an online power grid fault identification system based on big data, characterized by comprising a data collection module, a feature extraction module and a feature analysis module, wherein:

The data collection module collects real-time power grid operation data;

the feature extraction module obtains the topological structure of the power grid from the real-time power grid operation data using a graph theory algorithm, and performs a wavelet transform on the real-time power grid operation data to generate power time-frequency features;

the feature analysis module uses a random forest algorithm to predict, from the power time-frequency features, whether a fault occurs, wherein:

If the prediction result is a fault, the overall fault occurrence position is determined from the power time-frequency features and the power grid topological structure; the importance of each feature to fault judgment is analyzed using the feature importance evaluation of the random forest, and the feature with the highest importance is selected as the key fault feature; the key fault occurrence position is determined from the key fault feature and the power grid topological structure; and the inclusion relation between the overall fault occurrence position and the key fault occurrence position is judged, as follows:

if the overall fault occurrence position comprises a key fault occurrence position, determining the key fault occurrence position and the key fault characteristics as final fault identification results;

if the overall fault occurrence position does not contain the key fault occurrence position, selecting the feature with the next highest importance as the key fault feature according to the importance, and carrying out the determination of the key fault occurrence position and the determination of the inclusion relation again until the key fault occurrence position is determined to be within the overall fault occurrence position;

If the prediction result is normal, the system continues normal operation.

As a further improvement of the present technical solution, the data collection module collects real-time grid operation data, including but not limited to current, voltage and frequency, and static structure data of the grid.

As a further improvement of the technical scheme, the feature extraction module includes a topology analysis unit, and the topology analysis unit obtains a topology structure of the power grid according to real-time power grid operation data by using a graph theory algorithm, and specifically includes:

According to the static structure data in the real-time power grid operation data, the power grid is represented as a graph G containing a node set V and an edge set E, wherein each node in V represents a device in the power grid and each edge in E represents a connecting line between devices;

for each edge in E of graph G, a weight is assigned based on electrical parameters, including but not limited to the resistance and voltage level of the line;

the whole graph G is traversed using a priority search algorithm from graph theory, and a connection matrix C is obtained through the traversal, representing the connection relationship between nodes: C[i, j] = 1 if node i is directly connected to node j, and C[i, j] = 0 otherwise;

different features of the graph are calculated to generate the topological structure of the power grid; for example, the degree distribution D(k) represents the proportion of nodes with a given number of connections in the whole graph, where D(k) = (number of nodes with degree k) / (total number of nodes).
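A minimal Python sketch of the topology analysis steps above, assuming the static structure data arrive as an edge list of (node_i, node_j, resistance) tuples with nodes numbered 0..n-1. The text names a "priority search" traversal without specifying it, so breadth-first search is used here as a stand-in; the function names are hypothetical. Edge weights are kept only to mirror the weighting step, since neither C nor D(k) depends on them.

```python
from collections import deque

def build_topology(n_nodes, edges):
    """Return (C, weights): connection matrix filled by BFS traversal, plus per-edge weights.

    edges: iterable of (i, j, resistance) tuples; using resistance as the edge
    weight is an illustrative assumption.
    """
    adj = [[] for _ in range(n_nodes)]
    weights = {}
    for i, j, resistance in edges:
        adj[i].append(j)
        adj[j].append(i)
        weights[(i, j)] = weights[(j, i)] = resistance
    C = [[0] * n_nodes for _ in range(n_nodes)]
    visited = [False] * n_nodes
    for start in range(n_nodes):  # restart so isolated islands are also traversed
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                C[u][v] = 1  # u and v are directly connected
                if not visited[v]:
                    visited[v] = True
                    queue.append(v)
    return C, weights

def degree_distribution(C):
    """D(k) = (number of nodes with degree k) / (total number of nodes)."""
    n = len(C)
    counts = {}
    for row in C:
        k = sum(row)
        counts[k] = counts.get(k, 0) + 1
    return {k: c / n for k, c in sorted(counts.items())}
```

For a five-node radial feeder with lines 0-1, 1-2, 1-3 and 3-4, node 1 has degree 3, so `degree_distribution` gives D(3) = 0.2 and D(1) = 0.6.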

As a further improvement of the technical scheme, the feature extraction module includes a time-frequency analysis unit, and the time-frequency analysis unit performs wavelet transformation on real-time power grid operation data by using wavelet transformation technology to generate power time-frequency features, and specifically includes:

For each real-time power grid operation signal x(t), a continuous wavelet transform is performed: the wavelet basis function ψ(t) is scaled by the scale parameter a and shifted by the translation parameter b and convolved with x(t) to obtain the wavelet coefficients W(a, b), and the energy at different scales is extracted from the wavelet coefficients as the power time-frequency features.
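For reference, the standard continuous wavelet transform that this step describes can be written as follows. Summing the squared coefficients over translations to obtain one energy value per scale is an assumed (common) choice of energy feature, since the text does not give the exact formula.

```latex
W(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right)\mathrm{d}t,
\qquad
E(a) = \sum_{b} \lvert W(a,b) \rvert^{2}
```

For a real, symmetric mother wavelet such as the Mexican hat, ψ*((t − b)/a) = ψ((b − t)/a), so the integral is a convolution of x(t) with the scaled wavelet, consistent with the description above.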

As a further improvement of the technical scheme, the feature analysis module comprises a prediction unit, which uses a random forest algorithm to predict whether a fault occurs according to the power time-frequency features, specifically comprising:

Training a random forest model: a historical data set containing labeled data of known faults and normal operation is used; wavelet-transform features are extracted from the historical data set and used to train the random forest model. The random forest consists of multiple decision trees: during training, a subsample is randomly drawn from the training data set to construct each decision tree, and during the construction of each tree, a random subset of features is considered for splitting each node;

Performing fault prediction: the power time-frequency features are input into the trained random forest model, each decision tree in the random forest independently predicts from the input feature vector and outputs a prediction result, and the prediction results of all decision trees are aggregated by majority voting to obtain the final fault prediction result.
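The training and voting steps above can be illustrated with a small pure-Python random forest. This is a sketch, not the patented implementation: tree depth, number of trees, feature-subset size and the use of Gini impurity are illustrative assumptions, and in practice a library implementation would normally be used. Each row of `X` is assumed to be a vector of wavelet energy features labeled "fault" or "normal".

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, n_sub_features, rng, max_depth=5, depth=0):
    """Grow one tree; returns a label (leaf) or a (feature, threshold, left, right) tuple."""
    if depth >= max_depth or len(set(y)) == 1:
        return majority(y)
    best = None  # (weighted child impurity, feature, threshold, left_idx, right_idx)
    # Only a random subset of features is considered at this node.
    for f in rng.sample(range(len(X[0])), n_sub_features):
        for thr in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:  # no usable split on the sampled features
        return majority(y)
    _, f, thr, left, right = best
    return (f, thr,
            build_tree([X[i] for i in left], [y[i] for i in left], n_sub_features, rng, max_depth, depth + 1),
            build_tree([X[i] for i in right], [y[i] for i in right], n_sub_features, rng, max_depth, depth + 1))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: follow the split
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def train_forest(X, y, n_trees=15, n_sub_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], n_sub_features, rng))
    return forest

def predict_forest(forest, x):
    """Majority vote over the independent predictions of all trees."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]
```

An odd number of trees avoids ties for a two-class (fault / normal) problem.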

As a further improvement of the technical scheme, the construction process of each decision tree is as follows:

A subset is randomly drawn from the historical data set and used to construct a single decision tree; a random feature subset is selected for splitting each node, and at each node a feature and a split point are chosen such that the split minimizes the impurity after splitting relative to before splitting, i.e. maximizes the impurity reduction; splitting criteria for measuring impurity include, but are not limited to, the Gini coefficient and information entropy.
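The split criterion can be made concrete as follows. This sketch computes Gini impurity and information entropy and picks, over a random feature subset, the (feature, threshold) pair with the largest impurity decrease; the helper names and the feature-subset size are assumptions for illustration.

```python
import math
import random
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Information entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(X, y, n_sub_features, impurity=gini, rng=None):
    """Return (feature, threshold, impurity_decrease) for the best split over a random feature subset."""
    rng = rng or random.Random(0)
    parent = impurity(y)
    best = (None, None, 0.0)
    for f in rng.sample(range(len(X[0])), n_sub_features):
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            # Impurity after the split, weighted by child size.
            child = (len(left) * impurity(left) + len(right) * impurity(right)) / len(y)
            if parent - child > best[2]:
                best = (f, thr, parent - child)
    return best
```

The impurity decrease returned here is also the quantity that the feature importance evaluation accumulates per feature.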

As a further improvement of the technical scheme, the feature analysis module includes a decision unit, which takes different decision paths according to the fault prediction result, specifically including:

If the prediction result is normal, the system continues normal operation;

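If the prediction result is a fault, the overall fault occurrence position and the key fault occurrence position are determined and their inclusion relation is judged: if the overall fault occurrence position contains the key fault occurrence position, the key fault occurrence position and the key fault feature are determined as the final fault identification result; if it does not, the feature with the next highest importance is selected as the key fault feature, and the determination of the key fault occurrence position and of the inclusion relation is repeated until the key fault occurrence position is determined to lie within the overall fault occurrence position.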

As a further improvement of the technical scheme, the determining the overall fault occurrence position according to the power time-frequency characteristic and the power grid topological structure specifically includes:

Collecting the power time-frequency features recorded during normal operation from the prediction unit, calculating their mean μ and standard deviation σ, and determining the power time-frequency feature threshold according to the formula E = μ + kσ, where E is the power time-frequency feature threshold and k is a multiplier chosen to set the sensitivity of anomaly detection;

Comparing the power time-frequency feature of every node in the power grid topological structure with the power time-frequency feature threshold, and taking each node whose feature exceeds the threshold as an abnormal node; for each pair of abnormal nodes v_m and v_n, the two nodes are directly connected if C[m, n] = 1;
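A sketch of the threshold and abnormal-node steps, assuming one scalar power time-frequency feature per node (for example the wavelet energy at one scale). The value k = 3 is only an illustrative default, and the population standard deviation is an assumption because the text does not say which estimator is used; the function names are hypothetical.

```python
import statistics

def feature_threshold(normal_values, k=3.0):
    """E = mu + k * sigma, computed from features recorded during normal operation."""
    mu = statistics.mean(normal_values)
    sigma = statistics.pstdev(normal_values)  # population std dev (assumed estimator)
    return mu + k * sigma

def abnormal_nodes(node_features, threshold):
    """Nodes whose feature value exceeds the threshold, in ascending node order."""
    return sorted(n for n, value in node_features.items() if value > threshold)

def directly_connected_pairs(abnormal, C):
    """Pairs (v_m, v_n) of abnormal nodes with C[m][n] == 1."""
    return [(m, n) for i, m in enumerate(abnormal) for n in abnormal[i + 1:] if C[m][n] == 1]
```

Raising k makes the detector less sensitive (fewer abnormal nodes); lowering it flags more nodes at the cost of more false alarms.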

Calculating, for each abnormal node v_i, the average shortest path length L_i to all other abnormal nodes, and taking the fault source candidate as v_s = argmin_{v_i ∈ A} L_i, where A is the set of abnormal nodes and argmin returns the argument at which the function attains its minimum;

Constructing a minimum connected subgraph G_s centered on v_s such that G_s contains all abnormal nodes; the node set of G_s is the overall fault occurrence position.
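The fault-source and subgraph steps might be sketched as below, assuming path length is measured in hops on the connection matrix C (edge weights could be used instead with Dijkstra's algorithm). Finding a truly minimum connected subgraph that contains a given node set is the Steiner tree problem, which is NP-hard in general, so this sketch uses a common approximation: the union of BFS shortest paths from v_s to each abnormal node. Function names are hypothetical.

```python
from collections import deque

def bfs_distances(C, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, linked in enumerate(C[u]):
            if linked and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def fault_source(C, abnormal):
    """v_s = argmin over abnormal v_i of the mean hop distance L_i to the other abnormal nodes."""
    def mean_dist(vi):
        d = bfs_distances(C, vi)
        others = [d.get(vj, float("inf")) for vj in abnormal if vj != vi]
        return sum(others) / len(others) if others else 0.0
    return min(abnormal, key=mean_dist)

def fault_region(C, abnormal):
    """Approximate minimum connected subgraph: union of BFS paths from v_s to each abnormal node."""
    vs = fault_source(C, abnormal)
    parent = {vs: None}
    queue = deque([vs])
    while queue:
        u = queue.popleft()
        for v, linked in enumerate(C[u]):
            if linked and v not in parent:
                parent[v] = u
                queue.append(v)
    region = set()
    for a in abnormal:
        if a not in parent:
            continue  # unreachable from v_s (separate island); cannot be connected
        node = a
        while node is not None and node not in region:  # walk back toward v_s
            region.add(node)
            node = parent[node]
    return vs, region
```

On the five-node feeder used earlier (lines 0-1, 1-2, 1-3, 3-4) with abnormal nodes 1, 2 and 4, node 1 has the smallest mean distance, and the region is {1, 2, 3, 4}: node 3 is not abnormal but is needed to connect node 4.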

As a further improvement of the technical scheme, according to the feature importance assessment in the random forest, the importance of each feature to fault judgment is analyzed, and the feature with the highest importance is selected as the key fault feature, which specifically comprises:

For each split, obtaining the impurity reduction produced by the split, which is taken as the importance of the splitting feature at that node;

For each tree, accumulating the importance of each feature on all nodes to obtain the total importance of each feature in a single tree;

Calculating the average importance of each feature over all decision trees: for each feature, its total importances in the individual trees are summed over all N trees and divided by the total number of decision trees N;

and sorting all the features according to the average importance, generating an importance sorting list, and selecting the feature with the largest average importance as the key fault feature.
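A sketch of this mean-decrease-in-impurity computation, assuming each tree's splits have already been recorded as (feature, impurity_decrease) pairs; the feature names in the usage note are hypothetical. Libraries such as scikit-learn expose a normalized variant of this quantity as the `feature_importances_` attribute of `RandomForestClassifier`.

```python
from collections import defaultdict

def mean_decrease_impurity(trees_split_records):
    """Average per-feature importance over N trees, plus a descending ranking list.

    trees_split_records: one list per tree of (feature, impurity_decrease) pairs.
    """
    n_trees = len(trees_split_records)
    per_tree = []
    for records in trees_split_records:
        tree_total = defaultdict(float)  # total importance of each feature in this tree
        for feature, decrease in records:
            tree_total[feature] += decrease
        per_tree.append(tree_total)
    features = {f for t in per_tree for f in t}
    # Sum each feature's per-tree totals and divide by N (trees without the feature count as 0).
    importance = {f: sum(t.get(f, 0.0) for t in per_tree) / n_trees for f in features}
    ranking = sorted(importance, key=importance.get, reverse=True)
    return importance, ranking
```

The first entry of `ranking` is the key fault feature; later entries are the fallbacks used when the inclusion check fails.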

As a further improvement of the technical scheme, the key fault occurrence position is re-determined as follows:

The feature with the next highest importance is selected as the key fault feature; specifically, the feature that immediately follows the current key fault feature in the importance ranking list becomes the new key fault feature.

Compared with the prior art, the invention has the beneficial effects that:

1. In the big-data-based online power grid fault identification system, the random forest algorithm is used not only for fault prediction but also for determining the key fault features through feature importance evaluation. For fault prediction, the random forest model combines multiple decision trees, each of which independently predicts from the power time-frequency features, and a majority voting mechanism produces the final fault prediction result, which improves the reliability and accuracy of prediction. The random forest model also evaluates the importance of each feature and selects the feature most critical to fault judgment as the key fault feature; if the overall fault occurrence position does not contain the key fault occurrence position, the system selects the feature with the next highest importance and repeats fault location and the inclusion check until the key fault occurrence position is determined to lie within the overall fault occurrence position. Applying the random forest algorithm in several roles in this way improves the precision of fault identification and gives the system greater flexibility and robustness when facing complex power grid faults.

2. The big-data-based online power grid fault identification system determines the overall fault occurrence position from the power time-frequency features and the power grid topological structure, and determines the key fault occurrence position from the key fault features and the power grid topological structure; using the topological structure at both stages allows faults to be located and analyzed more accurately. The topological structure is obtained by representing all devices and connecting lines of the power grid as a graph with a graph theory algorithm and generating a connection matrix and degree distribution through traversal and analysis. These topological features help identify the key nodes and connection relationships in the power grid and provide data support for fault path analysis and shortest path calculation, and, combined with threshold judgment of the power time-frequency features, they allow abnormal nodes to be effectively distinguished from normal nodes. On this basis, the key fault feature nodes are compared with the overall fault occurrence position, so that the key fault occurrence position can be accurately determined and its inclusion relationship with the overall fault occurrence position further analyzed. This repeated use of the power grid topological structure makes fault location and analysis more comprehensive and accurate and improves the efficiency and reliability of power grid fault handling.

3. The big-data-based online power grid fault identification system performs fault prediction on the power time-frequency features and determines the overall fault occurrence position from the power time-frequency features and the power grid topological structure; using the power time-frequency features at both stages improves the accuracy and response speed of power grid fault detection. The power time-frequency features are extracted with the wavelet transform and reflect in detail how the power grid operation data change across time and frequency, which lets the system quickly detect a fault from subtle frequency-domain and time-domain characteristics when a grid anomaly is captured. By calculating the power time-frequency feature threshold and comparing the actual operation data with it, abnormal nodes can be effectively identified, providing accurate data support for subsequent fault location and handling. The use of the power time-frequency features therefore both improves the precision of fault detection and strengthens the system's overall awareness of the grid operating state, helping to ensure the safe and stable operation of the power grid.

Drawings

FIG. 1 is a schematic diagram of the overall module of the present invention;

FIG. 2 is a schematic diagram of a feature extraction module unit according to the present invention;

FIG. 3 is a schematic diagram of a feature analysis module unit according to the present invention.

Reference numerals in the figures: 100, data collection module; 200, feature extraction module; 201, topology analysis unit; 202, time-frequency analysis unit; 300, feature analysis module; 301, prediction unit; 302, decision unit.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and fully below with reference to the accompanying drawings; evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.

Referring to fig. 1-2, the present invention provides a technical solution, namely, an online power grid fault identification system based on big data, which includes a data collection module 100, a feature extraction module 200 and a feature analysis module 300.

The data collection module 100 collects real-time power grid operation data through sensors, such as current transformers and voltage transformers, installed at key nodes and lines of the power grid; the data include parameters such as current, voltage and frequency, as well as static structure data of the power grid, such as the locations and connection relations of substations, generators and loads;

The topology analysis unit 201 in the feature extraction module 200 obtains the topology structure of the power grid according to the real-time power grid operation data by using a graph theory algorithm, and specifically includes:

According to the static structure data in the real-time power grid operation data, the power grid is represented as a graph G containing a node set V and an edge set E, wherein each node in V represents a device in the power grid, such as a substation, a generator or a load, and each edge in E represents a connecting line between devices;

Different features of the graph are calculated to generate the topological structure of the power grid; the degree distribution D(k) represents the proportion of nodes with a given number of connections (degree) in the whole graph, D(k) = (number of nodes with degree k) / (total number of nodes); for example, D(3) is the proportion of nodes in the graph that have three connections.

The time-frequency analysis unit 202 in the feature extraction module 200 performs wavelet transformation on the real-time power grid operation data by using a wavelet transformation technology, and generates power time-frequency features, which specifically includes:

For each real-time power grid operation signal x(t), a continuous wavelet transform is performed: the wavelet basis function ψ(t) is scaled by the scale parameter a and shifted by the translation parameter b and then convolved with the signal x(t) to obtain the wavelet coefficients W(a, b), and the energy at different scales is extracted from the wavelet coefficients as the power time-frequency features; these energy features reflect how the signal changes at different times and frequencies.
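A runnable sketch of a discretized transform and per-scale energy features, written in pure Python for clarity rather than speed. The Mexican-hat (Ricker) mother wavelet, integer scales measured in samples and the sum-of-squares energy are illustrative choices; the text does not name a wavelet family, and the function names are hypothetical.

```python
import math

def ricker(t):
    """Mexican-hat (Ricker) mother wavelet psi(t), unnormalized; real and symmetric."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)

def cwt(x, scales):
    """Discretized W(a, b) = (1 / sqrt(a)) * sum_t x[t] * psi((t - b) / a), b on the sample grid."""
    n = len(x)
    W = []
    for a in scales:
        row = []
        for b in range(n):
            s = 0.0
            for t in range(n):
                s += x[t] * ricker((t - b) / a)
            row.append(s / math.sqrt(a))
        W.append(row)
    return W

def scale_energies(x, scales):
    """One energy feature per scale: sum over translations b of |W(a, b)|^2."""
    return [sum(w * w for w in row) for row in cwt(x, scales)]
```

For example, `scale_energies(samples, [1, 2, 4, 8])` turns one window of current or voltage samples into a four-element feature vector; small scales respond to fast transients and large scales to slower disturbances. The direct double loop is O(n²) per scale, so an FFT-based implementation would be preferable for long windows.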

The prediction unit 301 in the feature analysis module 300 predicts whether a fault occurs according to the power time-frequency feature by using a random forest algorithm, and specifically includes:

Training a random forest model: a historical data set containing labeled data of known faults and normal operation is used, and features are extracted from it with the same wavelet transform to keep the input data consistent; these features are used to train the random forest model. The random forest consists of multiple decision trees: during training, a subsample is randomly drawn from the training data set to construct each decision tree, and a random subset of features is considered when splitting each node of each tree, which reduces overfitting. Each decision tree is constructed as follows:

A subset is randomly drawn from the historical data set (sampling with replacement) and used to construct a single decision tree; a random subset of features, rather than all features, is selected for each node split, which increases the diversity and robustness of the model; at each node, a feature and a split point are selected such that the split minimizes the impurity after splitting relative to before splitting, where splitting criteria for measuring impurity include, but are not limited to, the Gini coefficient and information entropy;

Performing fault prediction: the power time-frequency features are input into the trained random forest model, each decision tree in the random forest independently predicts from the input feature vector and outputs a prediction result (e.g. fault or normal), and the prediction results of all decision trees are aggregated by majority voting to obtain the final fault prediction result.

The decision unit 302 in the feature analysis module 300 takes different decision paths according to the fault prediction result, specifically including:

If the prediction result is normal, the system continues normal operation;

If the prediction result is a fault, the overall fault occurrence position is determined from the power time-frequency features and the power grid topological structure, specifically comprising the following steps:

Collecting the power time-frequency features recorded during normal operation from the prediction unit 301, calculating their mean μ and standard deviation σ, and determining the power time-frequency feature threshold according to the formula E = μ + kσ, where E is the power time-frequency feature threshold and k is a multiplier chosen to set the sensitivity of anomaly detection;

The decision unit 302 analyzes the importance of each feature to fault judgment according to the feature importance evaluation in the random forest, and selects the feature with the highest importance as the key fault feature, which specifically includes:

For each tree, accumulating the importance of each feature on all nodes, i.e., giving the total importance of each feature in a single tree;

The decision unit 302 determines the key fault occurrence position according to the key fault feature and the power grid topological structure, and determines whether the key fault occurrence position lies within the overall fault occurrence position, specifically including:

Comparing the key fault feature with the power time-frequency feature threshold, finding the key nodes whose values are smaller than the threshold, and comparing these key nodes one by one with the nodes in the overall fault occurrence position, wherein:

If the key node is among the nodes in the overall fault occurrence position, the overall fault occurrence position contains the key fault occurrence position, and the key fault occurrence position and the key fault feature are determined as the final fault identification result;

If the key node is not among the nodes in the overall fault occurrence position, the overall fault occurrence position does not contain the key fault occurrence position; the feature with the next highest importance, namely the feature that immediately follows the current key fault feature in the importance ranking list, is selected as the key fault feature, and the determination of the key fault occurrence position and of the inclusion relationship is repeated until the key fault occurrence position is determined to lie within the overall fault occurrence position.
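The iterative key-feature search can be sketched as a loop over the importance ranking. The following are all assumptions: each feature has its own threshold; a node counts as a key node when its value of the current feature exceeds that threshold, mirroring the abnormal-node rule used for the overall fault occurrence position (the embodiment text above says "smaller than", so the comparison should be flipped if that reading is intended); all key nodes must lie in the overall region; and the loop returns `None` once the ranking is exhausted instead of looping indefinitely. The function name is hypothetical.

```python
def identify_key_fault(ranking, node_features, thresholds, overall_region):
    """Walk the importance ranking until a feature's key nodes fall inside the overall region.

    ranking: feature names sorted by decreasing average importance.
    node_features: {feature_name: {node: value}}.
    thresholds: {feature_name: threshold E for that feature}.
    overall_region: set of nodes forming the overall fault occurrence position.
    Returns (key_feature, key_nodes), or None if no ranked feature passes the inclusion check.
    """
    for feature in ranking:  # highest importance first, then the next highest, ...
        values = node_features[feature]
        # Assumed rule: key nodes exceed the threshold (see lead-in).
        key_nodes = {n for n, v in values.items() if v > thresholds[feature]}
        if key_nodes and key_nodes <= overall_region:  # inclusion check
            return feature, key_nodes
    return None
```

For instance, if the top-ranked feature flags node 5, which lies outside an overall region {1, 2, 3, 4}, the loop falls through to the next-ranked feature and accepts it if its key nodes are inside the region.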

The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the above-described embodiments, and that the above-described embodiments and descriptions are only preferred embodiments of the present invention, and are not intended to limit the invention, and that various changes and modifications may be made therein without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims

1. An online identification system for power grid faults based on big data, characterized in that it comprises a data collection module (100), a feature extraction module (200) and a feature analysis module (300), wherein:

The data collection module (100) collects real-time power grid operation data;

The feature extraction module (200) uses a graph theory algorithm to obtain a topological structure of the power grid according to the real-time power grid operation data; and uses a wavelet transformation technology to perform a wavelet transformation on the real-time power grid operation data to generate power time-frequency features;

The feature analysis module (300) uses a random forest algorithm to predict whether a fault occurs based on the power time-frequency features, wherein:

If the prediction result is a fault, the overall fault location is determined based on the power time-frequency characteristics and the grid topology. According to the feature importance evaluation in the random forest, the importance of each feature to fault judgment is analyzed, and the feature with the highest importance is selected as the key fault feature. The key fault location is determined by combining the key fault feature with the grid topology, and the inclusion relationship between the overall fault location and the key fault location is determined, where:

If the overall fault location includes the key fault location, the key fault location and key fault characteristics are determined as the final fault identification result;

If the overall fault location does not include the key fault location, the feature with the next highest importance is selected as the key fault feature, and the key fault location and the inclusion relationship are re-determined until it is determined that the key fault location is within the overall fault location;

If the prediction result is normal, the system will run normally.

2. The big data-based online identification system for power grid faults according to claim 1, characterized in that the data collection module (100) collects real-time grid operation data, which includes but is not limited to current, voltage and frequency, as well as static structure data of the grid.

3. The power grid fault online identification system based on big data according to claim 1, characterized in that the feature extraction module (200) includes a topology analysis unit (201), and the topology analysis unit (201) uses a graph theory algorithm to obtain the topology structure of the power grid according to real-time power grid operation data, specifically including:

According to the static structure data in the real-time power grid operation data, the power grid is represented as a graph, denoted as G, which contains nodes V and edges E, where nodes V represent devices in the power grid and edges E represent the connection lines between devices;

For each edge in E of the graph G, a weight is assigned based on electrical parameters, including but not limited to the resistance and voltage level of the line;

A priority search algorithm in graph theory is used to traverse the entire graph G, and a connection matrix C is obtained through the traversal, representing the connection relationship between nodes: if node i and node j are directly connected, C[i,j]=1; otherwise, C[i,j]=0;

Different features of the graph are calculated to generate the topological structure of the power grid. The degree distribution D(k) represents the proportion of nodes with a specific number of connections in the entire graph, where D(k) = the number of nodes with degree k/total number of nodes.

4. The online identification system for power grid faults based on big data according to claim 1, characterized in that the feature extraction module (200) comprises a time-frequency analysis unit (202), and the time-frequency analysis unit (202) uses the wavelet transform to transform real-time power grid operation data and generate power time-frequency features, specifically including:

For each real-time power grid operation signal x(t), a continuous wavelet transform is performed: the wavelet basis function ψ(t) is dilated by the scale parameter a and shifted by the translation parameter b, and the wavelet coefficients W(a, b) are obtained by convolving x(t) with the scaled and shifted wavelet, W(a, b) = (1/√|a|) ∫ x(t) ψ*((t - b)/a) dt, where ψ* is the complex conjugate of ψ. The energy at each scale is then extracted from the wavelet coefficients and used as the power time-frequency characteristics.
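A sampled version of this transform and the per-scale energy can be sketched in plain Python. The choice of the Mexican-hat (Ricker) wavelet as ψ, unit sample spacing, and defining the energy feature as E(a) = Σ_b W(a, b)² are assumptions made for illustration; the patent does not name a specific wavelet or energy formula.

```python
import math

def ricker(t):
    """Mexican-hat (Ricker) wavelet, L2-normalized; an assumed choice of psi(t)."""
    c = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return c * (1.0 - t * t) * math.exp(-t * t / 2.0)

def cwt(x, scales, psi=ricker):
    """Sampled CWT: W(a, b) = (1/sqrt(a)) * sum_n x[n] * psi((n - b) / a).

    Unit sample spacing; b runs over every sample index. psi is real and
    symmetric here, so correlation and convolution coincide.
    """
    n = len(x)
    return {
        a: [sum(x[k] * psi((k - b) / a) for k in range(n)) / math.sqrt(a)
            for b in range(n)]
        for a in scales
    }

def scale_energies(coeffs):
    """Energy per scale, E(a) = sum_b W(a, b)^2, used as the feature vector."""
    return {a: sum(w * w for w in row) for a, row in coeffs.items()}

# Hypothetical input: a 50 Hz waveform sampled at 1 kHz (20 samples per cycle).
signal = [math.sin(2 * math.pi * k / 20) for k in range(200)]
energies = scale_energies(cwt(signal, [1, 5, 20]))
print(max(energies, key=energies.get))  # 5, the scale matched to the 20-sample period
```

The explicit double loop is quadratic in the signal length and only mirrors the formula; a library routine such as PyWavelets' `pywt.cwt` would be the practical choice for real measurement streams.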

5. The online identification system for power grid faults based on big data according to claim 1 is characterized in that the feature analysis module (300) comprises a prediction unit (301), and the prediction unit (301) uses a random forest algorithm to predict whether a fault occurs according to the power time-frequency characteristics, specifically comprising:

Training the random forest model: a historical data set labeled with known fault and normal-operation cases is processed with the same wavelet-transform feature extraction and used to train the random forest model, which consists of multiple decision trees. During training, a random subsample of the training data set is drawn to construct each decision tree, and a random subset of features is considered when splitting each node;

Fault prediction: the power time-frequency characteristics are input into the trained random forest model. Each decision tree independently classifies the input feature vector, and the predictions of all trees are aggregated by majority voting to obtain the final fault prediction result.
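The bootstrap subsampling and majority-vote aggregation described above can be sketched as follows, assuming binary "fault"/"normal" labels. The three trees are hypothetical hand-written threshold rules standing in for trained decision trees, and breaking ties in favor of the first label seen is an assumption.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: one random subsample per tree."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(labels):
    """Most frequent label; ties go to the label seen first (an assumption)."""
    return Counter(labels).most_common(1)[0][0]

def forest_predict(trees, features):
    """Let every tree vote independently, then aggregate by majority."""
    votes = [tree(features) for tree in trees]
    return majority_vote(votes), votes

# Hypothetical stand-in trees: threshold rules on two scale-energy features.
trees = [
    lambda f: "fault" if f[0] > 10 else "normal",
    lambda f: "fault" if f[1] > 5 else "normal",
    lambda f: "fault" if f[0] + f[1] > 20 else "normal",
]
label, votes = forest_predict(trees, [12.0, 6.0])
print(label, votes)  # fault ['fault', 'fault', 'normal']

subsample = bootstrap_sample(list(range(10)), random.Random(0))  # training rows for one tree
```

Two of three trees vote "fault", so the forest reports a fault even though one tree disagrees.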

6. The online identification system for power grid faults based on big data according to claim 5 is characterized in that each decision tree is constructed as follows:

A subset is randomly drawn from the historical data set to build each individual decision tree, and a random subset of features is considered at each node split. At each node, the feature and split point are chosen that maximize the reduction in impurity from before to after the split. Impurity measures include, but are not limited to, the Gini index and information entropy.
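The node-splitting rule above can be sketched in Python. This is an illustration under assumptions: Gini impurity is the impurity measure (information entropy could be swapped in), features are numeric, candidate thresholds are the observed feature values, and `n_sub_features` is a hypothetical parameter name for the size of the random feature subset.

```python
import random

def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(X, y, n_sub_features, rng):
    """Pick (feature, threshold) maximizing the impurity decrease over a random feature subset.

    Impurity decrease = gini(parent) - size-weighted gini(children).
    Returns (feature_index, threshold, decrease), or None if no split helps.
    """
    n, n_features = len(X), len(X[0])
    parent = gini(y)
    best = None
    for f in rng.sample(range(n_features), n_sub_features):
        for thr in sorted({row[f] for row in X}):
            left = [y[i] for i in range(n) if X[i][f] <= thr]
            right = [y[i] for i in range(n) if X[i][f] > thr]
            if not left or not right:
                continue  # a split must produce two non-empty children
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            decrease = parent - child
            if best is None or decrease > best[2]:
                best = (f, thr, decrease)
    return best if best and best[2] > 0 else None

# Hypothetical training rows: [low-scale energy, high-scale energy] per sample.
X = [[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]]
y = ["normal", "normal", "fault", "fault"]
print(best_split(X, y, n_sub_features=2, rng=random.Random(0)))  # (0, 2.0, 0.5)
```

Splitting feature 0 at 2.0 separates the classes perfectly, so the impurity falls from 0.5 to 0; a full tree would apply `best_split` recursively to each child until no split reduces impurity.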

7. The power grid fault online identification system based on big data according to claim 1 is characterized in that the feature analysis module (300) includes a decision unit (302), and the decision unit (302) makes decisions on different paths according to the fault prediction results, specifically including:

If the prediction result is normal, the system will run normally;

If the overall fault location does not contain the key fault location, the next most important feature in the importance ranking is selected as the key fault feature, the key fault location is re-determined, and the containment check is repeated until the key fault location is found within the overall fault location.
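The decision path and the re-selection loop of claims 7 and 10 can be sketched as below. The rule for turning a key fault feature into a key fault location is not spelled out in this section, so the sketch uses a hypothetical rule: the key fault location is the node with the largest value of that feature. The feature names ("E_a2", "E_a8") and node data are likewise hypothetical.

```python
def locate_key_fault(ranked_features, node_features, overall_fault_nodes):
    """Walk down the importance ranking until the key fault location lies
    inside the overall fault location.

    ranked_features: feature names in decreasing order of average importance.
    node_features: {node: {feature: value}} time-frequency features per node.
    overall_fault_nodes: node set of the overall fault subgraph.
    HYPOTHETICAL rule: the key fault location for a feature is the node with
    the largest value of that feature.
    """
    for feature in ranked_features:
        key_node = max(node_features, key=lambda v: node_features[v][feature])
        if key_node in overall_fault_nodes:
            return feature, key_node
    return None  # ranking exhausted without a consistent location

def decision_unit(prediction, ranked_features, node_features, overall_fault_nodes):
    """Branch on the forest's prediction."""
    if prediction == "normal":
        return None  # keep operating normally
    return locate_key_fault(ranked_features, node_features, overall_fault_nodes)

node_features = {
    "B1": {"E_a2": 3.0, "E_a8": 9.0},
    "B2": {"E_a2": 7.0, "E_a8": 1.0},
    "B3": {"E_a2": 1.0, "E_a8": 2.0},
}
print(decision_unit("fault", ["E_a2", "E_a8"], node_features, {"B1", "B3"}))
# ('E_a8', 'B1'): B2, the top node for E_a2, lies outside the overall fault location
```

The explicit `None` return makes the loop terminate even when no feature in the ranking yields a location inside the overall fault location, a case the claims leave open.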

8. The online identification system for power grid faults based on big data according to claim 7 is characterized in that the overall fault location is determined according to the power time-frequency characteristics and the grid topology structure, specifically including:

The power time-frequency characteristics recorded during normal operation are collected from the prediction unit (301), their mean μ and standard deviation σ are calculated, and the threshold is set as E = μ + kσ, where E is the power time-frequency characteristic threshold and k is a multiplier that sets the sensitivity of anomaly detection (a smaller k lowers the threshold and flags more nodes as abnormal);

The power time-frequency characteristic of every node in the grid topology is compared with the threshold, and nodes whose values exceed it are regarded as abnormal nodes. For each pair of abnormal nodes v_m and v_n, C[m,n]=1 indicates that they are directly connected; the shortest paths between all abnormal nodes are then found with Dijkstra's algorithm using the edge weights of graph G;

The average shortest path length L_i from each abnormal node v_i to all other abnormal nodes is calculated as L_i = (1/(|A| - 1)) Σ_{v_j ∈ A, j ≠ i} d(v_i, v_j), where A is the set of abnormal nodes and d(v_i, v_j) is the shortest path length between v_i and v_j; the fault source candidate is v_s = argmin_{v_i ∈ A} L_i, where argmin returns the node v_i at which L_i attains its minimum;

Taking v_s as the center, the minimal connected subgraph G_s of G that contains all abnormal nodes is constructed; the node set of G_s is the overall fault location.
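The four steps of claim 8 can be combined into one Python sketch. Assumptions: each node carries a single scalar time-frequency feature (e.g., the energy at one scale), σ is the population standard deviation, k defaults to 3, and edge weights are non-negative. Finding the exact minimal connected subgraph containing a given node set is the NP-hard Steiner tree problem, so the sketch approximates G_s by the union of shortest paths from v_s to every abnormal node; this heuristic is a substitution, not necessarily the patent's construction.

```python
import heapq
import math
import statistics

def anomaly_threshold(normal_values, k=3.0):
    """E = mu + k * sigma over features recorded during normal operation."""
    return statistics.mean(normal_values) + k * statistics.pstdev(normal_values)

def build_adj(edges):
    """Undirected weighted adjacency {node: [(neighbor, weight), ...]}."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj

def dijkstra(adj, source):
    """Shortest-path distances and predecessors from source (weights >= 0)."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev

def overall_fault_location(adj, node_values, E):
    """Return (v_s, node set of G_s) for the nodes whose feature exceeds E."""
    A = [v for v, val in node_values.items() if val > E]
    if not A:
        return None, set()
    sp = {v: dijkstra(adj, v) for v in A}  # shortest paths from each abnormal node

    def avg_len(vi):  # L_i: mean shortest-path length to the other abnormal nodes
        others = [vj for vj in A if vj != vi]
        if not others:
            return 0.0
        return sum(sp[vi][0].get(vj, math.inf) for vj in others) / len(others)

    vs = min(A, key=avg_len)  # v_s = argmin over A of L_i
    dist, prev = sp[vs]
    nodes = {vs}
    for vj in A:  # union of shortest paths from v_s (Steiner-tree heuristic)
        if vj not in dist:
            continue  # unreachable from v_s
        u = vj
        while u != vs:
            nodes.add(u)
            u = prev[u]
    return vs, nodes

# Hypothetical five-bus example.
E = anomaly_threshold([1.0, 1.2, 0.8, 1.0])  # about 1.42
adj = build_adj([("B1", "B2", 2.0), ("B2", "B3", 1.0), ("B3", "B4", 1.0), ("B2", "B5", 2.0)])
values = {"B1": 2.0, "B2": 0.9, "B3": 2.5, "B4": 1.1, "B5": 3.0}
vs, nodes = overall_fault_location(adj, values, E)
print(vs, sorted(nodes))  # B3 ['B1', 'B2', 'B3', 'B5']
```

B1, B3 and B5 exceed the threshold; B3 has the smallest average distance to the other two (3.0 versus 3.5), and the normal bus B2 is pulled into G_s because every connecting path passes through it.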

9. The online identification system for power grid faults based on big data according to claim 8 is characterized in that the feature importance evaluation in the random forest is used to analyze the importance of each feature to fault judgment, and the feature with the highest importance is selected as the key fault feature, which specifically includes:

In each decision tree, every node is split on a single feature; the impurity reduction produced by each split is taken as the importance of that feature at that node;

For each tree, the importance of each feature at all nodes is accumulated to obtain the total importance of each feature in a single tree;

The average importance of each feature over all decision trees is then calculated: the feature's total importance in each individual tree is summed over all N trees and divided by the number of trees N, i.e. I(f) = (1/N) Σ_{t=1..N} I_t(f), where I_t(f) is the total importance of feature f in tree t.

All features are sorted according to their average importance, an importance-ranked list is generated, and the feature with the largest average importance is selected as the key fault feature.
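The accumulation, averaging and ranking steps of claim 9 can be sketched as follows. Representing each trained tree as a list of (feature, impurity decrease) records, one per split node, is a hypothetical simplification, and the numbers are illustrative only.

```python
from collections import defaultdict

def average_importance(trees):
    """Mean impurity-decrease importance per feature over N trees.

    trees: one list per tree of (feature, impurity_decrease) records, one
    record per split node (a hypothetical representation).
    """
    n_trees = len(trees)
    totals = defaultdict(float)
    for splits in trees:
        per_tree = defaultdict(float)
        for feature, decrease in splits:
            per_tree[feature] += decrease  # total importance within this tree
        for feature, value in per_tree.items():
            totals[feature] += value  # summed over all N trees
    return {f: v / n_trees for f, v in totals.items()}

def rank_features(avg):
    """Features by decreasing average importance; the first is the key fault feature."""
    return sorted(avg, key=avg.get, reverse=True)

trees = [
    [("E_a2", 0.30), ("E_a8", 0.10)],
    [("E_a8", 0.25), ("E_a2", 0.05), ("E_a8", 0.05)],
]
avg = average_importance(trees)
print(rank_features(avg))  # ['E_a8', 'E_a2']
```

This follows the claim's unweighted sum; scikit-learn's impurity-based `feature_importances_` additionally weights each split by the number of samples reaching the node and normalizes the result to sum to 1, which can change the ranking.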

10. The online identification system for power grid faults based on big data according to claim 8, characterized in that the process of re-determining the key fault location is as follows:

The feature ranked next in importance is selected as the new key fault feature: specifically, the feature immediately following the current key fault feature in the importance-ranked list becomes the key fault feature.