CN119787386A - Deep learning-based micro-grid reactive power optimization method and system - Google Patents
- Publication number: CN119787386A (application number CN202510266962.XA)
- Authority
- CN
- China
- Prior art keywords: data, training, sample, reactive, reactive power
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- Y02E40/30: Reactive power compensation (under Y02E40/00, technologies for efficient electrical power generation, transmission or distribution; Y02E, reduction of greenhouse gas [GHG] emissions related to energy generation, transmission or distribution; Y02, technologies for mitigation or adaptation against climate change)
Abstract
The invention provides a deep learning-based micro-grid reactive power optimization method and system. By introducing a contrastive learning mechanism over active reactive power configuration supervision data and passive reactive power configuration supervision data, the method takes into account the direct training benefit of each piece of sample learning data and also introduces an evaluation of the cyclic training strengthening benefit; by comparing the improvement in the model's prediction capability across different training groups, it effectively avoids model overfitting and strengthens the model's generalization capability. In particular, by setting different fusion coefficients for the first cyclic fitting goodness value and the second cyclic fitting goodness value, the invention further optimizes the model's learning weights for active and passive reactive power configurations, so that the generated target micro-grid reactive power optimization model more accurately reflects the actual requirements of micro-grid operation, improving the accuracy and efficiency of reactive power optimization. Finally, for any given target micro-grid operation data, the method can quickly generate an efficient reactive power optimization configuration decision.
Description
Technical Field
The invention relates to the technical field of deep learning, and in particular to a deep learning-based micro-grid reactive power optimization method and system.
Background
With the rapid development of energy structure transformation and distributed energy technology, micro-grids, as small power systems integrating renewable energy sources, energy storage systems and loads, have shown great potential for improving energy utilization efficiency and enhancing grid flexibility and reliability. However, the reactive power optimization problem of micro-grids has long been one of the key factors restricting their efficient operation. Traditional reactive power optimization methods mostly depend on mathematical optimization algorithms such as particle swarm optimization and genetic algorithms. These methods can achieve an optimal configuration of reactive power compensation equipment to a certain extent, but suffer from limitations such as high computational complexity, sensitivity to initial parameters, and difficulty adapting to the dynamic changes of a micro-grid.
Disclosure of Invention
In view of the above-mentioned problems, in combination with the first aspect of the present invention, an embodiment of the present invention provides a deep learning-based reactive power optimization method for a micro-grid, the method including:
Acquiring a candidate micro-grid reactive power optimization model and a training data sequence containing a plurality of sample learning data, wherein the sample learning data comprises sample micro-grid operation data, passive reactive power configuration supervision data of the sample micro-grid operation data and active reactive power configuration supervision data of the sample micro-grid operation data;
For each sample learning data in the training data sequence, determining target training strengthening benefits of active reactive configuration supervision data in the sample learning data compared with passive reactive configuration supervision data when model parameter learning is carried out according to the sample learning data in a target training stage;
Determining a first cyclic fitting goodness value of the active reactive power configuration supervision data generated by an updated micro-grid reactive power optimization model corresponding to the sample learning data, and a second cyclic fitting goodness value of the passive reactive power configuration supervision data generated by the updated micro-grid reactive power optimization model corresponding to the sample learning data, wherein the updated micro-grid reactive power optimization model corresponding to the sample learning data is generated by model parameter learning on the sample training group preceding the sample training group corresponding to the sample learning data;
Performing fusion calculation on the first cyclic fitting goodness value and the second cyclic fitting goodness value to determine the cyclic training strengthening benefit of the sample learning data, and stopping model parameter learning when it is determined, according to the cyclic training strengthening benefit and the target training strengthening benefit respectively corresponding to each sample learning data, that the training error of the target training stage no longer decreases, so as to generate a target micro-grid reactive power optimization model corresponding to the candidate micro-grid reactive power optimization model;
And acquiring target micro-grid operation data to be analyzed, loading the target micro-grid operation data to the target micro-grid reactive power optimization model, and generating reactive power optimization configuration decision data of the target micro-grid operation data.
In a possible implementation manner of the first aspect, determining the first cyclic fitting goodness value of the active reactive power configuration supervision data generated by the updated micro-grid reactive power optimization model corresponding to the sample learning data includes:
acquiring a plurality of candidate active reactive configuration supervision data of the sample learning data;
For each candidate active reactive power configuration supervision data, determining the first cyclic fitting goodness value of that candidate data as generated by the updated micro-grid reactive power optimization model corresponding to the sample learning data;
and performing mean value calculation on the first cyclic fitting goodness values respectively corresponding to the candidate active reactive power configuration supervision data, to generate the first cyclic fitting goodness value of the active reactive power configuration supervision data generated by the updated micro-grid reactive power optimization model corresponding to the sample learning data.
In a possible implementation manner of the first aspect, the method further includes:
Determining the positive supervision training reinforcement benefit of the updated micro-grid reactive power optimization model corresponding to the sample learning data compared with the candidate micro-grid reactive power optimization model and the negative supervision training reinforcement benefit of the updated micro-grid reactive power optimization model corresponding to the sample learning data compared with the candidate micro-grid reactive power optimization model;
Determining model training comparison loss corresponding to the sample learning data according to the positive supervision training strengthening benefit and the negative supervision training strengthening benefit;
And the step of terminating model parameter learning when it is determined, according to the cyclic training strengthening benefit and the target training strengthening benefit respectively corresponding to each sample learning data, that the training error of the target training stage no longer decreases, and generating a target micro-grid reactive power optimization model corresponding to the candidate micro-grid reactive power optimization model, comprises:
stopping model parameter learning when it is determined, according to the cyclic training strengthening benefit, the target training strengthening benefit and the model training comparison loss respectively corresponding to each sample learning data, that the training error of the target training stage no longer decreases, and generating a target micro-grid reactive power optimization model corresponding to the candidate micro-grid reactive power optimization model.
In a possible implementation manner of the first aspect, determining a negative supervision training reinforcement benefit of the updated microgrid reactive power optimization model corresponding to the sample learning data as compared to the candidate microgrid reactive power optimization model includes:
Determining, for each sample learning data in the sample training group corresponding to the sample learning data, the training strengthening benefit of the passive reactive power configuration supervision data under the updated micro-grid reactive power optimization model compared with the candidate micro-grid reactive power optimization model;
and performing mean value calculation on the training strengthening benefits of the passive reactive power configuration supervision data, and determining the negative supervision training strengthening benefit of the updated micro-grid reactive power optimization model corresponding to the sample learning data compared with the candidate micro-grid reactive power optimization model.
In a possible implementation manner of the first aspect, the method further includes:
For each sample learning data, determining a reference training error corresponding to the sample learning data according to the cyclic training strengthening benefit and the target training strengthening benefit of the sample learning data;
determining a first fusion coefficient of a reference training error under the sample learning data and a second fusion coefficient of model training comparison loss under the sample learning data;
According to the first fusion coefficient and the second fusion coefficient, carrying out fusion calculation on a reference training error of the sample learning data and a model training comparison loss, and determining a training error corresponding to the sample learning data;
and when the calculated training error corresponding to each sample learning data no longer decreases, determining that the training error of the target training stage no longer decreases.
In a possible implementation manner of the first aspect, the determining a first fusion coefficient of the reference training error under the sample learning data includes:
Determining the updated training reinforcement benefit of the active reactive configuration supervision data after model parameter learning is completed compared with the passive reactive configuration supervision data under the sample learning data;
Determining a first fusion coefficient of a reference training error under the sample learning data according to a first difference value between the updated training strengthening benefit and the target training strengthening benefit;
wherein determining a first fusion coefficient of the reference training error under the sample learning data according to the first difference value between the updated training reinforcement benefit and the target training reinforcement benefit comprises:
According to updated training strengthening benefits and target training strengthening benefits respectively corresponding to the sample learning data in the sample training groups corresponding to the sample learning data, determining first difference values respectively corresponding to the sample learning data;
average value calculation is carried out on the first difference values corresponding to the sample learning data respectively, and a first threshold value matched with the sample learning data is determined;
and when the first difference value corresponding to the sample learning data is not greater than the first threshold value, determining a first fusion coefficient of the reference training error under the sample learning data according to the negative association relation.
In a possible implementation manner of the first aspect, determining the second fusion coefficient of the model training comparison loss under the sample learning data includes:
determining the training strengthening benefit of the passive reactive configuration supervision data of the updated micro-grid reactive power optimization model corresponding to the sample learning data compared with the candidate micro-grid reactive power optimization model;
Determining a second fusion coefficient of the model training comparison loss under the sample learning data according to a second difference value between the positive supervision training strengthening benefit and the training strengthening benefit of the passive reactive power configuration supervision data, wherein the second fusion coefficient and the second difference value are in a negative association relationship;
wherein determining the second fusion coefficient of the model training comparison loss under the sample learning data according to the second difference value between the positive supervision training strengthening benefit and the training strengthening benefit of the passive reactive power configuration supervision data comprises:
determining second difference values respectively corresponding to each sample learning data in the sample training group corresponding to the sample learning data, according to the positive supervision training strengthening benefit and the training strengthening benefit of the passive reactive power configuration supervision data corresponding to that sample learning data;
performing average value calculation on second difference values corresponding to the sample learning data respectively, and determining a second threshold value matched with the sample learning data;
and when the second difference value corresponding to the sample learning data is not greater than the second threshold value, determining a second fusion coefficient of model training comparison loss under the sample learning data according to a negative association relation.
In a possible implementation manner of the first aspect, the process of acquiring a training data sequence including a plurality of sample learning data includes:
acquiring reactive power optimization requirements of a micro-grid and a plurality of sample micro-grid operation data;
for each sample micro-grid operation data, determining requirement index data related to reactive power optimization requirements of the micro-grid, wherein the requirement index data comprises the sample micro-grid operation data;
Loading the demand index data into a pre-trained deep learning network, and taking the output of the deep learning network as the passive reactive power configuration supervision data of the sample micro-grid operation data;
Generating the active reactive power configuration supervision data of the sample micro-grid operation data according to an expert optimization instruction issued on the passive reactive power configuration supervision data of the sample micro-grid operation data;
And configuring sample learning data comprising the sample micro-grid operation data, active reactive power configuration supervision data of the sample micro-grid operation data and passive reactive power configuration supervision data of the sample micro-grid operation data, and generating a training data sequence comprising a plurality of sample learning data.
In a possible implementation manner of the first aspect, the process of obtaining the candidate micro-grid reactive power optimization model includes:
Acquiring an initialized neural network model;
Acquiring a basic training data sequence containing a plurality of basic sample learning data, wherein each basic sample learning data comprises sample micro-grid operation data, together with either the active reactive power configuration supervision data or the passive reactive power configuration supervision data of the sample micro-grid operation data;
And carrying out iterative updating on the neural network model according to the basic training data sequence to generate a candidate micro-grid reactive power optimization model.
In yet another aspect, an embodiment of the present invention further provides a deep learning-based micro-grid reactive power optimization system, which includes a processor and a machine-readable storage medium connected to the processor. The machine-readable storage medium stores a program, instruction or code, and the processor executes the program, instruction or code in the machine-readable storage medium to implement the method described above.
Based on the above aspects, by introducing a contrastive learning mechanism over the active reactive power configuration supervision data and the passive reactive power configuration supervision data, the embodiment of the application remarkably improves the performance of the model on the micro-grid reactive power optimization task. During training, the method not only considers the direct training benefit of each piece of sample learning data but also introduces an evaluation of the cyclic training strengthening benefit; by comparing the improvement in the model's prediction capability across different training groups, it effectively avoids model overfitting and strengthens the model's generalization capability. In particular, by setting different fusion coefficients for the first cyclic fitting goodness value and the second cyclic fitting goodness value, the application further optimizes the model's learning weights for active and passive reactive power configurations, so that the generated target micro-grid reactive power optimization model more accurately reflects the actual requirements of micro-grid operation, improving the accuracy and efficiency of reactive power optimization. Finally, for any given target micro-grid operation data, the method can quickly generate an efficient reactive power optimization configuration decision, thereby contributing to the stable operation and energy-efficiency improvement of the micro-grid.
Drawings
Fig. 1 is a schematic diagram of an execution flow of a deep learning-based reactive power optimization method for a micro grid according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a hardware architecture of a deep learning-based micro-grid reactive power optimization system according to an embodiment of the present invention.
Detailed Description
The invention is specifically described below with reference to the accompanying drawings, and fig. 1 is a schematic flow chart of a deep learning-based reactive power optimization method for a micro-grid according to an embodiment of the invention, and the deep learning-based reactive power optimization method for the micro-grid is described in detail below.
Step S110, a candidate micro-grid reactive power optimization model and a training data sequence containing a plurality of sample learning data are obtained. The sample learning data comprises sample microgrid operation data, passive reactive configuration supervision data of the sample microgrid operation data, and active reactive configuration supervision data of the sample microgrid operation data.
In this embodiment, consider a micro-grid system for a large industrial park. This micro-grid contains a plurality of distributed power sources (e.g. solar photovoltaic panels and small wind generators), different types of loads (industrial consumers, office consumers, etc.), and associated power transmission and conversion equipment (transformers, switchgear, etc.).
To obtain the candidate micro-grid reactive power optimization model, an initialized neural network model is selected from an existing neural network model library. This neural network model may be a multi-layer perceptron (MLP) with an input layer, a number of hidden layers, and an output layer. The number of input-layer nodes may be determined by the number of features in the micro-grid operation data; for example, the micro-grid voltage, current, power factor, active power, and the reactive power demand of the load are all taken as inputs to the input layer.
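As an illustration of this step, such an MLP can be sketched in plain Python; the layer sizes, weight initialization range, and the five-feature input vector below are assumptions for illustration, not details fixed by the patent:

```python
import random

def make_mlp(layer_sizes, seed=0):
    """Initialize a fully connected network, e.g. [5, 16, 16, 1]:
    five input features (voltage, current, power factor, active power,
    reactive demand -- an assumed feature set), two hidden layers,
    and one output node for the predicted reactive compensation amount."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((w, b))
    return layers

def forward(layers, x):
    """Forward pass: ReLU on hidden layers, linear output."""
    for i, (w, b) in enumerate(layers):
        x = [sum(wi * xi for wi, xi in zip(row, x)) + bi
             for row, bi in zip(w, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

model = make_mlp([5, 16, 16, 1])
# One operation-data sample (per-unit values, purely illustrative):
# [voltage, current, power factor, active power, reactive demand]
sample = [1.0, 0.8, 0.85, 0.6, 0.3]
pred = forward(model, sample)  # one-element list: predicted compensation
```

In practice a deep learning framework would replace this hand-rolled forward pass, but the structure (feature vector in, reactive compensation amount out) is the same.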
Next, a training data sequence is acquired by collecting sample micro-grid operation data for a plurality of different time periods. For example, between 9 and 10 am on a certain day, the solar photovoltaic panels generate power under medium illumination intensity, the industrial equipment is in a normal production state, and part of the office-area equipment is switched on; the corresponding micro-grid operation data include a real-time voltage of 380 V, a current of 50 A and a power factor of 0.85 for the whole micro-grid, the output power of each distributed power source, the power demand of the load, and so on.
To determine the passive reactive power configuration supervision data, these sample micro-grid operation data are loaded as demand index data into a pre-trained deep learning network. This deep learning network may have been trained in advance on micro-grid reactive power configuration problems. Taking the 9-to-10-am example, the deep learning network outputs a reactive power configuration scheme for the input micro-grid operation data according to the rules and algorithms it has learned; for example, it sets the reactive power compensation amount of a certain compensation device to Q1. This reactive power configuration scheme is used as the passive reactive power configuration supervision data.
The active reactive power configuration supervision data is then generated according to expert optimization instructions issued on the passive reactive power configuration supervision data. The expert refines the passive reactive power configuration supervision data based on experience and a deeper understanding of the micro-grid system, taking into account objectives such as reducing grid loss and improving power quality. For example, in the current operating state the expert may decide that, by adjusting the parameters of a certain reactive power compensation device, the compensation amount should be Q2 (Q2 differs from Q1 and better meets the optimization objectives); the reactive power configuration scheme corresponding to Q2 is then taken as the active reactive power configuration supervision data.
Thus, for each sample microgrid operational data, corresponding passive reactive configuration supervision data and active reactive configuration supervision data are generated, thereby configuring sample learning data. A plurality of such sample learning data forms a training data sequence.
Step S120, for each sample learning data in the training data sequence, determining a target training reinforcement benefit of the active reactive configuration supervision data in the sample learning data compared with the passive reactive configuration supervision data when model parameter learning is performed according to the sample learning data in the target training stage.
Continuing with the industrial-park micro-grid example: in the target training stage, each sample learning data in the training data sequence is used for model parameter learning. Consider one piece of sample learning data, such as the 9-to-10-am sample mentioned earlier.
The sample micro-grid operation data is first input into the model currently being trained (the candidate micro-grid reactive power optimization model at the start of training, or the updated model after some iterations). The model predicts results for the active and passive reactive power configurations from the input operation data. Assume the model predicts a reactive compensation amount Q1' for the input features corresponding to the passive reactive power configuration supervision data, and Q2' for those corresponding to the active reactive power configuration supervision data.
The target training strengthening benefit may be determined from several perspectives, for example the accuracy of reactive compensation or the contribution to overall micro-grid performance. From the perspective of reactive compensation accuracy, a measure such as the error rate can be defined: the error rate for the passive reactive power configuration supervision data is E1 = |Q1 − Q1'| / Q1, and for the active reactive power configuration supervision data it is E2 = |Q2 − Q2'| / Q2. The target training strengthening benefit of the active data over the passive data in terms of reactive compensation accuracy can then be expressed as ΔE = E1 − E2. If ΔE > 0, the active reactive power configuration supervision data yields better reactive compensation accuracy under this sample learning data, and ΔE quantifies the target training strengthening benefit.
From the perspective of overall micro-grid performance, the influence of reactive compensation on voltage stability, grid loss and the like can also be considered. Suppose the voltage fluctuation range of the micro-grid is ΔV1 under the passive reactive power configuration and ΔV2 under the active one. If ΔV1 > ΔV2, the active reactive power configuration improves voltage stability more. The benefits in reactive compensation accuracy, overall micro-grid performance improvement and so on can then be combined with appropriate weights to obtain the target training strengthening benefit of the active reactive power configuration supervision data compared with the passive one under this sample learning data.
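The error-rate and weighted-combination calculation described above can be sketched as follows; the weights `w_accuracy` and `w_voltage` and all numeric values are illustrative assumptions, not values specified by the patent:

```python
def error_rate(q_supervision, q_predicted):
    """Relative error E = |Q - Q'| / Q of a predicted compensation amount."""
    return abs(q_supervision - q_predicted) / q_supervision

def target_training_benefit(q1, q1_pred, q2, q2_pred, dv1, dv2,
                            w_accuracy=0.7, w_voltage=0.3):
    """Target training strengthening benefit of the active supervision data
    over the passive data: the accuracy term dE = E1 - E2 combined with
    the voltage-stability term dV1 - dV2 under assumed weights."""
    delta_e = error_rate(q1, q1_pred) - error_rate(q2, q2_pred)
    delta_v = dv1 - dv2
    return w_accuracy * delta_e + w_voltage * delta_v

# Illustrative numbers: passive target Q1=100 predicted as 90 (E1 = 0.10),
# active target Q2=110 predicted as 107.8 (E2 = 0.02),
# voltage fluctuation 0.05 (passive) vs 0.03 (active).
benefit = target_training_benefit(100.0, 90.0, 110.0, 107.8, 0.05, 0.03)
# benefit ~= 0.7 * 0.08 + 0.3 * 0.02 = 0.062 (positive: active data wins)
```

A positive result corresponds to the ΔE > 0 case in the text, where the active supervision data is the more useful training signal for this sample.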
Step S130, determining a first cyclic fitting goodness value of the active reactive power configuration supervision data generated by an updated micro-grid reactive power optimization model corresponding to the sample learning data, and a second cyclic fitting goodness value of the passive reactive power configuration supervision data generated by that updated model. The updated micro-grid reactive power optimization model corresponding to the sample learning data is generated by model parameter learning on the sample training group preceding the one corresponding to the sample learning data. During the first round of model parameter learning, the deep learning target of the first sample training group is the candidate micro-grid reactive power optimization model.
Still taking the industrial-park micro-grid as an example, assume the training data sequence is divided into a plurality of sample training groups, and consider the group containing a given piece of sample learning data, say the n-th sample training group.
The first cyclic fitting goodness value is determined first. A plurality of candidate active reactive power configuration supervision data of this sample learning data are obtained; these candidates may be generated by applying small perturbations to the active reactive power configuration supervision data or by using different algorithms. For example, for the active reactive power configuration supervision data Q2 in the 9-to-10-am sample mentioned earlier, candidate data Q21, Q22 and Q23 can be obtained by fine-tuning the parameters of the reactive power compensation device.
For each candidate active reactive power configuration supervision data, the sample micro-grid operation data is input into the updated micro-grid reactive power optimization model generated by model parameter learning on the (n−1)-th sample training group (for the first sample training group, the deep learning target is the candidate micro-grid reactive power optimization model). Assume the model outputs a predicted reactive compensation amount Q21' for the candidate Q21; the fitting goodness value for Q21 can then be calculated from the difference between the predicted and actual values, for example with a mean square error (MSE): MSE = (Q21 − Q21')². Corresponding fitting goodness values are likewise calculated for the candidates Q22 and Q23. Finally, the mean of these values gives the first cyclic fitting goodness value of the active reactive power configuration supervision data generated by the updated micro-grid reactive power optimization model corresponding to this sample learning data.
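A minimal sketch of this candidate-averaging computation, using the MSE form given above (the candidate and predicted amounts are illustrative numbers):

```python
def mse(q_supervision, q_predicted):
    """Squared error (Q - Q')^2 for one candidate, as in the text."""
    return (q_supervision - q_predicted) ** 2

def cyclic_goodness_of_fit(candidates, predictions):
    """Mean of the per-candidate fitting goodness values: the first cyclic
    fitting goodness value when given active candidates (Q21, Q22, Q23),
    the second when given passive candidates (Q11, Q12, Q13)."""
    scores = [mse(q, p) for q, p in zip(candidates, predictions)]
    return sum(scores) / len(scores)

# Illustrative active candidates and the updated model's predictions:
f1 = cyclic_goodness_of_fit([108.0, 110.0, 112.0], [107.0, 111.0, 111.0])
# f1 = (1 + 1 + 1) / 3 = 1.0
```

The same helper applied to the passive candidates yields the second cyclic fitting goodness value described in the next paragraph.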
The determination of the second cyclic fitting goodness value is similar, operating on the passive reactive power configuration supervision data. Assuming the passive data is Q1, a plurality of candidate passive reactive power configuration supervision data Q11, Q12, Q13, etc. may be obtained (by similar perturbations or different algorithms). The sample micro-grid operation data is input into the updated micro-grid reactive power optimization model, a fitting goodness value (using MSE or a similar measure) is calculated for each candidate, and the mean of these values finally gives the second cyclic fitting goodness value of the passive reactive power configuration supervision data generated by the updated model corresponding to this sample learning data.
Step S140, performing fusion calculation on the first cyclic fitting goodness value and the second cyclic fitting goodness value to determine the cyclic training strengthening benefit of the sample learning data, and terminating model parameter learning when it is determined, according to the cyclic training strengthening benefit and the target training strengthening benefit respectively corresponding to each sample learning data, that the training error of the target training stage no longer decreases, thereby generating a target micro-grid reactive power optimization model corresponding to the candidate micro-grid reactive power optimization model. The fusion coefficient of the first cyclic fitting goodness value is smaller than that of the second cyclic fitting goodness value.
Continuing the industrial park micro-grid example, assume the first-loop goodness-of-fit value F1 and the second-loop goodness-of-fit value F2 have been calculated for a given sample learning data. Since the fusion coefficient of the first-loop goodness-of-fit value must be smaller than that of the second-loop goodness-of-fit value, let the first fusion coefficient be α and the second be β, with α < β and α + β = 1.
The cyclic training reinforcement benefit of this sample learning data can then be calculated as: cyclic training reinforcement benefit = α × F1 + β × F2.
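A minimal sketch of this fusion, with an illustrative 0.3/0.7 split satisfying α < β and α + β = 1 (the specific coefficient values are assumptions, not fixed by the text):

```python
def cyclic_training_benefit(f1, f2, alpha=0.3, beta=0.7):
    """Fuse the first- and second-loop goodness-of-fit values.

    The text requires alpha < beta and alpha + beta = 1, so the passive
    (second-loop) term dominates the fused benefit.
    """
    assert alpha < beta and abs(alpha + beta - 1.0) < 1e-9
    return alpha * f1 + beta * f2
```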
This calculation is performed for each sample learning data throughout the training process, and the training error is determined from the cyclic training reinforcement benefit and the target training reinforcement benefit corresponding to each sample learning data.
For example, for each sample learning data the training error may be defined as: training error = |cyclic training reinforcement benefit - target training reinforcement benefit|. As model parameter learning proceeds, the training error corresponding to each sample learning data is continuously recalculated. When the training errors corresponding to all sample learning data no longer decrease, the model has converged to a favorable state and model parameter learning is terminated. The model at that point is taken as the target micro-grid reactive power optimization model corresponding to the candidate micro-grid reactive power optimization model; it offers higher accuracy and reliability in micro-grid reactive power optimization configuration decisions.
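The per-sample error and the stopping rule described above might be checked as follows; this is a sketch, since the text does not fix the exact convergence test:

```python
def training_errors(pairs):
    """Per-sample training error: |cyclic benefit - target benefit|.

    `pairs` holds (cyclic, target) reinforcement-benefit tuples, one per
    sample learning data.
    """
    return [abs(c - t) for c, t in pairs]

def should_stop(prev_errors, curr_errors):
    """Terminate parameter learning once no sample's error still decreases."""
    return all(curr >= prev for prev, curr in zip(prev_errors, curr_errors))
```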
Step S150, obtaining target micro-grid operation data to be analyzed, loading the target micro-grid operation data into the target micro-grid reactive power optimization model, and generating reactive power optimization configuration decision data of the target micro-grid operation data.
It is assumed that in the industrial park micro-grid, new target micro-grid operation data to be analyzed is acquired for 3 pm to 4 pm. During this period, cloud cover may change the illumination on the solar photovoltaic panels, and the load of the industrial electric equipment changes, for example to a voltage of 375 V, a current of 45 A, and a power factor of 0.83, with the output power of each distributed power supply and the power demand of the load changing accordingly.
And this target micro-grid operation data is loaded into the previously obtained target micro-grid reactive power optimization model, which analyzes and processes the input according to its learned parameters and algorithms. Based on the current operating state of the micro-grid, the model can comprehensively weigh factors such as voltage stability, grid loss, and the characteristics of the reactive compensation devices, and output reactive power optimization configuration decision data for the target micro-grid operation data. For example, the model may indicate that the compensation amount of a reactive compensation device should be adjusted to Q3 and that the control parameters of other related devices should be set accordingly, implementing the reactive power optimization configuration of the micro-grid for this period and improving its operating efficiency and power quality.
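How step S150 might look at inference time; the feature names and the stub model are illustrative assumptions standing in for the trained target model:

```python
def reactive_decision(operation_data, model):
    """Map target micro-grid operation data to a compensation decision."""
    features = [operation_data["voltage"],
                operation_data["current"],
                operation_data["power_factor"]]
    q3 = model(features)  # predicted compensation amount, e.g. Q3
    return {"compensation_kvar": q3}

# Example with the 3-4 pm measurements and a stub trained model:
decision = reactive_decision(
    {"voltage": 375.0, "current": 45.0, "power_factor": 0.83},
    lambda f: 12.5,  # stands in for the target optimization model
)
```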
Based on the above steps, introducing a contrastive learning mechanism over the active and passive reactive configuration supervision data significantly improves the model's performance on the micro-grid reactive power optimization task. During training, the method considers not only the direct training benefit of each sample learning data but also the cyclic training reinforcement benefit; by comparing the improvement in the model's predictive capability across different training groups, it effectively avoids overfitting and strengthens the model's generalization capability. In particular, by assigning different fusion coefficients to the first-loop and second-loop goodness-of-fit values, the application further tunes the model's learning weights for active versus passive reactive configuration, so that the generated target micro-grid reactive power optimization model more accurately reflects the actual requirements of micro-grid operation, improving the accuracy and efficiency of reactive power optimization. Finally, for any given target micro-grid operation data, the method can quickly generate an efficient reactive power optimization configuration decision, benefiting the stable operation and energy efficiency of the micro-grid.
In one possible implementation, step S130 includes:
Step S131, acquiring a plurality of candidate active reactive configuration supervision data of the sample learning data.
And step S132, for each candidate active reactive configuration supervision data item, determining the first-loop goodness-of-fit value that the updated micro-grid reactive power optimization model corresponding to the sample learning data generates for that candidate.
And step S133, carrying out mean value calculation on the first-loop goodness-of-fit values corresponding to the respective candidate active reactive configuration supervision data, generating the first-loop goodness-of-fit value of the active reactive configuration supervision data produced by the updated micro-grid reactive power optimization model corresponding to the sample learning data.
In this embodiment, within the industrial park micro-grid scenario, consider the 9 to 10 am case in the sample learning data. To determine the first-loop goodness-of-fit value of the active reactive configuration supervision data generated by the updated micro-grid reactive power optimization model corresponding to this sample learning data, the first operation is to acquire a plurality of candidate active reactive configuration supervision data of the sample learning data. For this micro-grid operation period, the original active reactive configuration supervision data is the optimized configuration data of the reactive compensation devices generated from expert optimization instructions. A plurality of candidate active reactive configuration supervision data may be obtained in various ways, for example by adjusting the original active reactive configuration supervision data according to different reactive compensation strategy adjustment algorithms. Assuming the compensation amount of the reactive compensation device in the original active reactive configuration supervision data is Q2, one adjustment algorithm yields Q21, another yields Q22, and so on, producing a plurality of different candidate active reactive configuration supervision data.
And for each candidate active reactive configuration supervision data item, the first-loop goodness-of-fit value that the updated micro-grid reactive power optimization model corresponding to the sample learning data generates for that candidate is determined. The sample micro-grid operation data from 9 to 10 am (including voltage, current, power factor, distributed power output, load power demand, and similar data) is input into the updated micro-grid reactive power optimization model generated by model parameter learning on the sample training group preceding the one corresponding to this sample learning data. Taking the candidate active reactive configuration supervision data Q21 as an example, when the sample micro-grid operation data is input into the updated model, it outputs a predicted reactive compensation amount Q21' according to its internal parameters and algorithm structure. The first-loop goodness-of-fit value can then be determined from the difference between the predicted value Q21' and the actual candidate active reactive configuration supervision data Q21; one common choice is the mean square error (MSE), i.e. the value (Q21 - Q21')² is taken as the first-loop goodness-of-fit value corresponding to Q21. In the same manner, for candidate active reactive configuration supervision data Q22 and so on, the sample micro-grid operation data is input into the updated model to obtain the corresponding predicted values, and the mean square error against each actual candidate determines the corresponding first-loop goodness-of-fit values.
And finally, the mean of the first-loop goodness-of-fit values corresponding to the respective candidate active reactive configuration supervision data is calculated. Assuming the value calculated for Q21 is F21, the value calculated for Q22 is F22, and so on, all the first-loop goodness-of-fit values F21, F22, etc. are summed and divided by the number of candidate active reactive configuration supervision data; the result is the first-loop goodness-of-fit value of the active reactive configuration supervision data generated by the updated micro-grid reactive power optimization model corresponding to the sample learning data. This value reflects how well the updated model fits the active reactive configuration supervision data; it is important in subsequent model evaluation and optimization, helping to measure the model's performance accurately under different reactive configuration conditions and thereby providing a more reliable decision basis for micro-grid reactive power optimization.
In one possible embodiment, the method further comprises:
And step A110, determining the positive supervision training reinforcement benefit of the updated micro-grid reactive power optimization model corresponding to the sample learning data compared with the candidate micro-grid reactive power optimization model, and the negative supervision training reinforcement benefit of the same updated model compared with the candidate model.
And step A120, determining model training comparison loss corresponding to the sample learning data according to the positive supervision training reinforcement benefit and the negative supervision training reinforcement benefit.
Step S140 includes terminating model parameter learning when it is determined that the training error in the target training stage does not continue to decrease according to the cycle training reinforcement benefit, the target training reinforcement benefit and the model training comparison loss respectively corresponding to each sample learning data, and generating a target micro-grid reactive power optimization model corresponding to the candidate micro-grid reactive power optimization model.
In one possible implementation, step a110 includes:
And step A111, determining the passive reactive configuration supervision data training strengthening benefit corresponding to each candidate micro-grid reactive power optimization model in the sample training packet corresponding to the sample learning data and updating the micro-grid reactive power optimization model under each sample learning data.
And step A112, performing average value calculation on the training strengthening benefits of the passive reactive power configuration supervision data, and determining the passive supervision training strengthening benefits of the updated micro-grid reactive power optimization model corresponding to the sample learning data compared with the candidate micro-grid reactive power optimization model.
In this embodiment, the active supervision training reinforcement benefit is considered first. For the 9 to 10 am case in the sample learning data, the sample learning data for this micro-grid operation period includes the sample micro-grid operation data, the passive reactive configuration supervision data of the sample micro-grid operation data, and the active reactive configuration supervision data. The active supervision training reinforcement benefit of the updated micro-grid reactive power optimization model compared with the candidate model must be measured across several technical indicators, for example improvement in reactive compensation accuracy, improvement in voltage stability, and reduction in grid loss.
In terms of reactive compensation accuracy, the sample micro-grid operation data from 9 to 10 am is input into both the updated and the candidate micro-grid reactive power optimization models. Under the active reactive configuration supervision data, assume the candidate model outputs a predicted reactive compensation amount Q2' and the updated model outputs Q2''. The active supervision training reinforcement benefit is measured from the difference between each prediction's error against the reactive compensation amount Q2 in the actual active reactive configuration supervision data, for example: active supervision training reinforcement benefit (reactive compensation accuracy) = |Q2 - Q2'| - |Q2 - Q2''|. If this value is greater than 0, the updated micro-grid reactive power optimization model shows a reinforcement benefit over the candidate model in terms of reactive compensation accuracy.
And for the improvement in voltage stability, the change in the micro-grid's voltage fluctuation range under the two models is calculated from the reactive configuration results they output. Assuming the voltage fluctuation range is ΔV1 under the candidate model and ΔV2 under the updated model, active supervision training reinforcement benefit (voltage stability) = ΔV1 - ΔV2. If this value is greater than 0, the updated micro-grid reactive power optimization model performs better in terms of voltage stability.
And for the grid loss reduction effect, the grid loss values under the two models are calculated from a relation model between reactive configuration and grid loss. Let the grid loss be L1 under the candidate model and L2 under the updated model; then active supervision training reinforcement benefit (grid loss) = L1 - L2. If this value is greater than 0, the updated micro-grid reactive power optimization model yields a positive reinforcement benefit in reducing grid loss. Combining these aspects through a suitable weight assignment (for example, weights determined according to the actual operating requirements of the micro-grid) gives the active supervision training reinforcement benefit of the updated micro-grid reactive power optimization model corresponding to the sample learning data compared with the candidate model.
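The three indicator differences and their weighted combination can be sketched as follows; the 0.5/0.3/0.2 weights are an illustrative assumption, since the text only says weights follow the micro-grid's operating requirements:

```python
def supervision_benefit(q_true, q_cand, q_upd,
                        dv_cand, dv_upd,
                        loss_cand, loss_upd,
                        weights=(0.5, 0.3, 0.2)):
    """Weighted combination of the three reinforcement-benefit indicators."""
    accuracy = abs(q_true - q_cand) - abs(q_true - q_upd)  # |Q2-Q2'| - |Q2-Q2''|
    stability = dv_cand - dv_upd                           # dV1 - dV2
    grid_loss = loss_cand - loss_upd                       # L1 - L2
    w_acc, w_stab, w_loss = weights
    return w_acc * accuracy + w_stab * stability + w_loss * grid_loss
```

A positive result means the updated model outperforms the candidate model across the weighted indicators.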
And similarly, for the passive supervision training reinforcement benefit, the passive reactive configuration supervision data training reinforcement benefit of the updated micro-grid reactive power optimization model relative to the candidate model is determined under each sample learning data in the sample training group corresponding to the sample learning data. Taking several sample learning data in a sample training group as an example, for each of them, such as the 9 to 10 am data, the sample micro-grid operation data is input into both the updated and the candidate micro-grid reactive power optimization models. Under the passive reactive configuration supervision data, assume the candidate model outputs a predicted reactive compensation amount Q1' and the updated model outputs Q1''. The passive reactive configuration supervision data training reinforcement benefit is then calculated, for example in terms of reactive compensation accuracy: passive reactive configuration supervision data training reinforcement benefit (reactive compensation accuracy) = |Q1 - Q1'| - |Q1 - Q1''|.
The voltage stability and grid loss aspects are calculated in a similar manner. For voltage stability, let the voltage fluctuation range be ΔV3 under the candidate model and ΔV4 under the updated model; then passive reactive configuration supervision data training reinforcement benefit (voltage stability) = ΔV3 - ΔV4. For grid loss, let the loss be L3 under the candidate model and L4 under the updated model; then passive reactive configuration supervision data training reinforcement benefit (grid loss) = L3 - L4. The mean of the passive reactive configuration supervision data training reinforcement benefits under all sample learning data in the sample training group is then calculated: with n sample learning data whose benefits are B1, B2, ..., Bn respectively, the passive supervision training reinforcement benefit of the updated micro-grid reactive power optimization model corresponding to the sample learning data compared with the candidate model is (B1 + B2 + ... + Bn) / n.
And the model training comparison loss corresponding to the sample learning data is determined from the obtained positive and negative supervision training reinforcement benefits. For example, it may be defined as a difference: model training comparison loss = active supervision training reinforcement benefit - passive supervision training reinforcement benefit. This loss reflects the benefit gap of the updated micro-grid reactive power optimization model between active and passive supervision training, supporting a more comprehensive evaluation of the model's training effect.
And model parameter learning is stopped when the cyclic training reinforcement benefit, target training reinforcement benefit, and model training comparison loss corresponding to each sample learning data indicate that the training error of the target training stage no longer decreases, generating the target micro-grid reactive power optimization model corresponding to the candidate model. For each sample learning data, the cyclic training reinforcement benefit has been obtained by the earlier calculations, the target training reinforcement benefit is determined from the comparison of the active and passive reactive configuration supervision data, and the model training comparison loss is added. Taking one sample learning data as an example, with cyclic training reinforcement benefit C, target training reinforcement benefit T, and model training comparison loss L, one way of determining the training error is: training error = |C - T| + |L|. Such training errors are continuously calculated for each sample learning data throughout the training process. As model parameter learning proceeds, when the training errors corresponding to all sample learning data no longer decrease, the model has reached a relatively stable state and model parameter learning is stopped. The final model is the target micro-grid reactive power optimization model corresponding to the candidate model; it has higher accuracy and reliability in industrial park micro-grid reactive power optimization configuration decisions, adapts better to the micro-grid's operating requirements, and improves its operating efficiency and stability.
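One possible form of the per-sample training error from this paragraph, combining the benefit gap with the comparison loss:

```python
def training_error(cyclic_benefit, target_benefit, comparison_loss):
    """Training error = |C - T| + |L|, one formulation given in the text."""
    return abs(cyclic_benefit - target_benefit) + abs(comparison_loss)
```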
In one possible embodiment, the method further comprises:
And step B110, for each sample learning data, determining a reference training error corresponding to the sample learning data according to the cyclic training reinforcement benefit and the target training reinforcement benefit of the sample learning data.
And step B120, determining a first fusion coefficient of a reference training error under the sample learning data and a second fusion coefficient of a model training comparison loss under the sample learning data.
And step B130, performing fusion calculation on the reference training error of the sample learning data and the model training comparison loss according to the first fusion coefficient and the second fusion coefficient, and determining the training error corresponding to the sample learning data.
And step B140, when the calculation result of the training error corresponding to each sample learning data is not reduced continuously, determining that the training error of the target training stage is not reduced continuously.
In one possible embodiment, step B120 includes:
Step B121, determining the updated training reinforcement benefit of the active reactive configuration supervision data after model parameter learning is completed compared with the passive reactive configuration supervision data under the sample learning data.
And step B122, determining a first fusion coefficient of the reference training error under the sample learning data according to the first difference value of the updated training reinforcement benefit and the target training reinforcement benefit. The first fusion coefficient and the first difference value are in a negative association relation.
Wherein step B122 includes:
And step B1221, determining a first difference value corresponding to each sample learning data according to the updated training reinforcement benefit and the target training reinforcement benefit corresponding to each sample learning data in the sample training group corresponding to the sample learning data.
And step B1222, carrying out mean value calculation on the first difference values respectively corresponding to the sample learning data, and determining a first threshold value matched with the sample learning data.
And step B1223, when the first difference value corresponding to the sample learning data is not greater than the first threshold value, determining a first fusion coefficient of the reference training error under the sample learning data according to the negative association relation.
In this embodiment, take the micro-grid operating condition from 9 to 10 am in the sample learning data as an example. The cyclic training reinforcement benefit, obtained through the earlier series of calculations, reflects the model's performance during cyclic training, while the target training reinforcement benefit is determined from the performance of the active reactive configuration supervision data relative to the passive reactive configuration supervision data in the target training stage. The reference training error may be determined from the difference between the two: with cyclic training reinforcement benefit CT and target training reinforcement benefit TT, reference training error = |CT - TT|. The reference training error reflects, from one angle, how far the sample learning data deviates during training, and is an important basis for subsequent calculations.
Next, the first fusion coefficient of the reference training error under the sample learning data and the second fusion coefficient of the model training comparison loss under the sample learning data are determined. First, the updated training reinforcement benefit of the active reactive configuration supervision data relative to the passive reactive configuration supervision data under the sample learning data, after model parameter learning is completed, is determined. For the 9 to 10 am sample learning data, the sample micro-grid operation data is input into the updated model after model parameter learning is completed, and the corresponding outputs are obtained for the active and passive reactive configuration supervision data. Assume the output under the active reactive configuration supervision data is Q2', the output under the passive reactive configuration supervision data is Q1', the actual active reactive configuration supervision data is Q2, and the passive reactive configuration supervision data is Q1. The updated training reinforcement benefit can then be calculated by jointly considering reactive compensation accuracy, the effect on voltage stability, grid loss, and similar factors.
For example, in terms of reactive compensation accuracy, updated training reinforcement benefit (reactive compensation accuracy) = |Q1 - Q1'| - |Q2 - Q2'|; in terms of voltage stability, the corresponding benefit difference is calculated from the effect of the model's reactive configuration output on the voltage fluctuation range; in terms of grid loss, the benefit difference is calculated from the relation between reactive configuration and grid loss. These aspects are then combined (through a weight assignment determined according to the operating requirements of the micro-grid) to obtain the total updated training reinforcement benefit.
And the first fusion coefficient of the reference training error under the sample learning data is determined from the first difference between the updated training reinforcement benefit and the target training reinforcement benefit. With updated training reinforcement benefit UT and target training reinforcement benefit TT, the first difference is D1 = UT - TT. The first fusion coefficient is negatively correlated with the first difference: as the first difference increases, the first fusion coefficient decreases. To determine the first fusion coefficient, the first difference corresponding to each sample learning data in the sample training group is calculated from that sample's updated and target training reinforcement benefits. For example, with n sample learning data in the sample training group, the first differences D11, D12, ..., D1n are calculated in the manner above. The mean of these first differences determines the first threshold matched with the sample learning data, i.e. first threshold = (D11 + D12 + ... + D1n) / n. When the first difference corresponding to the sample learning data is not greater than the first threshold, the first fusion coefficient of the reference training error under that sample learning data is determined according to the negative correlation; for example, a negatively correlated function may be set, such as first fusion coefficient = -k × D1 (k being a constant determined empirically or experimentally), and when D1 is not greater than the first threshold, the first fusion coefficient is determined by this formula.
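The threshold test and the negatively correlated coefficient might be sketched as follows; the value of k, and returning None above the threshold (a case the text does not specify), are assumptions:

```python
def first_fusion_coefficient(d1, group_d1, k=0.1):
    """First fusion coefficient under the negative correlation with D1.

    The first threshold is the mean of the group's first differences,
    (D11 + D12 + ... + D1n) / n; within the threshold the coefficient
    follows the -k * D1 form given in the text.
    """
    threshold = sum(group_d1) / len(group_d1)
    if d1 <= threshold:
        return -k * d1
    return None  # behaviour above the threshold is not specified in the text
```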
And the second fusion coefficient of the model training comparison loss under the sample learning data is determined. The model training comparison loss is calculated from the positive and negative supervision training reinforcement benefits: for the 9 to 10 am sample learning data, these were obtained earlier by inputting the sample micro-grid operation data into the updated and candidate micro-grid reactive power optimization models and jointly calculating reactive compensation accuracy, voltage stability, grid loss, and similar indicators. With positive supervision training reinforcement benefit PT and negative supervision training reinforcement benefit NT, model training comparison loss = PT - NT. The second fusion coefficient of the model training comparison loss under the sample learning data is determined from the second difference between the active supervision training reinforcement benefit and the passive reactive configuration supervision data training reinforcement benefit: with the latter denoted NT', the second difference is D2 = PT - NT'.
And in a manner similar to the determination of the first fusion coefficient, the second differences D21, D22, ..., D2n corresponding to the respective sample learning data in the sample training group are determined from each sample's positive supervision training reinforcement benefit and passive reactive configuration supervision data training reinforcement benefit; the mean of these second differences gives the second threshold matched with the sample learning data. When the second difference corresponding to the sample learning data is not greater than the second threshold, the second fusion coefficient is determined according to the negative correlation, for example second fusion coefficient = -m × D2 (m being a constant determined empirically or experimentally).
And according to the first fusion coefficient and the second fusion coefficient, carrying out fusion calculation on the reference training error of the sample learning data and the model training comparison loss, and determining the training error corresponding to the sample learning data. Assuming that the reference training error is RE, the model training comparison loss is ML, the first fusion coefficient is α, and the second fusion coefficient is β, the training error corresponding to the sample learning data = α × RE + β × ML. This training error comprehensively considers the reference training error, the model training comparison loss, and their respective fusion coefficients, and therefore reflects the state of the sample learning data in the training process more completely.
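The fusion above is a single weighted sum; as a minimal sketch (the function and variable names are illustrative, not from the patent):

```python
def fused_training_error(re: float, ml: float, alpha: float, beta: float) -> float:
    """Fuse the reference training error RE with the model training
    comparison loss ML using the first (alpha) and second (beta)
    fusion coefficients: error = alpha * RE + beta * ML."""
    return alpha * re + beta * ml
```

For RE = 0.5, ML = 0.2, α = 1.0 and β = 0.5 this yields a training error of 0.6.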
And when the calculated training errors corresponding to each sample learning data no longer continuously decrease, determining that the training error of the target training stage no longer continuously decreases. The training error of each sample learning data is calculated continuously throughout the training process. For example, the sample training group contains a plurality of sample learning data, and the training error of each varies as model parameter learning proceeds. When the training errors of all sample learning data no longer continuously decrease, the model has reached a relatively stable state in the target training stage, i.e. the training error of the target training stage no longer continuously decreases. This criterion effectively determines the termination time of model training and ensures that the obtained model performs well in reactive power optimization of the industrial park micro-grid.
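This stop criterion ("no sample's training error is still falling") can be sketched as follows; the per-sample error histories and the simple last-two-rounds comparison are illustrative assumptions, since the patent does not fix a concrete numerical test:

```python
def errors_stopped_decreasing(histories: list) -> bool:
    """histories[i] is the training-error sequence of sample i across
    training rounds. The target training stage is considered finished
    once no sample's error fell in the latest round."""
    return all(h[-1] >= h[-2] for h in histories if len(h) >= 2)
```

A group where every sample's error has plateaued (or risen) in the latest round triggers termination; a single still-improving sample keeps training going.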
In one possible implementation, step B122 further includes:
And step B1224, determining that the updated micro-grid reactive power optimization model corresponding to the sample learning data is compared with the passive reactive power configuration supervision data training reinforcement benefit of the candidate micro-grid reactive power optimization model.
And step B1225, determining a second fusion coefficient of model training comparison loss under the sample learning data according to a second difference value of the active supervision training reinforcement benefit and the passive reactive configuration supervision data training reinforcement benefit. And the second fusion coefficient and the second difference value are in a negative association relation.
Wherein step B1225 comprises:
And step B1225-1, determining second differences corresponding to the sample learning data according to the positive supervision training reinforcement benefit and the negative reactive configuration supervision data training reinforcement benefit corresponding to the sample learning data in the sample training group corresponding to the sample learning data.
And step B1225-2, performing average value calculation on the second difference values corresponding to the sample learning data respectively, and determining a second threshold value matched with the sample learning data.
And step B1225-3, determining a second fusion coefficient of model training comparison loss under the sample learning data according to the negative association relation when the second difference value corresponding to the sample learning data is not greater than the second threshold value.
In this embodiment, taking a micro-grid operation period from 9 a.m. to 10 a.m. in the sample learning data as an example, the sample micro-grid operation data includes various parameters such as voltage, current, power factor, output power of the distributed power source, and power demand of the load during this period. And respectively inputting the sample micro-grid operation data into an updated micro-grid reactive power optimization model and a candidate micro-grid reactive power optimization model.
And for passive reactive power configuration supervision data, corresponding output results are obtained after inputting the data into the two models. Assume that the reactive compensation amount predicted by the candidate micro-grid reactive power optimization model for the passive reactive power configuration supervision data is Q1', and the reactive compensation amount predicted by the updated micro-grid reactive power optimization model is Q1''. In terms of reactive power compensation accuracy, the training enhancement benefit (reactive power compensation accuracy) of the passive reactive power configuration supervision data can be obtained by calculating |Q1 - Q1'| - |Q1 - Q1''|, wherein Q1 is the actual reactive power compensation amount in the passive reactive power configuration supervision data. Meanwhile, the passive reactive configuration supervision data training enhancement benefit is also determined from voltage stability and power grid loss, both of which are critical to the operation of the micro-grid. For voltage stability, the corresponding benefit difference is calculated from the influence of the reactive power configurations output by the two models on the voltage fluctuation range of the micro-grid. For example, if the voltage fluctuation range under the candidate micro-grid reactive power optimization model is ΔV3 and that under the updated micro-grid reactive power optimization model is ΔV4, the training enhancement benefit (voltage stability) of the passive reactive power configuration supervision data is ΔV3 - ΔV4.
And for the power grid loss, according to a relation model between reactive power configuration and power grid loss, let the power grid loss under the candidate micro-grid reactive power optimization model be L3 and that under the updated micro-grid reactive power optimization model be L4; the training enhancement benefit (power grid loss) of the passive reactive power configuration supervision data is then L3 - L4. Combining these aspects by weighted summation according to predetermined weights (the weights being set according to the operating characteristics and optimization targets of the micro-grid) yields the passive reactive power configuration supervision data training enhancement benefit of the updated micro-grid reactive power optimization model over the candidate micro-grid reactive power optimization model for the sample learning data.
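The three benefit terms and their weighted summation can be sketched as below; the weight values are illustrative assumptions, since the patent only states that the weights follow the micro-grid's operating characteristics and optimization targets:

```python
def passive_supervision_benefit(q1: float, q1_cand: float, q1_upd: float,
                                dv3: float, dv4: float,
                                l3: float, l4: float,
                                weights=(0.4, 0.3, 0.3)) -> float:
    """Weighted sum of the three gains of the updated model over the
    candidate model on passive reactive configuration supervision data."""
    accuracy = abs(q1 - q1_cand) - abs(q1 - q1_upd)  # compensation accuracy gain
    voltage = dv3 - dv4                               # voltage-stability gain
    grid_loss = l3 - l4                               # grid-loss reduction
    return (weights[0] * accuracy + weights[1] * voltage
            + weights[2] * grid_loss)
```

For example, with Q1 = 10, Q1' = 12, Q1'' = 11, ΔV3 = 2.0, ΔV4 = 1.5, L3 = 5.0 and L4 = 4.0, the three gains are 1, 0.5 and 1.0, and the weighted benefit is 0.85 under the assumed weights.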
Then, the second fusion coefficient of the model training comparison loss under the sample learning data is determined according to a second difference value between the positive supervision training enhancement benefit and the passive reactive configuration supervision data training enhancement benefit. The positive supervision training enhancement benefit is calculated by inputting the sample micro-grid operation data into the updated and candidate micro-grid reactive power optimization models and comparing their outputs against the positive reactive power configuration supervision data in terms of reactive power compensation accuracy, voltage stability, grid loss and the like. Assuming that the positive supervision training enhancement benefit is PT and the passive reactive configuration supervision data training enhancement benefit is NT', the second difference is D2 = PT - NT'. The negative association between the second fusion coefficient and the second difference means that the larger the second difference, the smaller the second fusion coefficient.
In order to determine the second fusion coefficient, the second differences corresponding to the respective sample learning data are determined according to the positive supervision training enhancement benefit and the passive reactive configuration supervision data training enhancement benefit of each sample learning data in the sample training group. For example, if there are n sample learning data in the sample training group, then for each sample learning data the positive supervision training enhancement benefit and the passive reactive configuration supervision data training enhancement benefit are calculated in the above manner, and the corresponding second differences D21, D22, …, D2n are derived.
Then, the average of the second differences corresponding to the respective sample learning data is calculated to determine a second threshold value matched with the sample learning data, i.e. the second threshold = (D21 + D22 + … + D2n) / n. When the second difference corresponding to the sample learning data is not greater than the second threshold, the second fusion coefficient of the model training comparison loss under the sample learning data is determined according to the negative association relation. For example, a functional relationship may represent the negative association: assuming the second fusion coefficient is β, set β = -m × D2, where m is a constant determined empirically or experimentally; when the second difference D2 corresponding to the sample learning data is not greater than the second threshold, the second fusion coefficient is determined by this formula. This method of determining the second fusion coefficient comprehensively considers the situation of each sample learning data in the sample training group, making the second fusion coefficient more reasonable and accurate, so that the subsequently calculated training error reflects the training state of the model more completely; this improves the accuracy and reliability of the micro-grid reactive power optimization model and better meets the operating requirements of the industrial park micro-grid.
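Steps B1225-1 to B1225-3 can be sketched as follows; the constant m and the behaviour for samples whose D2 exceeds the threshold (returned as None here) are illustrative assumptions, since the patent only specifies the branch where D2 does not exceed the threshold:

```python
def second_fusion_coefficients(pt: list, nt_passive: list, m: float = 0.1) -> list:
    """pt[i] is the positive supervision training enhancement benefit and
    nt_passive[i] the passive reactive configuration supervision data
    training enhancement benefit of sample i in one training group.
    Returns beta_i = -m * D2_i for samples whose D2_i does not exceed
    the group-mean second threshold, and None otherwise."""
    d2 = [p - n for p, n in zip(pt, nt_passive)]   # D2 = PT - NT' per sample
    threshold = sum(d2) / len(d2)                  # second threshold (mean)
    return [(-m * d if d <= threshold else None) for d in d2]
```

The list comprehension applies the negative association β = -m × D2 only below the threshold, mirroring step B1225-3.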
In one possible implementation, step S110 includes:
and S111, acquiring reactive power optimization requirements of the micro-grid and running data of the micro-grid with a plurality of samples.
Step S112, for each sample micro-grid operation data, determining demand index data associated with the micro-grid reactive power optimization demand. The demand index data comprises the sample micro-grid operation data.
And step S113, loading the demand index data into a pre-trained deep learning network, and taking the generation of the deep learning network as passive reactive power configuration supervision data of the sample micro-grid operation data.
Step S114, generating active reactive power configuration supervision data of the sample micro-grid operation data according to expert optimization instructions for generating the passive reactive power configuration supervision data of the sample micro-grid operation data.
Step S115, configuring sample learning data including the sample micro-grid operation data, active reactive configuration supervision data of the sample micro-grid operation data, and passive reactive configuration supervision data of the sample micro-grid operation data, and generating a training data sequence including a plurality of sample learning data.
In this embodiment, the microgrid reactive power optimization requirements are determined based on the overall operational objectives of the industrial park microgrid, including, but not limited to, maintaining voltage stability, reducing grid losses, improving power quality, etc. For example, in order to meet the requirements of various industrial devices in an industrial park on power quality, the voltage fluctuation range needs to be controlled within a certain value, which is a specific reactive power optimization requirement. At the same time, a plurality of sample microgrid operational data are collected, which reflect the actual operational state of the microgrid over different time periods. Taking 9 to 10 am as an example of one time period in a day: industrial production is in a normal state, most equipment runs at full load, and the photovoltaic panels receive medium-intensity illumination. The corresponding sample micro-grid operation data include the micro-grid voltage of 380V, the micro-grid current of 50A, the micro-grid power factor of 0.85, the output power of each distributed power supply (such as solar photovoltaic panels and small wind turbines), the power demands of different types of loads (industrial electrical equipment, office-area electrical equipment, and the like), and so on.
For each sample of microgrid operational data, demand index data associated with the microgrid reactive power optimization demand is determined. The demand index data includes the sample microgrid operational data itself and other derivative data related to reactive power optimization demand. For example, in addition to the above-mentioned direct operation data of voltage, current, power factor, etc., it is possible to include data of apparent power, reactive power, etc., calculated from the voltage and current. These demand index data can more fully reflect the relationship between the operating state of the micro-grid and the reactive power optimization demand. Taking sample micro-grid operation data from 9 am to 10 am as an example, the calculated apparent power, reactive power and other data are closely related to preset reactive power optimization requirements such as voltage stability, grid loss and the like, and the requirement index data are formed together.
And loading the demand index data into a pre-trained deep learning network, and taking the generation of the deep learning network as passive reactive power configuration supervision data of sample micro-grid operation data. This pre-trained deep learning network is trained based on a large number of microgrid operational data and associated reactive configuration cases. And after the demand index data corresponding to the sample micro-grid operation data from 9 am to 10 am is loaded to the deep learning network, the network outputs a reactive power configuration scheme as passive reactive power configuration supervision data according to an algorithm and a model structure in the deep learning network. For example, the deep learning network may determine, according to the input data, that the reactive compensation amount to be set on a certain reactive compensation device is Q1, where the reactive configuration scheme corresponding to Q1 is passive reactive configuration supervision data for the micro-grid operation data of the sample. The passive reactive power configuration supervision data is a preliminary reactive power configuration suggestion obtained based on the deep learning network learning the existing data and rules.
And generating active reactive power configuration supervision data of the sample micro-grid operation data according to expert optimization instructions applied to the passive reactive power configuration supervision data of the sample micro-grid operation data. The expert optimizes the passive reactive power configuration supervision data based on rich experience and a deeper understanding of the industrial park micro-grid system, and may take into account more practical operating complications, such as the sensitivity of a particular industrial installation to reactive power, or the output characteristics of the distributed power supply under different seasons and weather conditions. Taking the reactive compensation amount Q1 in the passive reactive power configuration supervision data as an example: in the current operating state, considering an upcoming peak electricity-consumption period and the special voltage-stability requirement of a key industrial device, the expert may consider it more suitable to adjust the reactive compensation amount to Q2, and the reactive configuration scheme corresponding to Q2 is used as the active reactive power configuration supervision data. The expert optimization instruction is thus an adjustment, made on the basis of the passive reactive power configuration supervision data and comprehensively considering various practical factors, that better meets the reactive power optimization requirements of the micro-grid.
And finally, configuring sample learning data comprising sample micro-grid operation data, active reactive power configuration supervision data of the sample micro-grid operation data and passive reactive power configuration supervision data of the sample micro-grid operation data, and generating a training data sequence comprising a plurality of sample learning data. For the samples from 9 to 10 am, the sample micro-grid operation data, the corresponding active reactive power configuration supervision data Q2 and the corresponding passive reactive power configuration supervision data Q1 in the period are combined together to form sample learning data. And processing the operation data of the sample micro-grid at other different time intervals in the same way to obtain a plurality of sample learning data. The sample learning data together form a training data sequence which is used for training a subsequent micro-grid reactive power optimization model, and rich learning samples are provided for the model, so that the model can be better adapted to various running conditions of an industrial park micro-grid, and the accuracy and the effectiveness of reactive power optimization are improved.
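The assembly of one sample learning datum and the resulting training data sequence can be sketched as follows (the class and field names are illustrative, not from the patent):

```python
from dataclasses import dataclass

@dataclass
class SampleLearningData:
    operation_data: dict        # voltage, current, power factor, ...
    active_supervision: float   # expert-adjusted reactive compensation (Q2)
    passive_supervision: float  # network-generated reactive compensation (Q1)

def build_training_sequence(samples) -> list:
    """samples: iterable of (operation_data, q2, q1) triples, one per
    operating period; returns the training data sequence."""
    return [SampleLearningData(op, q2, q1) for op, q2, q1 in samples]
```

Each operating period of the micro-grid contributes one triple, and the list of triples is the training data sequence used by the subsequent model training steps.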
In one possible implementation, the process of obtaining the candidate micro-grid reactive power optimization model includes:
an initialized neural network model is obtained.
And configuring a basic training data sequence containing a plurality of basic sample learning data according to the training data sequence. Each of the base sample learning data includes sample microgrid operational data and active reactive configuration supervision data of the sample microgrid operational data or passive reactive configuration supervision data of the sample microgrid operational data.
And carrying out iterative updating on the neural network model according to the basic training data sequence to generate a candidate micro-grid reactive power optimization model.
In this embodiment, the initialized neural network model is the basis for constructing the reactive power optimization model of the micro-grid. For example, a neural network model of a multi-layer perceptron (MLP) architecture is selected, having an input layer, a number of hidden layers, and an output layer. The number of nodes of the input layer is determined according to the characteristics of the sample micro-grid operation data; characteristics such as voltage, current, power factor, distributed power supply output power, and load power demand in the industrial park micro-grid can be used as the input of the input layer. The number of neurons and the number of hidden layers are set according to the complexity requirement of the model; for example, 3 hidden layers can be set, each with a different number of neurons, so as to realize complex nonlinear mapping of the input data. The output of the output layer then corresponds to the relevant results of the reactive configuration, such as the reactive compensation amount. The parameters such as initial weights and biases of the initialized neural network model may be randomly set; at this point it is merely a model with a basic structure that has not yet been trained for reactive power optimization of the micro-grid.
And configuring a basic training data sequence containing a plurality of basic sample learning data according to the training data sequence. Each base sample learning data includes sample microgrid operational data, and active reactive configuration supervision data for the sample microgrid operational data or passive reactive configuration supervision data for the sample microgrid operational data. And extracting and arranging from the training data sequence acquired before. Taking a sample of 9 to 10 am of the industrial park micro-grid as an example, the micro-grid operation data of the sample comprises information such as 380V voltage, 50A current, 0.85 power factor and the like. The basic sample learning data may only include this sample micro grid operation data and corresponding passive reactive configuration supervision data, for example, the reactive compensation amount of the reactive compensation device in the passive reactive configuration supervision data is Q1. Or the sample micro-grid operation data and active reactive power configuration supervision data, for example, the reactive power compensation amount in the active reactive power configuration supervision data is Q2. In this manner, multiple samples in the training data sequence are processed to form multiple base sample learning data that together form the base training data sequence.
And carrying out iterative updating on the neural network model according to the basic training data sequence to generate a candidate micro-grid reactive power optimization model. The basic sample learning data in the basic training data sequence are input into the initialized neural network model one by one. For each basic sample learning data, the neural network model performs forward propagation calculation on the input sample micro-grid operation data to obtain a preliminary reactive power configuration output result. Then, a loss function is calculated from the difference between the output result and the corresponding active reactive configuration supervision data or passive reactive configuration supervision data. For example, using the mean square error (MSE) as the loss function: if the sample micro-grid operation data and the passive reactive power configuration supervision data are input, the reactive power compensation amount in the passive reactive power configuration supervision data is Q1, and the predicted reactive power compensation amount output by the model is Q1', then the value of the loss function is (Q1 - Q1')². Through a back propagation algorithm, parameters such as the weights and biases of the neural network model are then adjusted according to the loss function so as to reduce its value. This process is repeated continuously; as more basic sample learning data are input into the model and iteratively updated, the neural network model gradually learns the relationship pattern between the sample micro-grid operation data and the reactive configuration. After multiple iterations, when the value of the loss function reaches a stable and acceptable range, the neural network model becomes the candidate micro-grid reactive power optimization model.
The candidate micro-grid reactive power optimization model has a certain capacity to perform preliminary optimization decision on reactive power configuration of the micro-grid, and lays a foundation for subsequent further training and optimization.
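The forward pass / MSE / back-propagation cycle described above can be sketched with a deliberately tiny stand-in model; the patent specifies a multi-layer perceptron, but a one-weight linear model already shows the iterative update on the squared error (Q1 - Q1')². All hyperparameters are illustrative:

```python
def train_candidate_model(samples, epochs: int = 2000, lr: float = 0.05):
    """samples: list of (x, q) pairs, with x a scalar operating feature
    and q the supervised reactive compensation amount. Plain SGD on the
    squared error; returns the fitted weight and bias."""
    w, b = 0.0, 0.0                      # randomly initialised in practice
    for _ in range(epochs):
        for x, q in samples:
            pred = w * x + b             # forward propagation
            grad = 2.0 * (pred - q)      # derivative of (pred - q)^2
            w -= lr * grad * x           # back-propagate to the weight
            b -= lr * grad               # and to the bias
    return w, b
```

After enough iterations the loss stabilises and the fitted parameters reproduce the supervised compensation amounts, which is the stopping condition the embodiment describes for accepting the candidate model.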
Fig. 2 illustrates a schematic diagram of the hardware structure of the deep learning-based microgrid reactive power optimization system 100 for implementing the deep learning-based microgrid reactive power optimization method according to an embodiment of the present invention. As shown in Fig. 2, the deep learning-based microgrid reactive power optimization system 100 may include a processor 110, a machine-readable storage medium 120, a bus 130, and a communication unit 140.
The machine-readable storage medium 120 may store data and/or instructions. In some embodiments, the machine-readable storage medium 120 may store data acquired from an external terminal. In some embodiments, the machine-readable storage medium 120 may store data and/or instructions that the deep learning based microgrid reactive power optimization system 100 uses to perform or use to complete the exemplary methods described in this disclosure.
In a specific implementation, the one or more processors 110 execute computer executable instructions stored by the machine-readable storage medium 120, so that the processor 110 may execute the deep learning-based reactive power optimization method for a micro grid according to the above method embodiment, where the processor 110, the machine-readable storage medium 120, and the communication unit 140 are connected through the bus 130, and the processor 110 may be used to control the transceiving actions of the communication unit 140.
The specific implementation process of the processor 110 may refer to the above embodiments of the method executed by the deep learning-based micro-grid reactive power optimization system 100, and the implementation principle and technical effect are similar, which is not repeated here.
In addition, the embodiment of the invention also provides a readable storage medium, wherein computer executable instructions are preset in the readable storage medium, and when a processor executes the computer executable instructions, the micro-grid reactive power optimization method based on deep learning is realized.
It should be noted that in order to simplify the presentation of the disclosure and thereby aid in understanding one or more embodiments of the invention, various features are sometimes grouped together in a single embodiment, figure, or description thereof.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510266962.XA CN119787386B (en) | 2025-03-07 | 2025-03-07 | Microgrid reactive power optimization method and system based on deep learning |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119787386A true CN119787386A (en) | 2025-04-08 |
| CN119787386B CN119787386B (en) | 2025-06-20 |
Family
ID=95239471
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510266962.XA Active CN119787386B (en) | 2025-03-07 | 2025-03-07 | Microgrid reactive power optimization method and system based on deep learning |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119787386B (en) |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113363997A (en) * | 2021-05-28 | 2021-09-07 | 浙江大学 | Reactive voltage control method based on multi-time scale and multi-agent deep reinforcement learning |
| CN113964885A (en) * | 2021-08-31 | 2022-01-21 | 国网山东省电力公司东营供电公司 | Reactive active prediction and control technology of power grid based on situation awareness |
| CN115833147A (en) * | 2022-12-12 | 2023-03-21 | 广东电网有限责任公司 | Reactive voltage optimization method, device, equipment and medium based on reinforcement learning |
| CN117791631A (en) * | 2023-12-28 | 2024-03-29 | 中国南方电网有限责任公司超高压输电公司柳州局 | Reactive power control method of direct current transmission system based on time sequence convolution residual error network |
| CN117811098A (en) * | 2023-12-28 | 2024-04-02 | 中国电力科学研究院有限公司 | Source network load storage collaborative optimization method based on model reinforcement learning and related device thereof |
| CN119362484A (en) * | 2024-10-08 | 2025-01-24 | 武汉华源电力设计院有限公司 | Static centralized and dynamic distributed coordinated control method for reactive power/voltage in distribution network |
- 2025-03-07 CN CN202510266962.XA patent/CN119787386B/en active Active
Non-Patent Citations (2)
| Title |
|---|
| HAO WENHAI等: "Optimization Model for Reactive Voltage Control in Active Distribution Networks Based on Transformer Based Deep Reinforcement Learning", 《2024 CHINA INTERNATIONAL CONFERENCE ON ELECTRICITY DISTRIBUTION (CICED)》, 22 November 2024 (2024-11-22), pages 846 - 851 * |
| 王元元 (WANG Yuanyuan) et al.: "Reactive power and voltage control of distribution networks based on online extreme learning machine", 《山东电力技术》 (Shandong Electric Power), vol. 49, no. 12, 31 December 2022 (2022-12-31), pages 39 - 46 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119787386B (en) | 2025-06-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Chen et al. | Dynamic ensemble wind speed prediction model based on hybrid deep reinforcement learning | |
| Levent et al. | Energy management for microgrids: a reinforcement learning approach | |
| CN114970362A (en) | Power grid load scheduling prediction method and system under multi-energy structure | |
| Dong et al. | Short-term building cooling load prediction model based on DwdAdam-ILSTM algorithm: A case study of a commercial building | |
| CN114725944A (en) | A method and system for optimizing operation control of source and network load of power electronic distribution network | |
| Li et al. | An evolutionary multiobjective knee-based lower upper bound estimation method for wind speed interval forecast | |
| Li et al. | Data‐driven cooperative load frequency control method for microgrids using effective exploration‐distributed multi‐agent deep reinforcement learning | |
| CN118971138A (en) | Microgrid island operation control method and system based on load forecasting | |
| Zhou et al. | Enhancing photovoltaic power prediction using a CNN-LSTM-attention hybrid model with Bayesian hyperparameter optimization | |
| CN117856213A (en) | Output dispatching method and terminal for optical storage direct current micro-grid | |
| Karthikeyan et al. | Enhancing voltage control and regulation in smart micro-grids through deep learning-optimized EV reactive power management | |
| CN120012577A (en) | A photovoltaic power generation system parameter identification method and system based on swarm intelligence optimization algorithm | |
| Yoon et al. | Safe deep reinforcement learning-based real-time operation strategy in unbalanced distribution system | |
| Ouyang et al. | Compound improved Harris hawks optimization for global and engineering optimization | |
| CN119787386B (en) | Microgrid reactive power optimization method and system based on deep learning | |
| CN119726663A (en) | A method and system for energy storage grid-connected scheduling decision-making based on deep reinforcement learning | |
| Cheng et al. | Blockchain and hybrid PSO integration with fuzzy PID control to optimize the energy usage for lighting control system | |
| Xiong et al. | Residential Energy Scheduling With Solar Energy Based on Dyna Adaptive Dynamic Programming | |
| CN119030031A (en) | A power balance control method, system, device and medium for distribution network | |
| Ma et al. | Proposed model employing ARIMA and RELM in urban energy consumption prediction | |
| Ferreira et al. | Application of Bio-Inspired Optimization Techniques for Wind Power Forecasting | |
| Yu et al. | A reinforcement learning algorithm based on neural network for economic dispatch | |
| Kumar et al. | Soft computing based techniques for comparative analysis of wind speed and power prediction | |
| Qiu et al. | Data-driven based coordinated smart inverter control for distributed energy resources | |
| Doudkin et al. | Spacecraft Telemetry Time Series Forecasting With Ensembles of Neural Networks |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||