CN119382159A - Intelligent decision-making method and system for distribution network based on knowledge embedding and multi-agent system
- Publication number
- CN119382159A (application CN202411507652.4A)
- Authority
- CN
- China
- Prior art keywords
- network
- agent
- scheduling
- energy
- distribution network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for AC mains or AC distribution networks
- H02J3/12—Circuit arrangements for AC mains or AC distribution networks for adjusting voltage in AC networks by changing a characteristic of the network load
- H02J3/14—Circuit arrangements for AC mains or AC distribution networks for adjusting voltage in AC networks by changing a characteristic of the network load by switching loads on to, or off from, network, e.g. progressively balanced loading
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for AC mains or AC distribution networks
- H02J3/28—Arrangements for balancing of the load in a network by storage of energy
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for AC mains or AC distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/381—Dispersed generators
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J3/00—Circuit arrangements for AC mains or AC distribution networks
- H02J3/38—Arrangements for parallely feeding a single network by two or more generators, converters or transformers
- H02J3/46—Controlling of the sharing of output between the generators, converters, or transformers
- H02J3/466—Scheduling the operation of the generators, e.g. connecting or disconnecting generators to meet a given demand
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/10—Power transmission or distribution systems management focussing at grid-level, e.g. load flow analysis, node profile computation, meshed network optimisation, active network management or spinning reserve management
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2203/00—Indexing scheme relating to details of circuit arrangements for AC mains or AC distribution networks
- H02J2203/20—Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/20—The dispersed energy generation being of renewable origin
- H02J2300/22—The renewable source being solar energy
- H02J2300/24—The renewable source being solar energy of photovoltaic origin
-
- H—ELECTRICITY
- H02—GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
- H02J—CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
- H02J2300/00—Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation
- H02J2300/20—The dispersed energy generation being of renewable origin
- H02J2300/28—The renewable source being wind energy
Landscapes
- Engineering & Computer Science (AREA)
- Power Engineering (AREA)
- Supply And Distribution Of Alternating Current (AREA)
Abstract
The invention relates to an intelligent decision-making method and system for a power distribution network based on knowledge embedding and a multi-agent system. The method comprises the following steps: collecting operation data of each distributed energy device in the power distribution network in real time and preprocessing it; and, based on the preprocessed operation data and a distributed multi-layer energy control architecture built on a multi-agent system, making intelligent energy scheduling decisions by a method combining expert knowledge embedding and reinforcement learning, completing the decision process. Compared with the prior art, the method and system achieve efficient distributed energy scheduling in complex and changeable energy scenarios and improve the intelligence, flexibility and robustness of power distribution network scheduling.
Description
Technical Field
The invention relates to the technical field of smart grids, and in particular to an intelligent decision-making method and system for a power distribution network based on knowledge embedding and a multi-agent system.
Background
With the growing demand for energy and the rapid development of distributed energy resources (such as photovoltaic generation, wind energy, energy storage systems and electric vehicles), conventional centralized energy scheduling can no longer cope with the complexity of modern power systems. The highly decentralized, unpredictable and diverse nature of distributed energy resources means that existing energy scheduling systems perform poorly when dealing with uncertainty, real-time load changes and dynamic demand.
Traditional scheduling systems often depend on a single centralized control architecture, which makes it difficult to fully exploit the advantages of distributed energy resources and easily causes problems such as energy waste, scheduling delay and load imbalance. In addition, with the continuous development of technologies such as the energy internet and the smart grid, the complexity of modern power systems keeps increasing, and so does the difficulty of scheduling and management. Therefore, a more intelligent, distributed and adaptive energy scheduling method is needed to cope with complex energy network scheduling demands and realize efficient coordination and optimization of distributed energy resources. The multi-agent system, as an emerging scheduling paradigm, can realize collaborative optimization of the distributed energy system at different levels through hierarchical control and agent cooperation, improving the flexibility and response speed of the system.
Patent application CN114841448A discloses a hierarchical-partition load optimization regulation method based on a multi-agent system. The method designs a hierarchical-partition dynamic regulation framework that, combined with the multi-agent system, aggregates widely distributed resources of various types and characteristics through cloud-edge cooperation and Internet-of-Things technology, and adopts the multi-agent system for distributed management and control, realizing efficient energy interaction. On this basis, a multi-objective optimization strategy that is economical, low-polluting and safe is proposed, realizing hierarchical-partition optimization regulation of demand-side resources. However, this approach relies on accurate modeling, so its performance degrades when the distributed energy resource scheduling of a complex energy system involves significant complexity and uncertainty.
Therefore, those skilled in the art urgently need a multi-agent intelligent decision-making method capable of coping with the complexity and uncertainty of distributed energy resource scheduling in complex energy systems.
Disclosure of Invention
The invention aims to provide an intelligent decision-making method and system for a power distribution network based on knowledge embedding and a multi-agent system, so as to improve the utilization of distributed energy resources in complex environments.
The aim of the invention can be achieved by the following technical scheme:
A power distribution network intelligent decision method based on knowledge embedding and a multi-agent system comprises the following steps:
collecting operation data of each distributed energy device in the power distribution network in real time, and preprocessing;
Hierarchical division is carried out on the power distribution network, and a distributed multi-layer energy control framework based on a multi-agent system is constructed;
Based on the preprocessed operation data and the multi-agent distributed multi-layer energy control architecture, a method combining expert knowledge embedding and reinforcement learning is adopted to make intelligent energy scheduling decisions, completing the decision process.
Further, the operation data of each distributed energy device comprises voltage, current and power output of photovoltaic power generation, wind power generation, energy storage equipment, electric vehicles and a power network.
Further, the preprocessing operation includes normalizing the operation data of each distributed energy device, where the normalization is expressed as:

$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$

where $X_{norm}$ is the normalized data, $X$ is the raw acquired data, and $X_{min}$, $X_{max}$ are the minimum and maximum values in the data, respectively.
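As a minimal illustration, this min-max normalization can be implemented as follows in Python; the sample photovoltaic readings are illustrative, not taken from the patent:

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Scale raw measurements to [0, 1]: X_norm = (X - X_min) / (X_max - X_min)."""
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:  # constant channel: avoid division by zero
        return np.zeros_like(x, dtype=float)
    return (x - x_min) / (x_max - x_min)

# Example: one day of photovoltaic output readings in kW (illustrative values)
pv_output = np.array([0.0, 12.5, 48.3, 75.1, 60.2, 8.4])
print(min_max_normalize(pv_output))  # each data channel is normalized independently
```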
Further, the step of constructing a multi-agent system-based distributed multi-layer energy control architecture includes:
Dividing the power distribution network into three layers from top to bottom — a main distribution network layer, a regional coordination layer and an equipment unit layer — and constructing a distributed multi-layer energy control architecture based on a multi-agent system, comprising main distribution network layer agents, regional coordination layer agents and equipment unit layer agents;
The main distribution network layer is led by the distribution network; its agent controls the whole distributed multi-layer energy control architecture, formulates incentive signals according to the scheduling optimization target, and communicates with the regional coordination layer agents to realize cross-regional cooperative scheduling;
The regional coordination layer schedules and controls the operation of the distributed energy devices in each region; each region is provided with a regional coordination layer agent, which performs local optimization by communicating with the equipment unit layer agents and combining the energy consumption demand in the region with the scheduling optimization target;
The equipment unit layer comprises a plurality of distributed energy devices; each distributed energy device is provided with an equipment unit layer agent, which communicates with the regional coordination layer agent to obtain scheduling instructions and schedule the energy of its device.
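To make the three-layer architecture concrete, the following minimal Python sketch models the agent hierarchy and its message flow; the class names, the even-split dispatch rule and the incentive value are illustrative assumptions, not prescribed by the invention:

```python
from dataclasses import dataclass, field

@dataclass
class DeviceAgent:
    """Equipment-unit-layer agent: controls one distributed energy device."""
    device_id: str
    def execute(self, setpoint_kw: float) -> None:
        print(f"{self.device_id}: dispatching {setpoint_kw:.1f} kW")

@dataclass
class RegionAgent:
    """Regional-coordination-layer agent: locally optimizes its devices."""
    region_id: str
    devices: list = field(default_factory=list)
    def dispatch(self, incentive: float, regional_demand_kw: float) -> None:
        # Toy local optimization: split demand evenly, scaled by the incentive signal
        share = regional_demand_kw / max(len(self.devices), 1)
        for dev in self.devices:
            dev.execute(share * incentive)

@dataclass
class DistributionNetworkAgent:
    """Main-distribution-network-layer agent: issues incentive signals per region."""
    regions: list = field(default_factory=list)
    def coordinate(self, demands_kw: dict) -> None:
        for region in self.regions:
            region.dispatch(incentive=1.0, regional_demand_kw=demands_kw[region.region_id])

# Wiring up a two-region example
r1 = RegionAgent("R1", [DeviceAgent("PV-1"), DeviceAgent("ESS-1")])
r2 = RegionAgent("R2", [DeviceAgent("WT-1")])
DistributionNetworkAgent([r1, r2]).coordinate({"R1": 120.0, "R2": 80.0})
```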
Further, the step of making an intelligent energy scheduling decision includes:
according to the scheduling optimization target, a scheduling model of the power distribution network is constructed, with the objective of minimizing the operating cost:

$$\min E_{DSO}=\sum_{t=1}^{T}\left[\lambda_{t}^{grid}P_{t}^{grid}+\sum_{i=1}^{N}\left(\lambda_{i,t}^{buy}P_{i,t}^{buy}-\lambda_{i,t}^{sell}P_{i,t}^{sell}\right)+\sum_{j=1}^{M}P_{j,t}^{loss}\right]$$

where $E_{DSO}$ is the operating cost; $P_{t}^{grid}$ is the amount of electricity purchased by the distribution system operator from the transmission network and $\lambda_{t}^{grid}$ the corresponding purchase price; $P_{i,t}^{sell}$ and $P_{i,t}^{buy}$ are the amounts of electricity sold to and purchased from microgrid $i$ by the distribution system operator, with $\lambda_{i,t}^{sell}$ and $\lambda_{i,t}^{buy}$ the corresponding selling and purchasing prices; $N$ is the number of microgrids (MG) connected to the distribution network; $P_{j,t}^{loss}$ is the energy loss of branch $j$; and $M$ is the total number of branches;
Acquiring expert knowledge from an expert knowledge base;
Based on the preprocessed operation data, the expert knowledge provides scheduling guidance, the scheduling model is solved in combination with reinforcement learning, and energy scheduling decisions are made according to the solution.
Further, the reinforcement learning employs the deep deterministic policy gradient (DDPG) algorithm.
Further, the deep deterministic policy gradient algorithm adopts an actor-critic dual neural network architecture, and the training steps of the actor-critic dual neural network architecture comprise:
1) Initialization: initializing the critic network parameters $\theta^{Q}$, the actor network parameters $\theta^{\pi}$, the target critic network parameters $\theta^{Q'}$, the target actor network parameters $\theta^{\pi'}$, and an experience replay buffer;
2) Action selection: the actor network selects an action according to the current policy, and the agent executes the selected action and interacts with the environment to obtain the current state, the reward and the next state;
3) Experience replay: storing the current state, action, reward and next state obtained from the interaction into the experience replay buffer;
4) Critic network update: randomly sampling a batch of experiences from the experience replay buffer, computing the loss function of the critic network from the sampled experiences, and updating the critic network parameters $\theta^{Q}$ by gradient descent;
5) Actor network update: computing the policy gradient from the output of the critic network and updating the actor network parameters $\theta^{\pi}$ by gradient ascent, where the policy gradient is:

$$\nabla_{\theta^{\pi}} J \approx \frac{1}{Q}\sum_{q}\nabla_{a} Q(s,a\mid\theta^{Q})\Big|_{s=s_{i,t},\,a=\pi(s_{i,t})}\;\nabla_{\theta^{\pi}}\pi(s\mid\theta^{\pi})\Big|_{s=s_{i,t}}$$

where $\nabla_{\theta^{\pi}} J$ is the policy gradient of the objective function, $\pi(s\mid\theta^{\pi})$ is the actor network policy, $\nabla_{a}Q(s,a\mid\theta^{Q})$ and $\nabla_{\theta^{\pi}}\pi(s\mid\theta^{\pi})$ are the gradients of the critic network Q-function value and of the actor network respectively, $n$ is the number of agents, $Q$ is the number of experience samples, and $s_{i,t}$ is the state of the $i$-th agent at time $t$;
6) Target network update: periodically copying the parameters of the actor network and the critic network into the corresponding target actor network and target critic network respectively;
7) Policy evaluation and improvement: defining an evaluation function to calculate the expected value of the Q-function so as to measure the policy effect of the agent:

$$J(\pi)=\mathbb{E}_{s\sim p^{\pi}}\left[\,Q(s,\pi(s))\,\right]$$

where $J(\pi)$ is the evaluation function, representing the expected value of the Q-function $Q(s,\pi(s))$ obtained by acting according to policy $\pi$ when the state $s$ obeys the probability distribution $p^{\pi}$, $Q(s,\pi(s))$ is the Q-function value obtained when the agent acts according to policy $\pi$, and $p^{\pi}$ is the probability distribution function;
8) Iteration: repeating steps 2)-7) until the stopping condition is met.
Further, the target reward and critic error are updated as:

$$r'_{i,t} = r_{i,t} + \gamma\,Q'\!\left(s_{i,t+1},\,\pi'(s_{i,t+1}\mid\theta^{\pi'})\mid\theta^{Q'}\right)$$

$$err = \frac{1}{Q}\sum_{q}\left(r'_{i,t} - Q(s_{i,t},a_{i,t}\mid\theta^{Q})\right)^{2}$$

where $r'_{i,t}$ is the target reward of the $i$-th agent at time $t$, $r_{i,t}$ is the immediate reward of the $i$-th agent at time $t$, $\gamma$ is a discount factor balancing the current reward against future rewards, $Q'(\cdot\mid\theta^{Q'})$ is the target critic network Q-function value, $\pi'(\cdot\mid\theta^{\pi'})$ is the target actor network policy, $err$ is the error between the predicted Q value and the target Q value, $a_{i,t}$ is the action of the $i$-th agent at time $t$, and $Q$ is the number of experience samples.
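A short Python sketch of these two updates, treating the networks as plain callables; the linear stand-ins and the two-sample batch are illustrative assumptions:

```python
def td_target(r, s_next, target_actor, target_critic, gamma=0.99):
    """Target reward r'_{i,t} = r_{i,t} + gamma * Q'(s_{t+1}, pi'(s_{t+1}))."""
    return r + gamma * target_critic(s_next, target_actor(s_next))

def critic_error(batch, critic, target_actor, target_critic, gamma=0.99):
    """Mean squared error between the predicted Q value and the target Q value."""
    total = 0.0
    for s, a, r, s_next in batch:
        y = td_target(r, s_next, target_actor, target_critic, gamma)
        total += (y - critic(s, a)) ** 2
    return total / len(batch)

# Toy linear stand-ins for the networks and a two-transition batch (illustrative)
target_pi = lambda s: 0.5 * s
target_q = lambda s, a: s + a
q = lambda s, a: 0.9 * (s + a)
batch = [(1.0, 1.0, 0.5, 1.2), (0.8, -0.3, 0.2, 0.9)]  # (s, a, r, s_next)
print(critic_error(batch, q, target_pi, target_q))
```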
Further, the method also comprises optimizing the scheduling model through a closed-loop feedback mechanism to adjust the intelligent energy scheduling decisions, comprising the following steps:
Dynamically comparing an actual scheduling execution result of the power distribution network energy scheduling decision with an expected scheduling scheme, and identifying an execution deviation;
And correcting according to the execution deviation, and adaptively optimizing the scheduling model to dynamically adjust the energy scheduling decision.
The invention also provides an intelligent decision system of the power distribution network based on the knowledge embedding and multi-agent system, which comprises the following components:
The data acquisition module is used for acquiring the operation data of each distributed energy device in the power distribution network in real time and preprocessing the operation data;
The control architecture construction module is used for carrying out hierarchical division on the power distribution network and constructing a distributed multi-layer energy control architecture based on a multi-agent system;
And the scheduling decision module is used for performing intelligent energy scheduling decision by adopting a method combining expert knowledge embedding and reinforcement learning based on the preprocessed operation data and a multi-agent system-based distributed multi-layer energy control architecture to complete the decision process.
Compared with the prior art, the invention has the following beneficial effects:
(1) According to the invention, a distributed multi-layer energy control architecture based on a multi-agent system is constructed for the power distribution network, and the distributed energy resources of the power distribution network are scheduled by combining expert knowledge embedding with reinforcement learning.
(2) The distributed multi-layer energy control architecture based on the multi-agent system realizes the fine management and collaborative optimization of energy resources of different layers, and improves the flexibility and response speed of the system.
(3) According to the invention, expert domain knowledge is embedded into scheduling decisions, enhancing the intelligence and interpretability of power distribution network scheduling and making decisions in complex scenarios more reasonable and reliable; combined with a reinforcement learning algorithm, the scheduling strategy can be adaptively optimized, effectively handling the dynamic changes and uncertainty in the energy system and realizing more efficient energy management and optimal control.
(4) The invention sets up a closed-loop feedback mechanism to optimize the scheduling model, thereby continuously accumulating operating experience, obtaining optimal scheduling paths and strategies, and improving the overall efficiency and robustness of the scheduling model.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a distributed multi-layered energy control architecture based on a multi-agent system according to the present invention;
Fig. 3 is a DDPG training framework of the present invention.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples. The present embodiment is implemented on the premise of the technical scheme of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following examples.
Example 1
This embodiment provides an intelligent decision-making method for a power distribution network based on knowledge embedding and a multi-agent system. First, real-time operation data of distributed energy resources are acquired through a distributed sensor network, including parameters such as the voltage, current and power output of photovoltaic generation, wind power generation, energy storage equipment and electric vehicles, and the data are standardized. The power grid is then divided into a main distribution network layer, a regional coordination layer and an equipment unit layer based on a multi-agent system, with the agents of each layer responsible for the corresponding scheduling tasks to ensure the stability and flexibility of the system. Intelligent scheduling decisions are made by combining expert knowledge with the deep deterministic policy gradient (DDPG) algorithm: reinforcement learning generates an adaptive scheduling strategy and dynamically optimizes energy allocation. Finally, scheduling execution results are monitored and adjusted in real time through a closed-loop feedback mechanism, differences between actual execution and expected results are analyzed, and the scheduling model parameters are optimized. Specifically, as shown in Fig. 1, the method comprises the following steps:
S1, data acquisition and preprocessing.
The invention first collects the operation data of distributed energy resources in real time through a distributed sensor network. The collected data cover the key operating parameters of various distributed energy resources such as photovoltaic generation, wind power generation, energy storage systems and electric vehicles, including voltage, current, power output, load changes and state of charge. During acquisition, the sensors transmit the real-time data to the system's central database through Internet-of-Things devices. To ensure the consistency and availability of the data, the data from different sources are standardized as follows, providing a basis for subsequent optimization decisions:

$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}}$$
S2, constructing a distributed multi-layer energy control architecture based on a multi-agent system.
In the invention, the power grid is divided into three layers from top to bottom — the main distribution network layer, the regional coordination layer and the equipment unit layer — and distributed multi-level energy control is realized through the multi-agent system, as shown in Fig. 2. In this architecture, each layer is controlled by corresponding agents, and the energy distribution and scheduling optimization of the whole system are realized through coordinated optimization, ensuring effective resource utilization and stable system operation.
(1) The main distribution network layer, led by the distribution network, realizes cross-regional cooperative scheduling by communicating with the agents of the regional coordination layers. It formulates incentive signals according to the optimization target of the whole system and transmits them to the equipment units through the agent layers at each level, ensuring the stable and safe operation of the whole system. Its core function is the global scheduling and management of multiple virtual power plants and their distributed resources.
Taking the minimized operating cost of the power distribution network as the optimization target, a scheduling model of the power distribution network is established to schedule the energy of each distributed energy device:

$$\min E_{DSO}=\sum_{t=1}^{T}\left[\lambda_{t}^{grid}P_{t}^{grid}+\sum_{i=1}^{N}\left(\lambda_{i,t}^{buy}P_{i,t}^{buy}-\lambda_{i,t}^{sell}P_{i,t}^{sell}\right)+\sum_{j=1}^{M}P_{j,t}^{loss}\right]$$

where $E_{DSO}$ is the operating cost; $P_{t}^{grid}$ is the amount of electricity purchased by the distribution system operator from the transmission network and $\lambda_{t}^{grid}$ the corresponding purchase price; $P_{i,t}^{sell}$ and $P_{i,t}^{buy}$ are the amounts of electricity sold to and purchased from microgrid $i$ by the distribution system operator, with $\lambda_{i,t}^{sell}$ and $\lambda_{i,t}^{buy}$ the corresponding selling and purchasing prices; $N$ is the number of microgrids (MG) connected to the distribution network; $P_{j,t}^{loss}$ is the energy loss of branch $j$; and $M$ is the total number of branches.
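Under the variable definitions above, the operating cost can be evaluated as in the following Python sketch; pricing branch losses at the transmission purchase price, as well as all sample quantities, are illustrative assumptions:

```python
import numpy as np

def dso_operating_cost(p_grid, price_grid, p_sell, price_sell, p_buy, price_buy, p_loss):
    """E_DSO over T periods: transmission purchases + microgrid trades + branch losses.

    p_grid, price_grid: shape (T,) -- purchases from the transmission network and their price
    p_sell/price_sell, p_buy/price_buy: shape (T, N) -- trades with each of N microgrids
    p_loss: shape (T, M) -- energy loss on each of M branches, priced at the grid price here
    """
    trade = (p_buy * price_buy - p_sell * price_sell).sum(axis=1)  # net cost of MG trades
    loss = price_grid * p_loss.sum(axis=1)                         # cost of branch losses
    return float((price_grid * p_grid + trade + loss).sum())

T, N, M = 4, 2, 3
rng = np.random.default_rng(0)
cost = dso_operating_cost(
    p_grid=rng.uniform(0, 10, T), price_grid=rng.uniform(0.3, 0.6, T),
    p_sell=rng.uniform(0, 5, (T, N)), price_sell=rng.uniform(0.4, 0.7, (T, N)),
    p_buy=rng.uniform(0, 5, (T, N)), price_buy=rng.uniform(0.2, 0.5, (T, N)),
    p_loss=rng.uniform(0, 0.5, (T, M)))
print(f"E_DSO = {cost:.2f}")
```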
(2) The regional coordination layer is responsible for the operation scheduling and control of the equipment in its region. Its agents acquire real-time equipment state data through communication with the equipment unit layer agents and perform local optimization by combining the energy consumption demand in the region with the overall system target. The regional coordination layer responds to the incentive signals sent by the upper-layer agent, realizing autonomous optimization of the energy devices in the region and ensuring the energy balance of the local area.
(3) The equipment unit layer controls each distributed energy resource unit, such as photovoltaic generation, energy storage devices, wind power and flexible loads. An equipment unit layer agent acquires scheduling instructions through communication with the regional coordination layer and adjusts the power output and running state of its device in real time. The equipment unit layer is responsible not only for the safe operation of the equipment but also for maximizing the consumption of clean energy during regulation, promoting efficient energy utilization.
Through this hierarchical control structure, collaborative optimization is realized among all levels, effectively improving the response speed and scheduling flexibility of the system, reducing energy waste, and maximizing the consumption of clean energy and the economic operation of the system.
S3, intelligent decision making is carried out based on expert knowledge embedding and reinforcement learning.
Intelligent scheduling decisions for the distributed energy resources are realized by combining expert knowledge embedding with the deep deterministic policy gradient (DDPG) reinforcement learning algorithm. This approach not only utilizes experts' domain experience but also continuously optimizes the scheduling strategy in a complex dynamic environment through reinforcement learning, so that the system can make optimal decisions under different load and uncertainty scenarios.
The expert knowledge base contains scheduling experience and decision rules for typical scenarios, including historical scheduling experience, load characteristics under different meteorological conditions, equipment operation constraints, and strategies for coping with sudden events. By embedding this expert knowledge into the intelligent decision system, guidance can be provided during reinforcement learning, so that scheduling decisions are driven not only by data but also by experts' practical experience. For example, during a hot summer peak period, historical experience shows that the air-conditioning load in an area increases dramatically, causing a sharp rise in power demand; photovoltaic generation provides sufficient power during the day but cannot cover the evening shortfall. Based on historical consumption during similar peaks, the expert knowledge base provides the scheduling guidance that the energy storage devices should be charged preferentially during the day in preparation for the evening peak and discharged gradually according to the load demand in the evening, while schedulable loads such as air conditioning and electric vehicle charging are shifted appropriately to reduce the power peak. With this scheduling strategy embedded in the intelligent decision system, the reinforcement learning process can not only generate scheduling strategies from real-time data but also draw on expert experience, so that the energy storage devices allocate energy reasonably under rapid load growth and grid scheduling becomes more forward-looking and safer.
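One common way to embed such rules is to overlay them on the agent's proposed action before execution. The sketch below illustrates that pattern for the summer-peak example; the peak hours, SOC limits and thresholds are illustrative assumptions, as the invention does not prescribe a specific encoding:

```python
def apply_expert_rules(action, hour, soc, peak_hours=range(18, 22)):
    """Overlay expert scheduling guidance on a raw RL action.

    action: proposed storage power in kW (positive = charge, negative = discharge)
    hour: hour of day; soc: battery state of charge in [0, 1]
    Rule (from the summer-peak example): charge during the day, discharge at the
    evening peak, and never exceed SOC limits.
    """
    if hour in peak_hours and action > 0:
        action = -abs(action)          # evening peak: force discharge
    elif 9 <= hour < 16 and action < 0 and soc < 0.9:
        action = abs(action)           # daytime surplus PV: prefer charging
    if soc >= 0.95:
        action = min(action, 0.0)      # SOC ceiling: no further charging
    if soc <= 0.05:
        action = max(action, 0.0)      # SOC floor: no further discharging
    return action

print(apply_expert_rules(action=30.0, hour=19, soc=0.8))  # -> -30.0 (discharge at peak)
```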
Reinforcement learning adopts the DDPG algorithm. As shown in Fig. 3, DDPG uses an actor-critic structure, explores agent actions with a stochastic policy, and updates the algorithm with a deterministic gradient policy. The actor and critic are two deep neural networks (DNNs): the critic, also called the Q network, can obtain the global information and global actions of the system, while the actor, also called the π network, acts only according to its local environment. The two networks are trained separately to minimize their loss functions and realize network updating; to address unstable updates, the algorithm creates a backup network, i.e. a target network, for each of the two networks.
In the critic network, the parameter vector $\theta^{Q}$ is used to estimate the Q-function value $Q^{*}(s,a\mid\theta^{Q})$, also called the state-action value function, which evaluates the decision quality of the participants and provides gradient-direction information to the algorithm. In the actor network, the parameter vector $\theta^{\pi}$ is used to estimate a policy $\pi^{*}(s\mid\theta^{\pi})$ that maps states to agent actions, making decisions and outputting continuous actions. An evaluation function $J(\pi)$ is defined as the expected value of the Q-function to measure the effect of the agent's policy:

$$J(\pi)=\mathbb{E}_{s\sim p^{\pi}}\left[\,Q(s,\pi(s))\,\right] \qquad (2)$$

where $p^{\pi}$ is the probability distribution function, $Q(s,\pi(s))$ is the Q-function value obtained when the agent acts according to policy $\pi$, and $J(\pi)$ represents the expected value of $Q(s,\pi(s))$ when the state $s$ obeys the probability distribution $p^{\pi}$.
The agent accumulates experience through interaction with the environment to update the networks; in essence this is policy updating. The actor network selects actions through an initially random policy, the critic network evaluates each action and outputs the corresponding Q-function value, and the actor updates its policy parameters according to this evaluation. Repeating these steps gradually increases the cumulative reward obtained by the policy. The update process is as follows:

$$r'_{i,t} = r_{i,t} + \gamma\,Q'\!\left(s_{i,t+1},\,\pi'(s_{i,t+1}\mid\theta^{\pi'})\mid\theta^{Q'}\right) \qquad (3)$$

Equation (3) is the reward update, where $r'_{i,t}$ is the target reward, $Q'(\cdot\mid\theta^{Q'})$ is the target critic network Q-function value, and $\pi'(\cdot\mid\theta^{\pi'})$ is the target actor network policy.

$$err = \frac{1}{Q}\sum_{q}\left(r'_{i,t} - Q(s_{i,t},a_{i,t}\mid\theta^{Q})\right)^{2} \qquad (4)$$

Equation (4) is the critic network error; network updating is achieved by minimizing this error, where $Q(s_{i,t},a_{i,t}\mid\theta^{Q})$ is the actual critic network Q-function value and $Q$ is the number of experience samples.

$$\nabla_{\theta^{\pi}} J \approx \frac{1}{Q}\sum_{q}\nabla_{a} Q(s,a\mid\theta^{Q})\Big|_{s=s_{i,t},\,a=\pi(s_{i,t})}\;\nabla_{\theta^{\pi}}\pi(s\mid\theta^{\pi})\Big|_{s=s_{i,t}} \qquad (5)$$

Equation (5) is the network policy gradient, which determines the update direction of the algorithm; $\nabla_{a}Q(s,a\mid\theta^{Q})$ and $\nabla_{\theta^{\pi}}\pi(s\mid\theta^{\pi})$ are the gradients of the critic network and the actor network, respectively.
Specifically, the training steps of the actor-critic dual neural network architecture comprise:
1) Initialization: initializing the critic network parameters $\theta^{Q}$, the actor network parameters $\theta^{\pi}$, the target critic network parameters $\theta^{Q'}$, the target actor network parameters $\theta^{\pi'}$, and an experience replay buffer;
2) Action selection: the actor network selects an action according to the current policy, and the agent executes the selected action and interacts with the environment to obtain the current state, the reward and the next state;
3) Experience replay: storing the current state, action, reward and next state obtained from the interaction into the experience replay buffer;
4) Critic network update: randomly sampling a batch of experiences from the experience replay buffer, computing the loss function of the critic network from the sampled experiences, and updating the critic network parameters $\theta^{Q}$ by gradient descent;
5) Actor network update: computing the policy gradient from the output of the critic network and updating the actor network parameters $\theta^{\pi}$ by gradient ascent, where the policy gradient is calculated according to formula (5);
6) Target network update: periodically copying the parameters of the actor network and the critic network into the corresponding target actor network and target critic network respectively;
7) Policy evaluation: defining an evaluation function to calculate the expected value of the Q-function so as to measure the policy effect of the agent, as shown in formula (2);
8) Iteration: repeating steps 2)-7) until the stopping condition is met.
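A condensed PyTorch sketch of steps 4)-6) on one sampled batch follows; the layer sizes, learning rates and random batch are illustrative assumptions, and the soft (Polyak) target update is a common variant of the periodic copy described in step 6):

```python
import copy
import torch
import torch.nn as nn

# Toy dimensions; a real deployment would match the agent's state/action spaces
state_dim, action_dim, gamma, tau = 3, 1, 0.99, 0.005
actor = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 32), nn.ReLU(), nn.Linear(32, 1))
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def train_step(s, a, r, s2):
    """One DDPG update on a sampled batch (steps 4-6 of the procedure above)."""
    # 4) Critic update: minimize the MSE between Q(s, a) and the TD target
    with torch.no_grad():
        y = r + gamma * critic_t(torch.cat([s2, actor_t(s2)], dim=1))
    q = critic(torch.cat([s, a], dim=1))
    critic_loss = nn.functional.mse_loss(q, y)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()
    # 5) Actor update: ascend the policy gradient (descend its negation)
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()
    # 6) Soft target update (Polyak averaging) instead of a periodic hard copy
    for net, net_t in ((actor, actor_t), (critic, critic_t)):
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)

# Run one step on a random batch of 8 transitions
B = 8
train_step(torch.randn(B, state_dim), torch.rand(B, action_dim) * 2 - 1,
           torch.randn(B, 1), torch.randn(B, state_dim))
```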
In the distributed multi-layer energy control architecture, the intelligent decisions obtained by reinforcement learning are executed at different layers so that the system operates efficiently and safely. First, the main distribution network layer receives and integrates state information from each region by communicating with the regional coordination layer agents, and generates a scheduling strategy according to the global optimization target. The main purpose of this layer is to coordinate and schedule multiple virtual power plants and distributed resources, ensuring the stability of the system; the intelligent scheduling strategy generated by the reinforcement learning algorithm at this layer is transmitted to the regional level through an incentive mechanism, promoting coordination and reasonable resource allocation across regions. Second, the agents of the regional coordination layer are responsible for executing these intelligent decisions and scheduling in conjunction with the actual state of the devices in the region. An agent at this layer acts as a regional broker and fine-tunes the generated actions according to information such as the regional load and equipment states, realizing the energy balance of the whole system. For example, an agent may optimize the charge and discharge timing of the energy storage devices in its region, or coordinate the use of renewable energy sources, through the reinforcement learning strategy, ensuring efficient use of energy. At the equipment unit layer, the agents correspond to individual distributed energy resources, such as photovoltaic generation, energy storage devices or wind power equipment. At this level, an agent performs specific operations based on the real-time status of its device, such as adjusting the output of photovoltaic generation or controlling the charging and discharging of an energy storage device. Because the equipment unit layer agents directly execute the scheduling instructions from the upper layers, quick response and safe device operation are ensured in actual operation.
Through this layered design, the intelligent decisions from reinforcement learning not only optimize the system at the global level but are also realized layer by layer down to the regional and equipment levels, ensuring efficient scheduling and resource optimization from the global scale to the local scale.
S4, optimizing a scheduling model by a closed loop feedback mechanism.
In the invention, the closed loop feedback mechanism plays a key scheduling optimization role, and the actual execution result is compared and analyzed with the expected scheduling scheme by monitoring the execution condition of the scheduling scheme in real time, so that the scheduling model is continuously optimized. The closed loop feedback mechanism ensures that the system can adapt to changes under different scenes and load conditions, and the precision and efficiency of scheduling decisions are continuously improved according to the execution effect. Specifically, the feedback mechanism includes the following steps:
And S4.1, real-time monitoring and data feedback. In the process of executing the scheduling scheme, the system tracks the running state of the distributed energy resources in real time through the sensor network and the monitoring module, wherein the running state comprises the charging and discharging conditions, the load change, the running state of the equipment and the like of the power generation and energy storage equipment. All real-time data are transmitted to a central control system through the Internet of things equipment, and are dynamically compared with a scheduling scheme generated before, so that possible deviation in the execution process is identified.
S4.2, feedback data analysis and error correction. The system can identify the abnormality or inconsistency in the scheduling by analyzing the deviation of the actual execution result from the expected scheme. For example, if the actual output power of a distributed energy device is below expected, or the discharge time of the energy storage device does not reach the expected effect, the system may capture these anomalies via a feedback mechanism. And then, the system carries out error correction according to the deviation value, adjusts parameters in a scheduling strategy, and ensures that the next scheduling period can be adjusted and optimized for the problem in the previous period.
And S4.3, dynamically adjusting the scheduling model. Based on the feedback data, the scheduling model in the system can be adaptively optimized. The system can dynamically adjust the scheduling strategy of various distributed energy sources by combining reinforcement learning algorithm (DDPG) with expert knowledge. For example, when the system recognizes that the electricity demand suddenly changes, the scheduling model can immediately adjust the charging and discharging strategies of the energy storage equipment, and redistribute the load according to the actual demand so as to ensure the stable operation of the whole power grid.
And S4.4, long-term optimization and strategy upgrading. In addition to short-term feedback tuning, the system also performs policy upgrades based on long-term scheduling data. Through closed loop feedback of multiple scheduling periods, the system continuously accumulates operation experience and gradually optimizes scheduling rules in an expert knowledge base. In long-term operation, the system can summarize the optimal scheduling path and strategy through big data analysis, so that the overall efficiency and robustness of the scheduling model are improved.
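The deviation identification and correction of S4.1-S4.3 can be outlined as in the following Python sketch; the tolerance, gain and per-device bias parameters are illustrative assumptions:

```python
def feedback_step(expected_kw, actual_kw, params, tolerance=0.05, k=0.1):
    """Compare actual execution with the expected schedule and nudge model parameters.

    expected_kw / actual_kw: scheduled vs. measured power per device (dict)
    params: mutable scheduling-model parameters, e.g. a per-device output bias
    """
    for dev, exp in expected_kw.items():
        deviation = (actual_kw[dev] - exp) / max(abs(exp), 1e-6)
        if abs(deviation) > tolerance:          # S4.2: flag an execution deviation
            params[dev] = params.get(dev, 0.0) - k * deviation  # S4.3: corrective nudge
            print(f"{dev}: deviation {deviation:+.1%}, bias -> {params[dev]:+.4f}")
    return params

# PV-1 under-delivers by 14% and is corrected; ESS-1 stays within tolerance
params = feedback_step(expected_kw={"PV-1": 50.0, "ESS-1": -20.0},
                       actual_kw={"PV-1": 43.0, "ESS-1": -19.5},
                       params={})
```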
The embodiment applies the method to an actual scene, and mainly comprises the following steps:
And step 1), acquiring real-time operation data of the distributed energy resources, wherein the real-time operation data comprise parameters such as photovoltaic power generation, wind power generation, voltage, current, power output, charge and discharge states, load demands and the like of energy storage equipment. Through the distributed sensor network, the system can monitor the key data in real time. All the collected data are subjected to standardized processing through a data preprocessing module, so that different types of energy resources can be uniformly processed, and reliable data support is provided for subsequent scheduling analysis;
Step 2) Distributed multi-layer energy control architecture based on the multi-agent system. The power grid system is divided into three layers: the main distribution network layer, the regional coordination layer and the equipment unit layer. The agents of each layer are responsible for the corresponding scheduling tasks. The main distribution network layer agent formulates a global scheduling strategy in cooperation with each regional coordination layer, ensuring the overall stability and efficient operation of the system. The regional coordination layer agents integrate the running states of the distributed energy resources in their regions to generate local optimization schemes. The equipment unit layer agents control specific energy resource devices, ensuring that they run within safe limits and consume clean energy to the maximum extent;
And 3) performing intelligent scheduling decision based on expert knowledge embedding and reinforcement learning (DDPG) algorithm. Under the guidance of an expert knowledge base, the system combines historical data and real-time data, and optimizes the energy scheduling strategy through DDPG algorithm. The DDPG algorithm generates optimal scheduling actions, such as adjusting charge and discharge time of the energy storage device or adjusting load distribution, according to the current system state, so that the system can adapt to complex and changeable requirements and uncertainty. The expert knowledge base provides scheduling experience in a special scene, so that the system can accelerate convergence in the reinforcement learning process and make reasonable scheduling decisions;
And 4) monitoring the execution effect of the scheduling scheme in real time through a closed loop feedback mechanism. The system compares the actual execution result with the expected scheduling scheme, identifies possible deviation, analyzes the cause of the deviation, and corrects the scheduling model through a feedback mechanism. The system optimizes the parameters of the scheduling model according to the feedback result, and ensures that the scheduling strategy is more accurate and efficient in the next round of execution.
Example 2
The embodiment provides a power distribution network intelligent decision system based on knowledge embedding and a multi-agent system, which comprises:
The data acquisition module is used for acquiring the operation data of each distributed energy device in the power distribution network in real time and preprocessing the operation data;
The control architecture construction module is used for carrying out hierarchical division on the power distribution network and constructing a distributed multi-layer energy control architecture based on a multi-agent system;
And the scheduling decision module is used for performing intelligent energy scheduling decision by adopting a method combining expert knowledge embedding and reinforcement learning based on the preprocessed operation data and a multi-agent system-based distributed multi-layer energy control architecture to complete the decision process.
The rest is the same as in Embodiment 1.
The above functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a U disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, an optical disk, or other various media capable of storing program codes.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the present invention can be realized in various computer languages, such as the object-oriented programming language Java and the scripting language JavaScript.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411507652.4A CN119382159A (en) | 2024-10-28 | 2024-10-28 | Intelligent decision-making method and system for distribution network based on knowledge embedding and multi-agent system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN119382159A true CN119382159A (en) | 2025-01-28 |
Family
ID=94331726
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202411507652.4A Pending CN119382159A (en) | 2024-10-28 | 2024-10-28 | Intelligent decision-making method and system for distribution network based on knowledge embedding and multi-agent system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119382159A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119784121A (en) * | 2025-03-13 | 2025-04-08 | 四川中烟工业有限责任公司 | A digital production operation management method for data quality assessment and analysis |
| CN120222428A (en) * | 2025-03-21 | 2025-06-27 | 中能建氢能源有限公司 | Electricity-hydrogen coupling intelligent control method and system considering wind-solar forecast error |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||