
CN118554555A - MASAC algorithm-based distributed photovoltaic voltage reactive power control method for power distribution network - Google Patents


Info

Publication number: CN118554555A
Application number: CN202410467035.XA
Authority: CN (China)
Prior art keywords: voltage, agent, distribution network, strategy
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 吴浩, 邹斌, 杨金明, 陶金, 戴亮, 董庆森, 韩禹, 李季, 鞠秋萍
Original and current assignees: Taizhou Kaitai Electric Power Design Co ltd; Jiangsu Xiangtai Electric Power Industry Co ltd; Taizhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Application filed 2024-04-18 by Taizhou Kaitai Electric Power Design Co ltd, Jiangsu Xiangtai Electric Power Industry Co ltd and Taizhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202410467035.XA (priority date 2024-04-18)
Publication of CN118554555A: 2024-08-27, pending legal status

Links

Classifications

    • H02J3/50: Controlling the sharing of the out-of-phase component (under H02J3/00, circuit arrangements for AC mains or AC distribution networks; H02J3/38, arrangements for feeding a single network in parallel by two or more generators, converters or transformers; H02J3/46, controlling the sharing of output between the generators, converters, or transformers)
    • H02J13/00004: Circuit arrangements for providing remote indication of network conditions or remote control of switching means in a power distribution network, characterised by the power network being locally controlled
    • H02J2203/20: Simulating, e.g. planning, reliability check, modelling or computer assisted design [CAD]
    • H02J2300/24: Systems for supplying or distributing electric power characterised by decentralized, dispersed, or local generation, the renewable source being solar energy of photovoltaic origin
    • Y02E40/30: Reactive power compensation (technologies for efficient electrical power generation, transmission or distribution)


Abstract

The invention discloses a distributed photovoltaic voltage reactive power control method for a power distribution network based on the MASAC algorithm, belonging to the fields of power system automation technology and artificial-intelligence reinforcement learning. First, a decentralized voltage/reactive power control framework for the distribution network that accounts for distributed PV is constructed, and the decentralized Volt/VAR control problem of the distribution network is formulated as a Markov game model. A MASAC algorithm is then constructed to solve the Markov game model: Actor and Critic neural networks are built for each agent and trained in a centralized manner, yielding a distributed photovoltaic voltage/reactive power control model for the distribution network. The trained model is then used for online control of the distribution network voltage, realizing decentralized regulation of the photovoltaic inverters. The invention reduces communication requirements and computational burden, improves grid voltage stability, and is widely applicable and flexible.

Description

A distributed photovoltaic voltage and reactive power control method for distribution networks based on the MASAC algorithm

Technical Field

The present invention belongs to the fields of power system automation technology and artificial-intelligence reinforcement learning, and specifically relates to a distributed photovoltaic voltage/reactive power control method for a distribution network based on the MASAC algorithm.

Background Art

As the penetration of distributed energy resources, especially PV (photovoltaic) generation, continues to increase, these resources are becoming ever more important for meeting growing electricity demand and protecting the environment. However, their unpredictability and volatility pose many technical challenges for system operators. Of particular concern is the overvoltage caused by reverse current under light-load conditions when PV penetration is excessive. Volt/VAR control (VVC) is an effective tool for improving the voltage profile of the distribution network: it adjusts the reactive power set-points of conventional reactive power sources or newly added inverters to regulate the distribution network voltage, exploiting the reactive power absorption/injection capability of capacitor banks and smart inverters. From the perspective of control strategy, voltage regulation methods fall into four categories: centralized control, local control, distributed control, and decentralized control. Centralized control requires fast communication channels and is therefore costly; local and distributed control are constrained by the need for coordination among agents, which may be infeasible in many applications. Decentralized control combines the advantages of the distributed and centralized approaches by using zonal control together with inter-zone coordination. Traditional decentralized control models, however, require the system topology and parameters, which are difficult to obtain in real distribution systems, especially where large numbers of in-home PV units are installed. Developing a decentralized control method that does not rely on accurate distribution network parameters would therefore overcome these challenges.

Deep reinforcement learning (DRL), one of the most widely used machine-learning-based approaches, has attracted broad attention because it can learn optimal control policies. In DRL, the relationship between actions and states is learned through continuous interaction with the environment, which reduces the dependence on complete knowledge of the system parameters. A well-trained DRL agent can provide adaptive actions for new dynamics encountered in the real world. Existing studies have applied DRL-based Volt/VAR control frameworks to voltage regulation in distribution networks. However, these methods require communication links and centralized processing to make decisions based on the system state, so they may not be suitable for large-scale grids with thousands of PV units. A multi-agent deep reinforcement learning (MADRL) method using the multi-agent deep deterministic policy gradient (MADDPG) algorithm has also been proposed for voltage regulation; centralized processing is confined to the training phase, and the decentralized execution phase removes the need for real-time communication among the numerous distributed resources. However, such methods explore inefficiently and are not well suited to multi-agent decision-making in large-scale power systems.

Summary of the Invention

In view of the problems in the prior art, the present invention provides a distributed photovoltaic voltage/reactive power control method for distribution networks based on the MASAC algorithm, which is better suited to photovoltaic regulation in large-scale power systems.

To solve the above technical problems, the present invention provides the following technical solution: a distributed photovoltaic voltage/reactive power control method for distribution networks based on the MASAC algorithm, comprising the following steps:

S1. Construct a decentralized voltage/reactive power control framework for the distribution network that accounts for distributed photovoltaics, and formulate the decentralized Volt/VAR control problem of the distribution network as a Markov game model;

The decentralized voltage/reactive power control framework of the distribution network comprises: minimizing the active power loss of the distribution network over a period of time as the objective, the distributed photovoltaic inverters as the decision variables, and a preset voltage range as the constraint;

The Markov game model includes: a state space, the set of local observations formed by the net active/reactive power injections and the voltage magnitudes at the photovoltaic inverters belonging to each agent; an action space, the set of actions formed by the reactive power outputs of the photovoltaic inverters controlled by all agents; a reward function, constructed from the active power loss cost and a voltage-violation penalty; and a state transition process, in which the states of the agents obey the power flow constraints of the distribution network and are updated according to the state transition probability distribution;

S2. Construct the MASAC algorithm to solve the Markov game model, and build Actor and Critic neural networks for each agent; the Actor network determines the agent's policy, and the Critic network evaluates the value of the policy;

The neural networks are trained in a centralized manner to obtain a distributed photovoltaic voltage/reactive power control model for the distribution network; the trained model is then used for online control of the distribution network voltage, realizing decentralized regulation of the photovoltaic inverters.

Furthermore, in the aforementioned step S1, the objective of minimizing the active power loss of the distribution network over a period of time is expressed by constructing the objective function as follows:
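A form consistent with the symbol definitions below is given here as a reconstruction (the decision variables being the inverter reactive outputs Q_PV, as stated above; the original equation figure may differ in presentation):

    \min_{Q_{PV}} \; \sum_{t=1}^{T} P_{loss}(t)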

where T denotes the optimization horizon and P_loss(t) denotes the active network loss at time t.

Furthermore, in the aforementioned step S1, the constraint condition is:
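Consistent with the definitions below (reconstruction), the node-voltage constraint reads:

    V_{min} \le V_k(t) \le V_{max}, \quad \forall k,\ \forall t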

where V_k(t) is the voltage of photovoltaic inverter k at time t, and V_min and V_max are the preset lower and upper voltage limits, respectively.

Furthermore, in the aforementioned step S1, the state space S is the set of local observations s_i,t of all photovoltaic inverters at time t, where s_i,t is the local observation of agent i at time t, s_i,t = (p_i, q_i, v_i), and p_i, q_i and v_i denote, respectively, the net active/reactive power injections and the node voltage magnitude at the photovoltaic inverter where agent i is located;

The action space A is the set of actions a_i,t formed by the reactive power outputs of the photovoltaic inverters controlled by all agents at time t, where Q_PV,i,t is the reactive output of the PV inverter controlled by agent i at time t.

Furthermore, in the aforementioned step S1, the reward function is as follows:
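A form consistent with the definitions below is given here as a reconstruction; the sign convention (costs entering with negative sign so that the agents maximize reward by reducing loss and violations) is an assumption:

    R(t) = -\sigma_1 P_{loss}(t) - \sigma_2 \sum_{k} f\left(V_k(t)\right), \qquad
    f\left(V_k(t)\right) = \begin{cases} 0, & V_{min} \le V_k(t) \le V_{max} \\ 1, & \text{otherwise} \end{cases}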

where R(t) is the reward at time t, P_loss(t) is the active power loss at time t, the function f is a 0-1 indicator that equals 0 when the voltage V_k(t) of node k lies within the limits V_min and V_max and 1 otherwise, σ_1 is the unit active-power-loss cost, and σ_2 is the voltage-violation penalty factor.

Furthermore, in the aforementioned step S1, the state transition process uses the PYPOWER power flow calculation tool to build the distribution network environment, and the runpf function is used to perform the power flow calculation; the power flow constraints include the power balance constraints and the branch flow constraints. The state transition probability distribution of the agents is P(s'|s,a), i.e., the probability that, after the agents take action a_t in the current state s_t, the environment transitions from s_t to s'_t under the effect of a_t.

Furthermore, the aforementioned step S2 comprises the following sub-steps:

S201. Construct the actor network of each agent based on the Actor network; the policy of each agent's actor network is as follows:
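In the notation defined below, the decentralized per-agent policy can be written (reconstruction) as:

    a_{i,t} \sim \pi_{\phi_i}\left(\,\cdot \mid s_{i,t}\right)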

where a_i,t is the action taken by each agent at time t, determined by its Actor network; i is the index of the agent; the state vector of agent i at time t is denoted s_i,t; and the policy of each agent, denoted π_φi, is a policy based on a squashed (compressed) Gaussian distribution;

S202. Each agent iteratively updates its policy based on maximizing the trade-off between the expected return and the policy entropy; the entropy H(π) of the joint policy π(a_t|s_t) is as follows:
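A form consistent with the definitions below, treating the joint entropy as the sum of the local policy entropies (reconstruction), is:

    H(\pi) = \sum_{i=1}^{N} H(\pi_i), \qquad H(\pi_i) = \mathbb{E}_{a_{i,t} \sim \pi_i}\left[-\log \pi_i(a_{i,t} \mid s_{i,t})\right]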

where H(π_i) is the entropy of each local policy, representing the randomness of the policy and quantifying the uncertainty in the system, and N is the number of agents;

S203. In the policy evaluation stage, the Critic network parameters θ are trained to reduce the Bellman residual:
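The standard soft actor-critic form of this objective, consistent with the symbols defined below, is (reconstruction; the exact expression in the original figure may differ):

    J_Q(\theta) = \mathbb{E}_{(s_t, a_t) \sim D}\left[\tfrac{1}{2}\left(Q_\theta(s_t, a_t) - \left(r(s_t, a_t) + \gamma\, \mathbb{E}_{s_{t+1}}\left[V_\theta(s_{t+1})\right]\right)\right)^{2}\right]

where the soft value of the next state is V_\theta(s_{t+1}) = \mathbb{E}_{a_{t+1} \sim \pi}\left[Q_\theta(s_{t+1}, a_{t+1}) - \alpha \log \pi(a_{t+1} \mid s_{t+1})\right].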

where J_Q(θ) is the objective function of the Critic network parameters θ, and the network parameters are trained by minimizing this function; E denotes the expectation over the state-action pairs produced by the current policy, computed under the distribution of the current state s_t and action a_t; D is the experience replay buffer, which stores previous experience for training; Q(s_t,a_t) is the action-value function; γ is the discount factor used to compute the present value of future rewards, with a value between 0 and 1; r(s_t,a_t) is the immediate reward obtained by taking action a_t in state s_t; V_θ is the value estimate of the next state s_{t+1} given by the value-function network parameterized by θ; and α is the temperature parameter, an entropy regularization coefficient that trades off reward against entropy to encourage exploration.

The parameters of the Critic network are optimized using the stochastic policy gradient, where r is the immediate reward value and φ_i is the policy parameter of each agent;

S204. In the policy-making stage, the Actor network objective function is as follows:
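The standard maximum-entropy policy improvement objective, consistent with the symbols defined below, is (reconstruction):

    \pi^{*} = \arg\max_{\pi'}\ \mathbb{E}_{s_t \sim D,\, a_t \sim \pi'}\left[Q(s_t, a_t) - \alpha \log \pi'(a_t \mid s_t)\right]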

where π* denotes the optimal joint policy, Q(s_t,a_t) denotes the action-value function, α denotes the temperature parameter, and π' is the target policy;

S205. The policy of each agent is trained by minimizing the expected entropy of the actions produced by its actor network;

The policy parameters φ_i of each agent are updated by stochastic gradient descent, and α is updated according to:
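The standard temperature objective, consistent with the target entropy H' defined below, is (reconstruction; minimizing J(α) adapts the temperature toward the target entropy):

    J(\alpha) = \mathbb{E}_{a_t \sim \pi_t}\left[-\alpha \log \pi_t(a_t \mid s_t) - \alpha H'\right]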

where H' is the target entropy, an equivalent vector composed of hyperparameters;

Compared with the prior art, the beneficial technical effects of the above technical solution of the present invention are as follows:

1. Reduced communication requirements and computational burden: the multi-agent deep reinforcement learning method of the present invention can be executed in a decentralized manner, significantly reducing the communication requirements among the agents. In complex power systems containing a large number of distributed energy resources in particular, this advantage relieves the computational burden of centralized approaches and thus improves the overall efficiency and reliability of the system.

2. Improved grid voltage stability: by coordinating the reactive power set-points of the photovoltaic inverters, the present invention effectively improves the voltage profile of the distribution network and enhances its voltage stability. This is particularly important for coping with the intermittency and volatility of solar generation.

3. Wide applicability and flexibility: the present invention does not rely on a system model, so it can be applied flexibly to a variety of distribution network configurations without detailed knowledge of the system topology or parameters. This broadens the scope of application, especially for distribution networks where accurate system data are difficult to obtain.

4. An optimized reinforcement learning algorithm: the developed MASAC algorithm has strong exploration capability and can effectively find the best actions for the agents. Compared with conventional maximum-entropy soft Q-learning methods, the present invention avoids potential complexity and instability problems and enhances the stability and reliability of the algorithm.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 is a schematic diagram of the training results of the MASAC algorithm provided by an embodiment.

FIG. 3 is a schematic diagram of the test results of the MASAC algorithm provided by an embodiment.

FIG. 4 is a schematic diagram of the robustness test results of the reactive voltage control provided by an embodiment.

FIG. 5 is a schematic diagram of the voltages at all nodes under the reactive voltage control provided by an embodiment of the present invention.

DETAILED DESCRIPTION

In order to better understand the technical content of the present invention, specific embodiments are described below in conjunction with the accompanying drawings.

Aspects of the present invention are described herein with reference to the accompanying drawings, in which a number of illustrative embodiments are shown. The embodiments of the present invention are not limited to those shown in the drawings. It should be understood that the present invention can be realized by any of the various concepts and embodiments introduced above and described in detail below, since the disclosed concepts and embodiments are not limited to any particular implementation. In addition, some aspects disclosed herein may be used alone or in any appropriate combination with other disclosed aspects.

As shown in FIG. 1, the present invention provides a distributed photovoltaic voltage/reactive power control method for a distribution network based on the MASAC algorithm, comprising the following steps:

S1. Construct a decentralized voltage/reactive power control framework for the distribution network that accounts for distributed photovoltaics, and formulate the decentralized Volt/VAR control problem of the distribution network as a Markov game model;

The decentralized voltage/reactive power control framework of the distribution network comprises: minimizing the active power loss of the distribution network over a period of time as the objective, the distributed photovoltaic inverters as the decision variables, and a preset voltage range as the constraint;

The Markov game model includes: a state space, the set of local observations formed by the net active/reactive power injections and the voltage magnitudes at the photovoltaic inverters belonging to each agent; an action space, the set of actions formed by the reactive power outputs of the photovoltaic inverters controlled by all agents; a reward function, constructed from the active power loss cost and a voltage-violation penalty; and a state transition process, in which the states of the agents obey the power flow constraints of the distribution network and are updated according to the state transition probability distribution.

(1) Objective function: the goal of Volt/VAR control is to minimize the active power loss of the distribution network over a period of time while ensuring that node voltages do not exceed their limits; the objective function is the one given in step S1 above.

Here, T denotes the optimization horizon and P_loss(t) denotes the active network loss at time t.

(2) Decision variables: the objects of decentralized Volt/VAR control in the distribution network are the distributed PV inverters; decentralized control of the distribution network is achieved by adjusting the reactive power output Q_PV of the distributed PV inverters.

(3) Constraints: the most important constraint in decentralized Volt/VAR control of the distribution network is the node voltage constraint: during control, the node voltages must be kept within the specified limits V_min and V_max.

(4) State space: the state space S represents the states of all agents at time t, and s_i,t represents the local observation of agent i at time t, s_i,t = (p_i, q_i, v_i), where p_i, q_i and v_i denote, respectively, the net active/reactive power injections and the voltage magnitude at the local node where agent i is located; S is the set of all node states.

(5) Action space: the action space A represents the set of actions of all agents at time t, and a_i,t represents the reactive power output Q_PV,i,t of the PV inverter controlled by agent i at time t.

(6) Reward function: the reward function measures the quality of the actions taken by the agents, and all agents share the same reward function; it consists of two parts, the active power loss cost and the voltage-violation penalty, as given in step S1 above.

In that expression, R(t) is the reward at time t, P_loss(t) is the active power loss at time t, and the function f is a 0-1 indicator that equals 0 when the voltage V_k(t) of node k satisfies the limits V_min and V_max and 1 otherwise; the upper and lower voltage magnitude limits are set to 1.05 and 0.95 (p.u.), σ_1 is the unit active-power-loss cost, and σ_2 is the voltage-violation penalty factor.

(7) State transition process: after the agents' actions are applied to the environment, the state transition process strictly follows the power flow constraints of the distribution network, including the power balance constraints and branch flow constraints. The present invention uses the PYPOWER power flow tool to build the distribution network environment and the runpf function to perform the power flow calculation; this function automatically satisfies the power balance and flow constraints when solving the power flow. The state transition probability distribution of the agents is denoted P(s'|s,a), the probability that, after the agents take action a_t in the current state s_t, the environment transitions from s_t to s'_t under the effect of a_t.
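A minimal sketch of such an environment step using PYPOWER is given below. It models each PV inverter as a negative load at its bus, which is one common modelling choice rather than the patent's exact implementation; the case14 network, the bus numbers and the σ_1/σ_2 weights are placeholders only (the embodiment uses a modified IEEE 34-bus feeder).

    import numpy as np
    from pypower.api import case14, ppoption, runpf
    from pypower.idx_bus import BUS_I, PD, QD, VM
    from pypower.idx_gen import PG

    V_MIN, V_MAX = 0.95, 1.05            # preset voltage limits (p.u.)
    SIGMA1, SIGMA2 = 1.0, 100.0          # sigma_1, sigma_2: assumed weights

    def step(base_ppc, pv_buses, pv_p_mw, pv_q_mvar):
        """Apply the agents' PV injections, run a power flow with runpf,
        and return the local observations and the shared reward."""
        ppc = {k: (v.copy() if isinstance(v, np.ndarray) else v) for k, v in base_ppc.items()}
        row = {int(n): i for i, n in enumerate(ppc['bus'][:, BUS_I])}
        for b, p, q in zip(pv_buses, pv_p_mw, pv_q_mvar):
            ppc['bus'][row[b], PD] -= p   # PV active output modelled as negative load
            ppc['bus'][row[b], QD] -= q   # agent action a_i,t: inverter reactive set-point
        result, ok = runpf(ppc, ppoption(VERBOSE=0, OUT_ALL=0))
        bus = result['bus']
        p_loss = result['gen'][:, PG].sum() - bus[:, PD].sum()   # total active loss (MW)
        v = bus[:, VM]
        violations = np.sum((v < V_MIN) | (v > V_MAX))           # nodes outside [V_MIN, V_MAX]
        reward = -SIGMA1 * p_loss - SIGMA2 * violations
        # local observation s_i,t = (p_i, q_i, v_i), read back from the solved case
        obs = [np.array([-bus[row[b], PD], -bus[row[b], QD], bus[row[b], VM]]) for b in pv_buses]
        return obs, float(reward), bool(ok)

    obs, r, ok = step(case14(), pv_buses=[10, 14], pv_p_mw=[0.5, 0.3], pv_q_mvar=[0.1, -0.05])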

S2. Construct the MASAC algorithm to solve the Markov game model, and build Actor and Critic neural networks for each agent; the Actor network determines the agent's policy, and the Critic network evaluates the value of the policy;

The neural networks are trained in a centralized manner to obtain a distributed photovoltaic voltage/reactive power control model for the distribution network; the trained model is then used for online control of the distribution network voltage, realizing decentralized regulation of the photovoltaic inverters.

In the present invention, the main innovation of the multi-agent MASAC lies in its centralized-training, decentralized-execution procedure. During training, the Critic networks are trained centrally using global information, whereas during execution each agent uses only its local observation as input and formulates its own control policy in a decentralized manner, generating continuous actions with a squashed Gaussian distribution function. In the proposed method, the policies are trained to maximize the trade-off between entropy and expected return, which helps avoid premature convergence, a requirement for reaching the global optimum. The policy of each agent's actor network in the MASAC framework can be expressed as in step S201 above:

where a_i,t is the action taken by each agent at time t, determined by its Actor network; i is the index of the agent; the state vector of agent i at time t is denoted s_i,t; and the policy of each agent, denoted π_φi, is a policy based on a squashed Gaussian distribution.
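A minimal PyTorch sketch of such a squashed-Gaussian (tanh) actor is shown below; the layer sizes and the action scaling are assumptions, not values taken from the patent.

    import torch
    import torch.nn as nn

    class SquashedGaussianActor(nn.Module):
        """Maps a local observation s_i = (p_i, q_i, v_i) to a tanh-squashed
        Gaussian action (the inverter reactive set-point) and its log-probability."""
        def __init__(self, obs_dim=3, act_dim=1, hidden=64, act_limit=1.0):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden), nn.ReLU())
            self.mu = nn.Linear(hidden, act_dim)
            self.log_std = nn.Linear(hidden, act_dim)
            self.act_limit = act_limit    # scale of |Q_PV|; an assumed normalization

        def forward(self, obs):
            h = self.body(obs)
            mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
            dist = torch.distributions.Normal(mu, log_std.exp())
            u = dist.rsample()                     # reparameterized sample
            a = torch.tanh(u)                      # squash into (-1, 1)
            # change-of-variables correction for the tanh squashing
            logp = dist.log_prob(u).sum(-1) - torch.log(1.0 - a.pow(2) + 1e-6).sum(-1)
            return self.act_limit * a, logp

    actor = SquashedGaussianActor()
    action, logp = actor(torch.tensor([[0.4, 0.1, 1.02]]))   # one local observation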

Each agent has its own independent policy, which is updated in each iteration to maximize the trade-off between the expected return and the policy entropy. The policy entropy represents the randomness of the policy and quantifies the uncertainty in the system. The entropy H(π) of the joint policy π(a_t|s_t) is the sum of the local policy entropies, as given above (step S202).

Here, H(π_i) is the entropy of each local policy and N is the number of agents;

In the policy evaluation stage, the Critic network parameters θ are trained to reduce the Bellman residual J_Q(θ) given above (step S203).

Here, J_Q(θ) is the objective function of the Critic network parameters θ, which are trained by minimizing this function; E denotes the expectation over the state-action pairs produced by the current policy, computed under the distribution of the current state s_t and action a_t; D is the experience replay buffer, which stores previous experience (states, actions, rewards, etc.) for training; Q(s_t,a_t) is the action-value function; γ is the discount factor, used to compute the present value of future rewards, with a value between 0 and 1; r(s_t,a_t) is the immediate reward obtained by taking action a_t in state s_t; V_θ is the value estimate of the next state s_{t+1} given by the value-function network parameterized by θ; and α is the temperature parameter, an entropy regularization coefficient that trades off reward against entropy to encourage exploration.

During optimization, the parameters of the Critic network are optimized using the stochastic policy gradient, where r is the immediate reward value and φ_i is the policy parameter of each agent.

In the policy-making stage, the Actor network objective of the MASAC algorithm is as given above (step S204).

Here, π' is the target policy, π* denotes the optimal joint policy, Q(s_t,a_t) denotes the action-value function, and α denotes the temperature parameter; the policy of each agent is parameterized by φ_i and is trained to reduce the expected entropy.

Specifically, the policy of each agent is trained with the objective of minimizing the expected entropy of the actions produced by its actor network.

The policy parameters φ_i of each agent are updated by stochastic gradient descent; finally, α is updated using the temperature objective given above (step S205).

Here, H' is the target entropy, an equivalent vector composed of hyperparameters. Actor and Critic neural networks are trained for all agents, and the minimum of the Q functions is taken in the objective function to reduce overestimation of the state values.
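The following PyTorch sketch illustrates one centralized-training update consistent with this description (twin centralized critics, the minimum of the two Q estimates, an entropy-regularized actor loss and an adaptive temperature). The network class, the batch layout, the hyperparameters and the optimizer/actor objects passed in (e.g. instances of the squashed-Gaussian actor sketched above) are assumptions, not the patent's implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CentralCritic(nn.Module):
        """Centralized Q(s_t, a_t): takes the joint observation and joint action of all agents."""
        def __init__(self, joint_obs_dim, joint_act_dim, hidden=128):
            super().__init__()
            self.q = nn.Sequential(nn.Linear(joint_obs_dim + joint_act_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1))
        def forward(self, s, a):
            return self.q(torch.cat([s, a], dim=-1)).squeeze(-1)

    def masac_update(actors, critics, target_critics, actor_opts, critic_opts,
                     log_alpha, alpha_opt, batch, gamma=0.99, tau=0.005, target_entropy=-1.0):
        """One update: critic step on the soft Bellman residual, actor step, temperature step."""
        s, a, r, s2 = batch                  # s, s2: [B, 3N]; a: [B, N]; r: [B]
        n, alpha = len(actors), log_alpha.exp()
        # Critic step: y = r + gamma * (min_j Q_targ_j(s', a') - alpha * log pi(a'|s'))
        with torch.no_grad():
            outs = [actors[i](s2[:, 3 * i:3 * i + 3]) for i in range(n)]
            a2 = torch.cat([o[0] for o in outs], dim=-1)
            logp2 = torch.stack([o[1] for o in outs], dim=-1).sum(-1)
            q_t = torch.min(target_critics[0](s2, a2), target_critics[1](s2, a2))
            y = r + gamma * (q_t - alpha * logp2)
        for q, opt in zip(critics, critic_opts):
            loss_q = F.mse_loss(q(s, a), y)
            opt.zero_grad(); loss_q.backward(); opt.step()
        # Actor step: maximize min_j Q_j(s, a_new) - alpha * log pi(a_new|s)
        outs = [actors[i](s[:, 3 * i:3 * i + 3]) for i in range(n)]
        a_new = torch.cat([o[0] for o in outs], dim=-1)
        logp = torch.stack([o[1] for o in outs], dim=-1).sum(-1)
        q_pi = torch.min(critics[0](s, a_new), critics[1](s, a_new))
        loss_pi = (alpha.detach() * logp - q_pi).mean()
        for opt in actor_opts: opt.zero_grad()
        loss_pi.backward()
        for opt in actor_opts: opt.step()
        # Temperature step toward the target entropy H'
        loss_alpha = -(log_alpha * (logp.detach() + target_entropy)).mean()
        alpha_opt.zero_grad(); loss_alpha.backward(); alpha_opt.step()
        # Polyak averaging of the target critics
        with torch.no_grad():
            for q, q_targ in zip(critics, target_critics):
                for p, p_t in zip(q.parameters(), q_targ.parameters()):
                    p_t.mul_(1.0 - tau).add_(tau * p)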

The MASAC method proposed in the present invention optimizes the reactive power output of the agents to regulate the node voltages of the distribution network, with each PV unit treated as one agent. During the training phase, the action of each agent is provided to the centralized Critic network to compute the reward, which is sent back to the agents for policy updating; the offline training procedure concludes with the following deployment step:

Step 4: deploy the reinforcement learning agents trained in step 3 and complete the online control of the distribution network voltage in a distributed execution manner, realizing decentralized regulation of each PV unit.

Once the agents are properly trained, they take only their local states as input and act without communicating with a centralized controller; the decentralized, distributed online execution procedure is as follows:
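A minimal sketch of this decentralized execution loop is given below (it stands in for the patent's own table); get_local_obs and apply_setpoint are assumed helper functions for reading the local measurement and sending the set-point to inverter i.

    import torch

    def online_control_step(actors, get_local_obs, apply_setpoint):
        """Each trained actor maps only its local observation to a reactive set-point."""
        for i, actor in enumerate(actors):
            obs_i = torch.as_tensor(get_local_obs(i), dtype=torch.float32)   # s_i,t = (p_i, q_i, v_i)
            with torch.no_grad():
                a_i, _ = actor(obs_i)          # sampled action; the policy mean could also be used
            apply_setpoint(i, float(a_i))      # send Q_PV,i,t to inverter i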

To evaluate the performance of the proposed voltage control framework, simulation experiments were carried out on a modified IEEE 34-bus test system. Twelve aggregated photovoltaic inverters were added at different nodes, with a total generation capacity of 1576 kW. Since the maximum load demand is 1756 kW, the maximum solar PV generation is about 90% of the total peak load. The performance of the decentralized agents was tested under load and PV profiles different from those of the training data set. In addition, the reactive power of the inverters is controlled so that their operating power factor is not lower than the manufacturer-recommended 0.9. PYPOWER, interfaced with Python, was used as the power flow solver and as the learning and testing environment.

Each agent was trained for 500 episodes to learn the optimal control policy, i.e., the optimal behaviour for handling voltage-violation scenarios. Both the Actor and Critic networks are fully connected neural networks consisting of an input layer, an output layer and hidden layers, whose parameters are shown in the table below.

In the initial stage of training, the agents randomly explore the decision space of the environment and, as shown in FIG. 2, eventually converge and provide the optimal actions. After the training stage, each trained agent needs only its local state to provide the optimal action for the voltage regulation problem. These simulation experiments verify the effectiveness and adaptability of the decentralized agents under different load and PV profiles.

FIG. 3 shows the voltage fluctuation at a node of the test system under the trained model and under the base case. It can be observed that in the base-case scenario there are voltage violations with respect to the standard voltage limits, whereas under the control of the proposed MASAC algorithm there are none. In addition, the voltage varies more in the base case than under the proposed trained model.

FIG. 4 shows the voltage fluctuation of the tested node under the trained model and under the base scenario; the results show that the proposed method performs well. Finally, FIG. 5 shows the voltages of all 34 nodes at the 20th minute. The results show that the proposed trained model yields a better voltage profile, in terms of voltage deviation and violations, than the base case.

Although the present invention has been described above with reference to preferred embodiments, they are not intended to limit the invention. A person skilled in the art may make various modifications and improvements without departing from the spirit and scope of the present invention. Therefore, the scope of protection of the present invention shall be as defined by the claims.

Claims (7)

1. A distributed photovoltaic voltage reactive power control method for a power distribution network based on the MASAC algorithm, characterized by comprising the following steps:
S1, constructing a decentralized voltage reactive power control framework of the power distribution network that accounts for distributed photovoltaics, and converting the decentralized voltage reactive power control problem of the power distribution network into a Markov game model;
the decentralized voltage reactive power control framework of the distribution network comprises: taking minimization of the active power loss of the power distribution network over a period of time as the objective, taking the distributed photovoltaic inverters as the decision variables, and taking a preset voltage range as the constraint condition;
the Markov game model comprises: a state space, namely a set of local observations formed by the net active/reactive power injections and voltage magnitudes of the photovoltaic inverters comprised in each agent; an action space, namely a set of actions formed by the reactive power outputs of the photovoltaic inverters controlled by all agents; a reward function, constructed from the active power loss cost and a voltage out-of-limit penalty; and a state transition process, in which the states of the agents follow the power flow calculation constraints of the power distribution network and are updated according to the state transition probability distribution;
S2, constructing the MASAC algorithm to solve the Markov game model, and constructing an Actor and a Critic neural network for each agent, wherein the Actor neural network determines the policy of the agent and the Critic neural network is used for judging the value of the policy;
training the neural networks in a centralized training manner to obtain a distributed photovoltaic voltage reactive power control model of the power distribution network, then using the model to realize on-line control of the power distribution network voltage, and realizing decentralized regulation and control of the photovoltaic inverters.
2. The MASAC-algorithm-based distributed photovoltaic voltage reactive power control method for a power distribution network according to claim 1, wherein in step S1, minimizing the active power loss of the power distribution network over a period of time consists in constructing the objective function as follows:
where T represents the optimization time period and P_loss(t) represents the active network loss at time t.
3. The MASAC-algorithm-based distributed photovoltaic voltage reactive power control method for a power distribution network according to claim 1, wherein in step S1, the constraint condition is:
where V_k(t) is the voltage of photovoltaic inverter k at time t, and V_min and V_max are respectively the preset lower and upper voltage limits.
4. The MASAC-algorithm-based distributed photovoltaic voltage reactive power control method for a power distribution network according to claim 1, wherein in step S1, the state space S is a set of local observations s_i,t of all photovoltaic inverters at time t, s_i,t being the local observation of agent i at time t, s_i,t = (p_i, q_i, v_i), where p_i, q_i and v_i respectively represent the net active/reactive power injection and the node voltage magnitude of the photovoltaic inverter where agent i is located;
the action space A is a set of actions a_i,t formed by the reactive power outputs of the photovoltaic inverters controlled by all agents at time t, and Q_PV,i,t is the reactive output of the PV inverter controlled by agent i at time t.
5. The MASAC-algorithm-based distributed photovoltaic voltage reactive power control method for a power distribution network according to claim 1, wherein in step S1, the reward function is as follows:
where R(t) is the reward value at time t, P_loss(t) is the active power loss at time t, the function f is a 0-1 discriminant function which is 0 when the voltage V_k(t) of node k lies within the lower limit V_min and the upper limit V_max and 1 otherwise, σ_1 is the unit active power loss cost, and σ_2 is the voltage out-of-limit penalty factor.
6. The MASAC-algorithm-based distributed photovoltaic voltage reactive power control method for a power distribution network according to claim 1, wherein in step S1, the state transition process uses the PYPOWER power flow calculation tool to build the environment of the power distribution network, and the runpf function is used to perform the power flow calculation, the power flow calculation constraints comprising power balance constraints and power flow constraints; the state transition probability distribution of the agents is P(s'|s,a), which indicates the probability that, after an agent takes action a_t according to the current state s_t, the environment transitions from s_t to s'_t under the action a_t.
7. The MASAC-algorithm-based distributed photovoltaic voltage reactive power control method for a power distribution network according to claim 1, wherein step S2 comprises the following sub-steps:
S201, constructing an actor network for each agent based on the Actor network, wherein the policy of the actor network of each agent is as follows:
where a_i,t is the action taken by each agent at a particular time t and is determined by the Actor network; i represents the index of the agent; the state vector of agent i at time t is denoted s_i,t; and the policy of each agent, denoted π_φi, is a policy based on a squashed Gaussian distribution;
S202, each agent iteratively updating its policy based on maximizing the expected return and the entropy of the policy, the entropy H(π) of the joint policy π(a_t|s_t) being as follows:
where H(π_i) is the entropy of each local policy, representing the randomness of the policy and quantifying the uncertainty in the system, and N is the number of agents;
S203, in the policy evaluation stage, training the Critic network parameters θ to reduce the Bellman residual:
where J_Q(θ) is the objective function of the Critic network parameters θ, the network parameters being trained by minimizing this function; E denotes the expectation over the state-action pairs generated by the current policy, calculated under the distribution of the current state s_t and action a_t; D is the experience replay buffer, which stores previous experience for training; Q(s_t,a_t) represents the action value function; γ is the discount factor used to calculate the present value of future rewards, with a value between 0 and 1; r(s_t,a_t) is the immediate reward obtained by taking action a_t in state s_t; V_θ is the value estimate of the next state s_{t+1} given by the value function network parameterized by θ; and α is the temperature parameter, an entropy regularization coefficient that balances reward and entropy to encourage exploration;
the parameters of the Critic network are optimized by using the stochastic policy gradient, as follows:
where:
r is the immediate reward value and φ_i is the policy parameter of each agent;
S204, in the policy-making stage, the Actor network objective function being expressed as follows:
where π* represents the optimal joint policy, Q(s_t,a_t) represents the action value function, α represents the temperature parameter, and π' is the target policy;
S205, the policy of each agent being trained by minimizing the expected entropy of the actions generated by its actor network, as shown in the following formula:
the policy parameters φ_i of each agent being updated by a stochastic gradient descent method, and α being updated as follows:
where H' is the target entropy, the target entropy being an equivalent vector consisting of hyperparameters;
S206, training the Actor and Critic neural networks for all agents, and taking the minimum value of the Q functions in the objective function.
CN202410467035.XA 2024-04-18 2024-04-18 MASAC algorithm-based distributed photovoltaic voltage reactive power control method for power distribution network Pending CN118554555A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410467035.XA CN118554555A (en) 2024-04-18 2024-04-18 MASAC algorithm-based distributed photovoltaic voltage reactive power control method for power distribution network


Publications (1)

Publication Number Publication Date
CN118554555A true CN118554555A (en) 2024-08-27

Family

ID=92448800


Country Status (1)

Country Link
CN (1) CN118554555A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113872213A (en) * 2021-09-09 2021-12-31 国电南瑞南京控制系统有限公司 Power distribution network voltage autonomous optimization control method and device
CN115483703A (en) * 2022-09-14 2022-12-16 云南电网有限责任公司昆明供电局 Multi-region collaborative reactive power optimization method for distribution network based on multi-agent reinforcement learning
CN117200213A (en) * 2023-09-13 2023-12-08 浙江工业大学 Distribution system voltage control method based on deep reinforcement learning of self-organizing map neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAOTIAN LIU et al.: "Online Multi-Agent Reinforcement Learning for Decentralized Inverter-Based Volt-VAR Control", IEEE Transactions on Smart Grid, vol. 12, no. 04, 31 July 2021 (2021-07-31), pages 2980-2990 *
JU YUNTAO et al.: "Coordinated Active and Reactive Power Optimal Dispatch of Microgrid Clusters Based on Distributed Deep Reinforcement Learning" (基于分布式深度强化学习的微网群有功无功协调优化调度), Automation of Electric Power Systems (电力系统自动化), vol. 47, no. 01, 10 January 2023 (2023-01-10), pages 115-125 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120582137A (en) * 2025-08-01 2025-09-02 南京辉强新能源科技有限公司 A voltage control method for distribution networks based on multi-agent deep reinforcement learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination