CN117578495A

CN117578495A - Distributed photovoltaic multi-agent cluster voltage control method and system

Info

Publication number: CN117578495A
Application number: CN202311595249.7A
Authority: CN
Inventors: 张小庆; 秋泽楷; 王建波; 李文波; 张志华; 雷妤航; 豆敏娜; 王毅钊; 范斌涛; 李斌; 常小强; 吕锡林; 邵美阳; 王露缙; 王俪蓉
Original assignee: National Network Xi'an Environmental Protection Technology Center Co ltd; Electric Power Research Institute of State Grid Shaanxi Electric Power Co Ltd
Current assignee: National Network Xi'an Environmental Protection Technology Center Co ltd; Electric Power Research Institute of State Grid Shaanxi Electric Power Co Ltd
Priority date: 2023-11-27
Filing date: 2023-11-27
Publication date: 2024-02-20

Abstract

The invention discloses a distributed photovoltaic multi-agent cluster voltage regulation method and system. First, a multi-agent collaborative optimization structure is constructed within a deep reinforcement learning framework, on top of a system framework made up of a distribution system operator (DSO) and multiple photovoltaic (PV) users. Second, the voltage support degree and the PV utilization rate are calculated on the basis of this structure. These two indices jointly characterize the voltage regulation and accommodation performance of each PV unit, and they are used to construct a PV grading model. Third, based on the grading model, an upper-lower two-layer collaborative optimization model between the DSO and the PV users is established. On that basis, a reactive power-voltage control optimization model is established that rests on cooperation between the distribution network aggregator and the distributed PV users. This model is solved to obtain the optimal voltage regulation strategy, which maximizes the combined benefit of the upper and lower layers. The invention allows the energy-use strategy of PV users to dynamically track changes in network topology, and it helps improve solution speed and convergence.

Description

Distributed photovoltaic multi-agent cluster voltage control method and system

Technical field

The invention belongs to the technical field of voltage regulation and control of distribution networks with distributed resources connected. It relates in particular to a distributed photovoltaic multi-agent cluster voltage regulation method and system.

Background art

Distributed photovoltaic (PV) generation is a distributed resource on the user side of the distribution network, and its share of grid connections is steadily increasing. As large-scale distributed energy is connected, the voltage profile of the whole network changes. In severe cases this may cause voltage limit violations, so voltage regulation has to be performed on the distribution network side. Distributed PV can supply load-side capacity as a new energy source. It can also act as a new type of voltage regulation entity, taking part in distribution network voltage regulation through its grid-connected converter.

Distributed PV can be used effectively for distribution network voltage regulation, and thereby resolve voltage limit violations. To do this, most current control strategies take a direct-control approach: the distribution network operator issues commands directly to distributed PV units according to the operating state of the grid, so that they participate in voltage regulation. Such direct-control methods have shortcomings, both in the regulation itself and in the solution process. A new distribution network voltage regulation strategy is therefore needed.

Summary of the invention

The purpose of the invention is to provide a distributed PV multi-agent cluster voltage regulation method and system that solves the above problems of the prior art. The invention establishes a hierarchical (graded) cluster model of distributed PV users that jointly considers voltage support and PV utilization. It decomposes the problem into two layers: the upper layer is the distribution network operator, and the lower layer is the distributed PV users. The two layers realize hierarchical cluster voltage regulation through deep reinforcement learning, and the model is finally solved with DQN, DDPG and the firefly algorithm. The method allows the energy-use strategies of PV users to dynamically track changes in network topology, and it helps improve solution speed and convergence.

To achieve the above purpose, the invention adopts the following technical solution:

A distributed photovoltaic multi-agent cluster voltage regulation method comprises the following steps:

constructing, based on the system framework of the DSO and multiple PV users, a multi-agent collaborative optimization structure within a deep reinforcement learning framework;

calculating the voltage support degree and the PV utilization rate based on the multi-agent collaborative optimization structure;

characterizing the voltage regulation and accommodation performance of PV jointly by the two indices, voltage support degree and PV utilization rate, and constructing a PV grading model;

establishing, based on the PV grading model, an upper-lower collaborative optimization model between the DSO and the PV users;

establishing, based on the upper-lower collaborative optimization model between the DSO and the PV users, a reactive power-voltage control optimization model based on cooperation between the distribution network aggregator and the distributed PV users;

solving the reactive power-voltage control optimization model to obtain the optimal voltage regulation strategy that maximizes the collaborative benefit of the upper and lower layers.

Further, the voltage support degree is calculated as follows:

where γ_i,t is the voltage support degree of node i at time t; Φ is the set of nodes whose voltage exceeds its limit; n_ol is the number of such nodes; and S_Q,i,j,t is the reactive power-voltage sensitivity coefficient between nodes at time t, i.e., the change in the voltage of node i per unit change of reactive power at node j;

where ΔU_i,t and ΔQ_i,t are the changes in voltage magnitude and reactive power of node i at time t, respectively.

The PV utilization rate is calculated as follows:

$$P_a(t) = P_{PV}(t) - P_{S.PV}(t)$$

where R_PV is the PV utilization rate; P_PV(t) is the theoretical PV generation at time t; P_a(t) is the PV energy actually utilized at time t; and P_S.PV(t) is the curtailed PV energy at time t.

Further, the PV grading model is constructed from the voltage support degree and the PV utilization rate, which jointly characterize the voltage regulation and accommodation performance of PV. This step specifically comprises:

On the DSO side, the voltage support degree and the PV utilization rate are taken as two-dimensional features. K-means clustering groups users by similarity, so that PV users with similar voltage regulation performance are assigned to the same level, expressed as follows:

where * indicates that both parameters are first normalized to values between 0 and 1, because the voltage support degree and the PV utilization rate differ in order of magnitude; k_i is the weight of each parameter, with k₁ + k₂ = 1; and the grading result is stored in a set. The similarity index d_ij,t is:

The lower the similarity index, the more readily the corresponding distributed PV users are assigned to the same level. PV users within a level therefore have similar voltage support degrees and PV utilization rates. As a result, the DSO can apply the same voltage regulation compensation price to all PV users in a level and allocate the reactive power used for voltage regulation in a reasonable way.

Further, the upper-lower collaborative optimization model between the DSO and the PV users is specifically as follows. The upper-layer DSO leads distribution network voltage regulation and is the party that offers the compensation price. The lower-layer PV users, incentivized by this price, participate in the voltage regulation ancillary service market.

As the distribution network operator, the DSO must maintain voltage stability during voltage regulation while keeping the PV utilization rate as high as possible, i.e., keeping the total voltage regulation cost as low as possible, expressed as:

where w_q,k,t is the voltage regulation compensation price for PV users in level k at time t; K is the total number of levels in the user grading strategy; m_k is the number of PV users in level k; U_i,t is the voltage magnitude of node i at time t; U_i,ref is the reference voltage magnitude of node i; R_i,PV is the utilization rate of the distributed PV connected at node i; α is the cost coefficient for maintaining voltage stability; and β is the PV curtailment cost coefficient.

The constraints are:

$$P_{i,t} = P_{PV,i,t} - P_{load,i,t}$$

$$U_{min,t} \le U_t \le U_{max,t}$$

$$P_{L,min,t} \le P_{L,t} \le P_{L,max,t}$$

where P_i,t and Q_i,t are the active and reactive power injected at node i at time t; θ_ij,t is the voltage phase angle difference between nodes i and j at time t; G_ij and B_ij are the conductance and susceptance of branch ij; j∈i denotes all nodes adjacent to node i; P_ij,t is the active power flowing from node i to node j at time t; P_load,i,t is the user load at node i at time t; U_t is the voltage magnitude of each node after regulation at time t, and U_max,t and U_min,t are the upper and lower voltage limits at time t; P_L,t is the branch power flow at time t, and P_L,max,t and P_L,min,t are its upper and lower limits.
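The DSO objective and the inequality constraints above can be sketched as follows. The objective formula itself is not reproduced in this text, so `dso_cost` assumes one plausible form built from the listed symbols:

- the compensation paid per level, w_q,k,t times the absolute reactive adjustment;
- α times the squared deviation of U_i,t from U_i,ref;
- β times the curtailed fraction 1 − R_i,PV.

All function names are hypothetical. The power-flow equality constraints, which need a load-flow solver, are not modelled.

```python
def net_injection(p_pv, p_load):
    """P_i,t = P_PV,i,t - P_load,i,t for every node."""
    return [pv - load for pv, load in zip(p_pv, p_load)]


def within(values, lower, upper):
    """True if each value lies in its inclusive [lower, upper] band."""
    return all(lo <= v <= hi for v, lo, hi in zip(values, lower, upper))


def feasible(u, u_min, u_max, p_line, p_line_min, p_line_max):
    """Voltage and branch-flow inequality constraints only (no power-flow equations)."""
    return within(u, u_min, u_max) and within(p_line, p_line_min, p_line_max)


def dso_cost(prices, delta_q, level, u, u_ref, r_pv, alpha, beta):
    """Assumed DSO regulation cost at one time step.

    prices[k] : compensation price w_q,k,t of level k
    delta_q[i]: reactive adjustment of the PV user at node i (sign ignored)
    level[i]  : level index k assigned to node i by the grading model
    u, u_ref  : voltage magnitudes and reference values (p.u.)
    r_pv[i]   : PV utilization rate at node i (0..1)
    """
    compensation = sum(prices[level[i]] * abs(dq) for i, dq in enumerate(delta_q))
    deviation = sum((ui - ur) ** 2 for ui, ur in zip(u, u_ref))
    curtailment = sum(1.0 - r for r in r_pv)
    return compensation + alpha * deviation + beta * curtailment
```

The three cost terms are kept separate so that the weights α and β can be tuned independently. This mirrors the trade-off the text describes between voltage stability and curtailment.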

Further, the constraints also include:

$$w_{q,min,i,t} \le w_{q,i,t} \le w_{q,max,i,t}$$

where w_q,max,i,t and w_q,min,i,t are the upper and lower limits of the voltage regulation compensation price at time t. Q_max and Q_min are the upper and lower reactive output limits of the grid-connected PV inverter, S_inv is the inverter's rated capacity, and P_PV is the PV active output in this state. Q′_max and Q′_min are the inverter's upper and lower reactive output limits in the active-power-curtailment state, P′_PV is the curtailable range of PV active power in that state, and P_PVmax is the maximum PV active output in that state.
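Inverter limits like these usually follow the apparent-power capability circle P² + Q² ≤ S_inv². The sketch below assumes that standard relation, since the patent's own Q_max and Q′_max formulas are not shown here. It also computes how much active power must be curtailed to reach a required reactive output. That amount is the switch point between the two regulation states. The function names are illustrative.

```python
import math


def reactive_limits(s_inv, p_pv, p_cut=0.0):
    """Symmetric reactive limits (Q_min, Q_max) of a grid-tied PV inverter.

    Assumes the capability circle P^2 + Q^2 <= S_inv^2; p_cut is the active
    power curtailed (0 <= p_cut <= p_pv) to free extra reactive capacity.
    """
    if not 0.0 <= p_cut <= p_pv <= s_inv:
        raise ValueError("require 0 <= p_cut <= p_pv <= s_inv")
    p_out = p_pv - p_cut
    q_max = math.sqrt(s_inv ** 2 - p_out ** 2)
    return -q_max, q_max


def curtailment_needed(s_inv, p_pv, q_req):
    """Smallest active-power cut that lets the inverter reach |q_req|."""
    if abs(q_req) > s_inv:
        raise ValueError("q_req exceeds the inverter rating")
    p_allowed = math.sqrt(s_inv ** 2 - q_req ** 2)
    return max(0.0, p_pv - p_allowed)
```

For example, a 5 kVA inverter producing 4 kW has ±3 kvar available without curtailment. Curtailing 1 kW raises that to ±4 kvar.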

Further, the reactive power-voltage control optimization model based on cooperation between the distribution network aggregator and the distributed PV users is specifically:

The PV user's benefit mainly accounts for two terms: the income from participating in voltage regulation and the cost incurred during regulation, as follows:

$$I_{u,i,t} = I_{u,gro,i,t} - C_{i,t}$$

$$I_{u,gro,i,t} = w_{q,i,t}\,\Delta Q_{i,t}$$

where I_u,i,t is the total voltage regulation income of the user at node i at time t; I_u,gro,i,t is the compensation income the user receives for participating in voltage regulation; and C_i,t is the cost incurred during regulation.

Depending on the amount of reactive power adjustment, the regulation cost C_i,t falls into two cases:

1) Reactive-only regulation: the reactive adjustment does not exceed the reactive regulation limit and PV active output is unaffected, so only the cost of the occupied converter capacity is considered:

where c_p is the PV feed-in tariff and ξ is the utilization factor of the converter capacity occupied by reactive regulation. For the same occupied capacity, a higher utilization factor means a higher reactive regulation cost.

2) Joint active-reactive regulation: the reactive adjustment exceeds the reactive regulation limit. PV active output is then curtailed to release additional reactive capacity, and the cost also includes the lost sales revenue of the curtailed active power:

The corresponding PV active power curtailment and reactive power absorption are constrained by the unit's own regulation capacity:

$$0 \le \Delta P_{i,t} \le P_{PV,i,t} - P_{load,i,t}$$
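The income definition and the two cost cases above can be combined into one function. The cost expressions themselves are not shown in this text, so the sketch makes three assumptions:

- reactive-only regulation costs C = ξ·c_p·|ΔQ|;
- joint regulation adds c_p·ΔP for the curtailed energy;
- the reactive limit comes from the capability circle √(S_inv² − P_PV²).

These forms, and the function name `user_income`, are illustrative only.

```python
import math


def user_income(w_q, dq, s_inv, p_pv, p_load, c_p, xi):
    """Net voltage-regulation income I_u,i,t of one PV user (assumed cost forms).

    Returns (income, dp), where dp is the active power curtailed.
    """
    if abs(dq) > s_inv:
        raise ValueError("reactive request exceeds the inverter rating")
    gross = w_q * abs(dq)                          # I_u,gro,i,t = w_q,i,t * |dQ|
    q_cap = math.sqrt(max(s_inv ** 2 - p_pv ** 2, 0.0))
    if abs(dq) <= q_cap:
        dp = 0.0                                   # case 1: reactive-only regulation
    else:
        dp = p_pv - math.sqrt(s_inv ** 2 - dq ** 2)  # case 2: curtail to free capacity
        if dp > p_pv - p_load:                     # enforce 0 <= dP <= P_PV - P_load
            raise ValueError("required curtailment violates dP <= P_PV - P_load")
    cost = xi * c_p * abs(dq) + c_p * dp           # capacity cost + lost feed-in revenue
    return gross - cost, dp
```

Because the capacity term is scaled by ξ while curtailment is charged at the full tariff, the sketch reflects the stated property that reactive-only regulation is much cheaper than cutting active power.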

Further, solving the reactive power-voltage control optimization model based on cooperation between the distribution network aggregator and the distributed PV users is specifically:

The DQN and DDPG deep reinforcement learning algorithms are used for voltage control.

For DQN, the voltage fluctuation index serves as the reward function, and the specific actions of the voltage regulating devices form the action space. Each resulting new state is stored in the experience pool D_DQN.

DDPG works analogously. Its action set is built on a continuous action space that covers PV active and reactive output. The node voltage violation and the power adjustment amount form the reward function. New states are obtained iteratively, and the experience data are stored in the pool D_DDPG.

One pass of the outer loop thus yields the voltage regulation strategies and results of DQN and DDPG, which are passed as parameters to the inner layer. The inner layer takes the combined benefit of the DSO and the distributed PV user cluster as its objective. It uses the firefly algorithm to optimize the voltage regulation compensation price for the given regulation strategy. This finally yields the optimal voltage regulation strategy that maximizes the collaborative benefit of the upper and lower layers.
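The inner-layer price search can be illustrated with a minimal firefly algorithm. It uses Yang's attractiveness β0·exp(−γr²) plus a decaying random step; here γ is renamed `absorb` to avoid a clash with the voltage support degree. The objective is a toy quadratic standing in for the negative combined DSO and user benefit, and all parameter values are assumptions. The outer DQN/DDPG loop is not reproduced.

```python
import math
import random


def firefly_minimize(f, bounds, n=20, iters=100, beta0=1.0, absorb=1.0,
                     alpha=0.2, decay=0.97, seed=0):
    """Minimize f over a box with a basic firefly algorithm.

    Lower cost = brighter firefly; `absorb` is the light-absorption
    coefficient in beta0 * exp(-absorb * r^2).
    """
    rng = random.Random(seed)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    cost = [f(x) for x in pop]
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if cost[j] < cost[i]:  # j is brighter: move i toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(pop[i], pop[j]))
                    beta = beta0 * math.exp(-absorb * r2)
                    pop[i] = clip([xi + beta * (xj - xi)
                                   + alpha * (rng.random() - 0.5) * (hi - lo)
                                   for xi, xj, (lo, hi) in zip(pop[i], pop[j], bounds)])
                    cost[i] = f(pop[i])
        alpha *= decay  # shrink the random step as the swarm converges
    best = min(range(n), key=lambda k: cost[k])
    return pop[best], cost[best]


# Hypothetical inner-layer objective: negative combined DSO + user benefit as a
# function of the compensation prices of two PV levels (toy quadratic).
def neg_benefit(w):
    return (w[0] - 0.3) ** 2 + (w[1] - 0.6) ** 2


best_w, best_f = firefly_minimize(neg_benefit, [(0.0, 1.0), (0.0, 1.0)])
```

In the full method, `neg_benefit` would evaluate the DSO cost and the user incomes for the regulation strategy handed down by the outer loop. The bounds would be the price limits w_q,min and w_q,max.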

A distributed photovoltaic multi-agent cluster voltage regulation system comprises:

Multi-agent collaborative optimization structure construction module: constructs, based on the system framework of the DSO and multiple PV users, a multi-agent collaborative optimization structure within a deep reinforcement learning framework;

Calculation module: calculates the voltage support degree and the PV utilization rate based on the multi-agent collaborative optimization structure;

PV grading model construction module: characterizes the voltage regulation and accommodation performance of PV jointly by the voltage support degree and the PV utilization rate, and constructs a PV grading model;

Upper-lower collaborative optimization model establishment module: establishes, based on the PV grading model, the upper-lower collaborative optimization model between the DSO and the PV users;

A computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the distributed photovoltaic multi-agent cluster voltage regulation method.

A computer-readable storage medium stores a computer program, characterized in that, when executed by a processor, the computer program implements the steps of the distributed photovoltaic multi-agent cluster voltage regulation method.

Compared with the prior art, the invention has the following beneficial technical effects:

The method defines a voltage regulation performance evaluation built on two indices: voltage support degree and PV utilization rate. It takes into account that reactive power regulation does not affect the user's generation revenue, and that its cost is far lower than the revenue lost through active power curtailment. This provides an index basis for the hierarchical clustering of distributed PV users, and for modeling the switch between reactive-only and joint active-reactive regulation.

Distributed PV users are clustered and graded so that the grading dynamically follows network topology and reflects the coupling between PV active and reactive power. This simplifies how the distribution network operator allocates voltage regulation compensation among the subordinate PV users.

The DQN and DDPG deep reinforcement learning algorithms are applied to the upper-lower collaborative voltage regulation strategy of the distribution network aggregator and the distributed PV users, which effectively improves solution convergence and speed.

Finally, a leader-follower (Stackelberg) game framework is built over the graded PV users, with voltage regulation compensation taken into account. It provides an optimization scheme for distribution-side voltage regulation in which active and reactive power are coupled.

Description of the drawings

The accompanying drawings provide a further understanding of the invention and form part of it. The illustrative embodiments and their descriptions explain the invention and do not constitute an improper limitation of it.

Fig. 1 is a schematic flowchart of the collaborative voltage regulation optimization algorithm, based on deep reinforcement learning, between the upper-layer aggregator and the lower-layer distributed PV users according to an embodiment of the invention.

Detailed description of the embodiments

To enable those skilled in the art to better understand the solution of the invention, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person of ordinary skill in the art obtains from these embodiments without creative effort fall within the scope of protection of the invention.

It should be noted that the terms "first", "second", etc. in the description, claims and drawings are used to distinguish similar objects. They do not necessarily describe a specific order or sequence. Data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. Furthermore, the terms "comprise" and "have", and any variants of them, are intended to cover non-exclusive inclusion. A process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units explicitly listed. It may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product or device.

Embodiment 1

The invention proposes a distributed photovoltaic multi-agent cluster reactive power-voltage regulation method based on a deep reinforcement learning framework and algorithms. The specific steps are as follows:

Step 1: Deep reinforcement learning framework and PV control modes for distributed PV voltage regulation

The invention addresses voltage regulation in a distribution network with distributed PV participation. It builds on a system framework consisting of the DSO and multiple PV users.

The DSO is equipped with an Energy Management System (EMS). The EMS collects grid structure parameters and operating data and determines the subsequent power flow calculation, PV grading and voltage regulation compensation price. During voltage regulation, the DSO guides the PV users in an orderly way to keep the whole distribution network operating safely and stably.

Distributed PV users operate in the "self-generation for self-consumption, surplus fed into the grid" mode. A distributed PV system mainly comprises solar cells, loads and a User Energy Management System (UEMS). The UEMS receives information issued by the grid and derives the user's own reactive regulation strategy through optimization.

Distributed PV uses the spare capacity of its grid-connected converter to absorb reactive power and thereby change the nodal injected power. When voltage violations are severe, i.e., when the reactive regulation capacity in the distribution network cannot meet the regulation demand, distributed PV can curtail active output to release part of the inverter capacity for voltage regulation. The two PV voltage regulation modes are therefore reactive-only regulation and joint active-reactive regulation.

In the deep-reinforcement-learning-based optimization framework for the aggregator and the distributed PV user cluster, the distribution network operator and the distributed PV users are each an Agent. Together they form the multi-agent collaborative optimization structure of the deep reinforcement learning framework, and the overall voltage regulation demand scenario is the Environment. The framework follows the basic logic of Q-learning reinforcement learning:

- Action: the specific voltage regulation actions of the distribution network aggregator and of the distributed PV users are their respective Actions.
- State: the distributed PV users adjust their voltage regulation strategy according to the compensation price that links them to the DSO, and this State is fed back to the distribution network operator.
- Reward: the optimization finally yields the objective function that satisfies every agent's requirements, together with its optimization result (Reward).
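The agent-environment loop described above relies on experience storage, namely the pools D_DQN and D_DDPG named in the solution step. A minimal replay pool and a voltage-deviation reward are sketched below. The reward form, −Σ|U_i − U_ref|, is an assumed stand-in for the voltage fluctuation index, and the class name is hypothetical.

```python
import random
from collections import deque, namedtuple

# One agent-environment interaction: (State, Action, Reward, next State)
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])


class ReplayBuffer:
    """Fixed-capacity experience pool (e.g. D_DQN or D_DDPG)."""

    def __init__(self, capacity, seed=None):
        self.data = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state):
        self.data.append(Transition(state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates consecutive steps
        return self.rng.sample(list(self.data), batch_size)

    def __len__(self):
        return len(self.data)


def voltage_reward(u, u_ref=1.0):
    """Assumed reward: negative total voltage deviation from u_ref (p.u.)."""
    return -sum(abs(ui - u_ref) for ui in u)
```

DQN and DDPG would each own one such pool and draw mini-batches from it to update their networks. The bounded `deque` keeps only the most recent operating conditions.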

Step 2: Voltage support degree and PV utilization rate

1. Voltage support degree

The voltage support degree expresses how strongly a change in distributed PV reactive power affects distribution network voltage. Sensitivities are used to represent the linearized relationship between nodal voltage magnitude and nodal power injection:

Eq. (1) defines the active power-voltage sensitivity coefficient and Eq. (2) the reactive power-voltage sensitivity coefficient. Each describes how much a change in the output of one node in the distribution network improves nodal voltage. The symbols are:

- ΔU_i,t, ΔP_i,t and ΔQ_i,t are the changes in voltage magnitude, injected active power and injected reactive power of node i at time t.
- S_P,i,j,t and S_Q,i,j,t are the active power-voltage and reactive power-voltage sensitivity coefficients between nodes at time t. They give the change in the voltage of node i per unit change of active and reactive power at node j, respectively, and can be obtained from the correction equation of the power flow calculation.

Power changes at different nodes therefore affect voltage to different degrees. Each node thus supports voltage differently and has a different capability to participate in voltage regulation. The voltage support degree γ_i,t of node i at time t can be calculated from the reactive power-voltage sensitivity coefficients:

where Φ is the set of nodes whose voltage exceeds its limit and n_ol is the number of such nodes. The larger a node's voltage support degree, the better the PV unit at that node can relieve voltage problems in the grid. Less reactive power then needs to be adjusted to achieve the same voltage control effect across the network, so voltage regulation is more cost-effective.
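The sensitivities S_P and S_Q can be read from the inverse of the Newton-Raphson Jacobian in the power-flow correction equation [ΔP; ΔQ] = J·[Δθ; ΔU]. The sketch rests on these assumptions:

- J uses that ordering and already excludes the slack bus.
- A small Gauss-Jordan inverse is used so that the code needs no external libraries.
- γ_i,t = (1/n_ol)·Σ_{j∈Φ} |S_Q,j,i,t|, i.e., the mean effect of reactive injection at node i on the violated nodes.

The γ formula is not reproduced in this text, so this form is illustrative only.

```python
def invert(m):
    """Gauss-Jordan inverse with partial pivoting (fine for small test systems)."""
    n = len(m)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)]
         for i, row in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))  # pivot row
        if abs(a[p][c]) < 1e-12:
            raise ValueError("Jacobian is singular")
        a[c], a[p] = a[p], a[c]
        pivot = a[c][c]
        a[c] = [v / pivot for v in a[c]]
        for r in range(n):
            if r != c and a[r][c] != 0.0:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[n:] for row in a]


def sensitivities(jac, n_bus):
    """Return (S_P, S_Q) with S[i][j] = dU_i / d(injection at node j).

    Assumes rows ordered P_1..P_n, Q_1..Q_n and columns theta_1..theta_n,
    U_1..U_n, so that [dtheta; dU] = J^-1 [dP; dQ].
    """
    inv = invert(jac)
    du_rows = inv[n_bus:]                     # lower half maps injections to dU
    s_p = [row[:n_bus] for row in du_rows]    # dU/dP block
    s_q = [row[n_bus:] for row in du_rows]    # dU/dQ block
    return s_p, s_q


def voltage_support(s_q, violated):
    """Assumed gamma_i = mean over violated nodes j of |S_Q[j][i]|."""
    n_ol = len(violated)
    if n_ol == 0:
        return [0.0] * len(s_q)
    return [sum(abs(s_q[j][i]) for j in violated) / n_ol for i in range(len(s_q))]
```

In practice the Jacobian would come from the last Newton-Raphson iteration of the DSO's power flow. That lets the sensitivities be refreshed at every time step t at little extra cost.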

2. PV utilization rate

The fundamental purpose of distributed PV is to make effective use of solar energy in support of the distribution network. A PV utilization index is therefore used to reflect how well PV generation is accommodated. The PV utilization rate R_PV is defined as the ratio of the PV energy actually utilized to the theoretical PV generation:

$$P_a(t) = P_{PV}(t) - P_{S.PV}(t) \tag{5}$$

where P_PV(t) is the theoretical PV generation at time t; P_a(t) is the PV energy actually utilized at time t; and P_S.PV(t) is the curtailed PV energy at time t.
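Eq. (5) and the definition of R_PV translate directly into a ratio of utilized to theoretical energy over a horizon. The sketch aggregates by summing over equal time steps. This is an assumption, since the summation form is not shown here.

```python
def pv_utilization(p_pv, p_curtailed):
    """R_PV over a horizon of equal time steps (assumed aggregation).

    p_pv[t]        : theoretical PV output P_PV(t)
    p_curtailed[t] : curtailed PV output P_S.PV(t)
    """
    if len(p_pv) != len(p_curtailed):
        raise ValueError("series must have equal length")
    if any(c < 0 or c > p for p, c in zip(p_pv, p_curtailed)):
        raise ValueError("require 0 <= P_S.PV(t) <= P_PV(t)")
    theoretical = sum(p_pv)
    if theoretical == 0:
        raise ValueError("no theoretical PV generation in the horizon")
    utilized = sum(p - c for p, c in zip(p_pv, p_curtailed))  # sum of P_a(t), Eq. (5)
    return utilized / theoretical
```

With theoretical outputs of 2, 4 and 4 and 1 unit curtailed in the middle step, the sketch gives R_PV = 9/10.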

Step 3: Construct the PV grading model

The voltage support degree and PV utilization rate defined in Step 2 jointly characterize the voltage regulation and accommodation performance of PV. On the DSO side, these two indices are taken as two-dimensional features. K-means clustering groups users by similarity, so that PV users with similar voltage regulation performance are assigned to the same level:

其中，“*”表示由于电压支撑度与光伏利用率的数量级不同，先将两类参数归一化为0～1之间的数值；k_i为不同参数的权重大小，有k₁+k₂＝1，并在中存储分级结果。另外，用于衡量相似性指标d_ij,t为：Among them, "*" means that because the voltage support degree and photovoltaic utilization rate are of different orders of magnitude, the two types of parameters are first normalized to values between 0 and 1; k _i is the weight of different parameters, with k ₁ + k ₂ =1, and in Store the grading results in . In addition, the similarity index d _ij,t used to measure is:

显然，相似性指标越低，对应的分布式光伏用户越容易划分为同一级，且同级内的光伏电压支撑度、光伏利用率相近，DSO可使同级内的光伏采用相同的调压补偿电价以合理分配用于调压的无功调节。Obviously, the lower the similarity index, the easier it is for the corresponding distributed photovoltaic users to be divided into the same level, and the photovoltaic voltage support and photovoltaic utilization rate within the same level are similar. DSO can enable photovoltaics in the same level to use the same voltage regulation compensation The electricity price is reasonably allocated for reactive power regulation of voltage regulation.
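The grading step can be sketched as min-max normalization followed by K-means with a weighted distance. The exact form of d_ij,t is not reproduced here, so this sketch assumes a weighted Euclidean distance √(k₁(Δγ*)² + k₂(ΔR*)²); the deterministic farthest-point initialization and all function names are choices made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1] (the '*' normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def similarity(a: Point, b: Point, k1: float, k2: float) -> float:
    """Assumed d_ij: weighted Euclidean distance between normalized features."""
    return math.sqrt(k1 * (a[0] - b[0]) ** 2 + k2 * (a[1] - b[1]) ** 2)


def grade_pv_users(gamma: Sequence[float], r_pv: Sequence[float],
                   n_grades: int, k1: float = 0.5, max_iter: int = 100) -> List[int]:
    """Weighted K-means over (gamma*, R_PV*); returns one grade label per user."""
    k2 = 1.0 - k1
    pts: List[Point] = list(zip(min_max(gamma), min_max(r_pv)))
    # Deterministic farthest-point initialization (a choice made for this sketch).
    centroids: List[Point] = [pts[0]]
    while len(centroids) < n_grades:
        centroids.append(max(pts, key=lambda p: min(similarity(p, c, k1, k2)
                                                    for c in centroids)))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each user joins the grade with the lowest distance.
        labels = [min(range(n_grades), key=lambda g: similarity(p, centroids[g], k1, k2))
                  for p in pts]
        # Update step: each centroid moves to the mean of its members.
        updated: List[Point] = []
        for g in range(n_grades):
            members = [p for p, lab in zip(pts, labels) if lab == g]
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(centroids[g])  # keep an empty grade's centroid
        if updated == centroids:
            break
        centroids = updated
    return labels


labels = grade_pv_users([0.010, 0.011, 0.012, 0.050, 0.052, 0.049],
                        [0.95, 0.94, 0.96, 0.70, 0.72, 0.71], n_grades=2)
```

Because each weight only rescales one axis, the per-dimension mean remains the correct centroid update for this weighted distance.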

Step 4: DSO benefit model

Building on the PV grading above, and in order to analyze how the DSO and PV users interact and coordinate, an upper-/lower-layer collaborative optimization strategy between the DSO and PV users is established: the upper-layer DSO leads voltage regulation of the distribution network and is the party that provides the compensation price; the lower-layer PV users, incentivized by that price, participate in the voltage regulation ancillary service market.

As the distribution network operator, the DSO must keep voltages stable during regulation while keeping PV utilization as high as possible, that is, keep the total voltage regulation cost as low as possible:

where w_q,k,t is the voltage regulation compensation price for PV users in grade k at time t; K is the total number of grades in the user grading strategy; m_k is the number of PV users in grade k; U_i,t is the voltage magnitude of node i at time t; U_i,ref is the reference voltage magnitude of node i; R_i,PV is the utilization rate of the distributed PV connected at node i; α is the cost coefficient for maintaining voltage stability; and β is the PV curtailment cost coefficient.
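The DSO objective itself is not reproduced in this section. Purely to illustrate how the listed symbols could combine, the sketch below assumes a cost made of the grade-price compensation paid to users (w_q,k,t times each user's ΔQ_i,t, borrowed from Eq. (18)), a quadratic voltage-deviation term weighted by α, and a curtailment term β(1 − R_i,PV). This form and the function name are assumptions, not the patent's equation.

```python
from typing import Sequence


def dso_cost(prices: Sequence[float], grade_of: Sequence[int], dq: Sequence[float],
             u: Sequence[float], u_ref: Sequence[float], r_pv: Sequence[float],
             alpha: float, beta: float) -> float:
    """Hypothetical DSO cost: compensation paid + voltage deviation + curtailment."""
    # Compensation: each user in grade k is paid the grade price w_q,k,t per unit of dQ.
    # Summing over users is the same as summing over grades k and their m_k users.
    compensation = sum(prices[grade_of[i]] * dq[i] for i in range(len(dq)))
    # Voltage term: alpha * sum_i (U_i,t - U_i,ref)^2 (assumed quadratic).
    deviation = alpha * sum((a - b) ** 2 for a, b in zip(u, u_ref))
    # Curtailment term: beta * sum_i (1 - R_i,PV) (assumed linear).
    curtailment = beta * sum(1.0 - r for r in r_pv)
    return compensation + deviation + curtailment


cost = dso_cost(prices=[0.1, 0.2], grade_of=[0, 0, 1], dq=[1.0, 2.0, 1.0],
                u=[1.02, 0.98, 1.0], u_ref=[1.0, 1.0, 1.0],
                r_pv=[1.0, 0.9, 0.8], alpha=100.0, beta=1.0)
```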

Constraints:

$$P_{i,t} = P_{PV,i,t} - P_{load,i,t} \tag{11}$$

$$U_{min,t} \le U_t \le U_{max,t} \tag{12}$$

$$P_{L,min,t} \le P_{L,t} \le P_{L,max,t} \tag{13}$$

In addition, the voltage regulation compensation price is constrained by market rules:

$$w_{q,min,i,t} \le w_{q,i,t} \le w_{q,max,i,t} \tag{14}$$

where w_q,max,i,t and w_q,min,i,t are the upper and lower limits of the voltage regulation compensation price at time t, respectively.
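Once the power flow has been solved, Eqs. (11) to (14) reduce to a net-injection calculation and simple bound checks (the power-flow equations referenced in claim 4 are not reproduced here and are outside this sketch). A hypothetical feasibility helper:

```python
from typing import List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def net_injection(p_pv: Sequence[float], p_load: Sequence[float]) -> List[float]:
    """Eq. (11): P_i,t = P_PV,i,t - P_load,i,t for every node."""
    return [a - b for a, b in zip(p_pv, p_load)]


def out_of_bounds(values: Sequence[float], bounds: Bounds, label: str) -> List[str]:
    """Names of entries that violate their (lower, upper) bounds."""
    return [f"{label}[{k}]" for k, (v, (lo, hi)) in enumerate(zip(values, bounds))
            if not lo <= v <= hi]


def check_dso_constraints(u: Sequence[float], u_bounds: Bounds,
                          p_line: Sequence[float], line_bounds: Bounds,
                          w_q: Sequence[float], w_bounds: Bounds) -> List[str]:
    """Collect violations of Eqs. (12)-(14); an empty list means feasible."""
    return (out_of_bounds(u, u_bounds, "U")
            + out_of_bounds(p_line, line_bounds, "P_L")
            + out_of_bounds(w_q, w_bounds, "w_q"))


violations = check_dso_constraints(u=[1.0, 1.06], u_bounds=[(0.95, 1.05)] * 2,
                                   p_line=[2.0], line_bounds=[(-5.0, 5.0)],
                                   w_q=[0.3], w_bounds=[(0.0, 0.5)])
```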

Distributed PV output constraints must also be considered. Because PV active output is determined by irradiance, PV active power can only be curtailed, not increased. When undervoltage occurs in the distribution network, only the reactive power of the PV unit takes part in voltage regulation, and the maximum PV reactive power available for voltage regulation is

where Q_max and Q_min are the upper and lower limits of the grid-connected PV inverter's reactive output in this state, S_inv is the rated power of the grid-connected PV inverter, and P_PV is the PV active output in this state. If overvoltage occurs instead, reactive and active power are controlled jointly, and the maximum PV reactive power available for voltage regulation is

where Q′_max and Q′_min are the upper and lower limits of the inverter's reactive output in this state, P′_PV is the range by which PV active power can be curtailed in this state, and P_PVmax is the maximum PV active output in this state.
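The inverter limits are not reproduced in this section. The sketch assumes the usual apparent-power capability circle, |Q| ≤ √(S_inv² − P²): in the undervoltage case P stays at P_PV, and in the overvoltage case active power is first curtailed by up to P′_PV from P_PVmax, which widens the reactive range. Function names are hypothetical.

```python
import math
from typing import Tuple


def q_limits_undervoltage(s_inv: float, p_pv: float) -> Tuple[float, float]:
    """(Q_min, Q_max) with active output unchanged: |Q| <= sqrt(S_inv^2 - P_PV^2)."""
    if not 0.0 <= p_pv <= s_inv:
        raise ValueError("need 0 <= P_PV <= S_inv")
    q = math.sqrt(s_inv ** 2 - p_pv ** 2)
    return -q, q


def q_limits_overvoltage(s_inv: float, p_pv_max: float, p_cut: float) -> Tuple[float, float]:
    """(Q'_min, Q'_max) after curtailing p_cut (at most P'_PV) from P_PVmax."""
    if not 0.0 <= p_cut <= p_pv_max:
        raise ValueError("need 0 <= curtailment <= P_PVmax")
    return q_limits_undervoltage(s_inv, p_pv_max - p_cut)


under = q_limits_undervoltage(10.0, 8.0)      # reactive range of +/- 6
over = q_limits_overvoltage(10.0, 8.0, 2.0)   # curtailing 2 frees up to +/- 8
```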

Step 5: Multi-agent benefit model for PV users

The benefit of a PV user mainly accounts for the revenue earned by participating in voltage regulation and the cost incurred during regulation:

$$I_{u,i,t} = I_{u,gro,i,t} - C_{i,t} \tag{17}$$

$$I_{u,gro,i,t} = w_{q,i,t} \, \Delta Q_{i,t} \tag{18}$$

where I_u,i,t is the total voltage regulation revenue of the user at node i at time t; I_u,gro,i,t is the compensation the user receives for participating in voltage regulation; and C_i,t is the cost incurred during voltage regulation.

The cost C_i,t falls into two cases, depending on the amount of reactive power adjustment:

1) Reactive power regulation only: the reactive adjustment does not exceed the reactive regulation upper limit and PV active power is output normally, so only the cost of the occupied converter capacity is considered:

where c_p is the PV feed-in tariff and ξ is the utilization rate of the converter capacity occupied by reactive power regulation; for the same occupied capacity, a higher utilization rate means a higher reactive regulation cost.

2) Joint active and reactive regulation: when the required reactive adjustment exceeds the reactive regulation upper limit, PV active power can be curtailed to some extent to free up more reactive capacity. In this case the cost must also include the lost revenue from selling the curtailed active power:

The corresponding PV active power curtailment is limited by the unit's own regulating capacity:

$$0 \le \Delta P_{i,t} \le P_{PV,i,t} - P_{load,i,t} \tag{21}$$
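The two cost expressions are not reproduced here. To show how the cases interact with Eqs. (17), (18) and (21), the sketch assumes a capacity cost ξ·c_p·ΔQ for case 1 and adds the lost sales c_p·ΔP in case 2; both cost forms and all names are hypothetical.

```python
def regulation_cost(dq: float, dp: float, q_limit: float, xi: float, c_p: float,
                    p_pv: float, p_load: float) -> float:
    """Hypothetical C_i,t for the two regulation cases."""
    if not 0.0 <= dp <= p_pv - p_load:
        raise ValueError("dP violates 0 <= dP <= P_PV - P_load (Eq. 21)")
    capacity_cost = xi * c_p * dq  # assumed cost of converter capacity used for Q
    if dq <= q_limit:
        return capacity_cost  # case 1: reactive only, dp does not enter the cost
    return capacity_cost + c_p * dp  # case 2: plus lost sales of curtailed P


def user_revenue(w_q: float, dq: float, cost: float) -> float:
    """Eqs. (17)-(18): I_u = w_q * dQ - C."""
    return w_q * dq - cost


c1 = regulation_cost(dq=2.0, dp=0.0, q_limit=2.5, xi=0.5, c_p=0.4, p_pv=3.0, p_load=1.0)
c2 = regulation_cost(dq=3.0, dp=1.0, q_limit=2.5, xi=0.5, c_p=0.4, p_pv=3.0, p_load=1.0)
i1 = user_revenue(0.3, 2.0, c1)
i2 = user_revenue(0.3, 3.0, c2)
```

With these illustrative numbers, joint regulation turns the user's net revenue negative at the same price, which is the trade-off the compensation price has to cover.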

Step 6: Deep-reinforcement-learning-based collaborative voltage control optimization algorithm for the aggregator and distributed PV

Building on Q-learning, a deep reinforcement learning algorithm is introduced and combined with the firefly algorithm to solve the reactive power and voltage control optimization model based on collaboration between the distribution network aggregator and distributed PV users.

DQN and DDPG are used jointly to optimize the voltage control objective. The deep Q-network (DQN) extends Q-learning with a deep neural network that approximates the Q-function over a continuous state space. Through experience replay, the experience collected at each time step of the agent's interaction with the environment is stored in an experience pool (experience buffer); training samples are then drawn at random from this pool, which reduces the correlation between training data and improves solution speed and convergence.
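Experience replay, as described above, needs only a bounded pool and uniform random sampling. A minimal sketch using the Python standard library (class and field names are illustrative; D_DQN and D_DDPG would each be one such pool):

```python
import random
from collections import deque
from typing import Any, Deque, List, NamedTuple


class Transition(NamedTuple):
    state: Any
    action: Any
    reward: float
    next_state: Any
    done: bool


class ReplayBuffer:
    """Fixed-capacity experience pool; the oldest transitions are evicted first."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._buf: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, *args: Any) -> None:
        self._buf.append(Transition(*args))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement breaks temporal correlation.
        return self._rng.sample(list(self._buf), batch_size)

    def __len__(self) -> int:
        return len(self._buf)


buf = ReplayBuffer(capacity=3, seed=1)
for k in range(5):
    buf.push((k,), k, float(k), (k + 1,), False)
batch = buf.sample(2)
```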

Similarly, the deep deterministic policy gradient (DDPG) algorithm can efficiently solve for the optimal response of the lower-layer distributed PV users given their benefit objective and constraints. DDPG searches for the optimal action over a continuous action space in order to obtain the globally optimal solution. It uses an actor-critic architecture: the actor takes the state as input and outputs an action, while the critic takes the state and action as input and outputs an estimated Q′ value; training minimizes the deviation of Q′ from its target.
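In standard DDPG the critic is trained toward the target y = r + γ·Q′(s′, μ′(s′)), computed with the target actor μ′ and target critic Q′, and the deviation to be minimized is the mean squared error against that target. A framework-free sketch with scalar states and plain callables standing in for the networks (all names hypothetical; the gradient updates themselves would be done by a deep learning library):

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, float, float, float, bool]  # (s, a, r, s', done)


def critic_targets(batch: Sequence[Sample],
                   actor_target: Callable[[float], float],
                   critic_target: Callable[[float, float], float],
                   gamma: float = 0.99) -> List[float]:
    """y = r + gamma * Q'(s', mu'(s')), using the target actor and target critic."""
    return [r + (0.0 if done else gamma * critic_target(s2, actor_target(s2)))
            for (_s, _a, r, s2, done) in batch]


def critic_loss(batch: Sequence[Sample],
                critic: Callable[[float, float], float],
                targets: Sequence[float]) -> float:
    """Mean squared deviation between Q(s, a) and the targets y."""
    errors = [critic(s, a) - y for (s, a, _r, _s2, _d), y in zip(batch, targets)]
    return sum(e * e for e in errors) / len(errors)


batch = [(1.0, 0.2, 1.0, 2.0, False), (0.0, 0.0, 0.5, 4.0, True)]
ys = critic_targets(batch, actor_target=lambda s: 0.5 * s,
                    critic_target=lambda s, a: s + a, gamma=0.9)
loss = critic_loss(batch, critic=lambda s, a: s + a, targets=ys)
```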

The invention uses the DQN and DDPG deep reinforcement learning algorithms to control voltage. For DQN, the voltage fluctuation index serves as the reward function and the specific actions of the voltage regulation devices form the action space; the resulting new states are stored in the experience pool D_DQN. DDPG works analogously: an action set is built over a continuous action space covering PV active and reactive output, the node voltage limit violation and the power adjustment serve as the reward function, and the new states obtained through iteration are stored as experience data in the experience pool D_DDPG. One pass of the outer loop thus yields the voltage regulation strategies and results of DQN and DDPG, which are passed to the inner layer as parameters. The inner layer takes the combined benefit of the DSO and the distributed PV user cluster as its objective function and uses the firefly algorithm to optimize the voltage regulation compensation price for the given strategy, finally yielding the optimal voltage regulation strategy that maximizes the collaborative benefit of the upper and lower layers.

The specific solution steps are described below with reference to Figure 1:

Step 1: DQN setup and initialization

The voltage, active power and reactive power of each node are the controlled quantities, and the state space is the set of these quantities over all PQ nodes:

$$S_{DSO,i} = \{v_1, \dots, v_k, \dots, v_n;\ p_1, \dots, p_k, \dots, p_n;\ q_1, \dots, q_k, \dots, q_n\} \tag{24}$$

where v_k, p_k and q_k are the measured voltage, active power and reactive power at the k-th node, 1 ≤ k ≤ n, and n is the total number of PQ nodes.

The DQN action space is the reactive compensation amount used at node k to support voltage, which is passed to the lower-layer distributed PV users and their clusters in the form of the corresponding voltage regulation compensation price. The control goal is to minimize node voltage limit violations, so the reward function is the sum of the quadratic form of the per-node voltage violations and the capacitor reactive compensation term, taken with a negative sign:

$$r_{DSO,i} = -[\Delta v_1, \dots, \Delta v_i, \dots, \Delta v_n]\, Q\, [\Delta v_1, \dots, \Delta v_i, \dots, \Delta v_n]^T - R\, C_{q,k} \tag{25}$$

where Δv_i is the voltage limit violation at node i; C_q,k is the reactive voltage regulation compensation price at node k; and Q and R are a weight matrix and a weight coefficient, respectively.
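Eqs. (24) and (25) can be evaluated directly. The sketch assumes Δv_i is the distance by which v_i lies outside a single band [v_min, v_max] (zero inside the band); that definition and the function names are assumptions for illustration.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def state_vector(v: Sequence[float], p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Eq. (24): concatenate voltages, active and reactive powers of all PQ nodes."""
    if not len(v) == len(p) == len(q):
        raise ValueError("v, p and q must have one entry per PQ node")
    return list(v) + list(p) + list(q)


def violation(v: Sequence[float], v_min: float, v_max: float) -> List[float]:
    """Assumed dv_i: distance outside [v_min, v_max], zero inside the band."""
    return [max(0.0, x - v_max) + max(0.0, v_min - x) for x in v]


def quadratic_form(x: Sequence[float], m: Matrix) -> float:
    """Row vector x times matrix m times x^T."""
    n = len(x)
    return sum(x[i] * m[i][j] * x[j] for i in range(n) for j in range(n))


def reward_dso(v: Sequence[float], v_min: float, v_max: float,
               q_weight: Matrix, r_weight: float, c_qk: float) -> float:
    """Eq. (25): r_DSO = -dv Q dv^T - R * C_q,k."""
    dv = violation(v, v_min, v_max)
    return -quadratic_form(dv, q_weight) - r_weight * c_qk


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
s = state_vector([1.0, 1.0, 1.0], [0.1, 0.2, 0.3], [0.0, 0.0, 0.1])
r = reward_dso([1.06, 1.00, 0.93], 0.95, 1.05, I3, r_weight=0.1, c_qk=0.2)
```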

Once the action space, reward function and state space are set, iteration begins.

Step 2: DDPG setup and initialization

The current voltage, active power and reactive power of each node form the state space:

$$S_{PV,i} = \{v_1, \dots, v_k, \dots, v_n;\ p_1, \dots, p_k, \dots, p_n;\ q_1, \dots, q_k, \dots, q_n\} \tag{26}$$

Because DDPG works over a continuous action space, the lower layer's active and reactive output adjustments in response to the upper-layer compensation price form the action set A_PV,i = {A_P, A_Q} = {ΔP_i, ΔQ_i}. The reward function is the sum of the quadratic form of the node voltage violations and the power adjustment terms, taken with a negative sign:

$$
\begin{aligned}
r_{PV,i} = &-[\Delta v_1, \dots, \Delta v_i, \dots, \Delta v_n]\, Q\, [\Delta v_1, \dots, \Delta v_i, \dots, \Delta v_n]^T \\
&-[p_{PV,1}, \dots, p_{PV,i}, \dots, p_{PV,n}]\, R\, [p_{PV,1}, \dots, p_{PV,i}, \dots, p_{PV,n}]^T \\
&-[q_{PV,1}, \dots, q_{PV,i}, \dots, q_{PV,n}]\, J\, [q_{PV,1}, \dots, q_{PV,i}, \dots, q_{PV,n}]^T
\end{aligned}
\tag{27}
$$

where Δv_i is the voltage limit violation at node i; p_PV,i and q_PV,i are the active and reactive output of the distributed PV at node i; and Q, R and J are the corresponding weight matrices or weight coefficients.
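Eq. (27) is a sum of three quadratic forms. A sketch that evaluates it for given violation, active and reactive vectors, with the Δv vector taken as already computed (function names are hypothetical):

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def quadratic_form(x: Sequence[float], m: Matrix) -> float:
    """Row vector x times matrix m times x^T."""
    n = len(x)
    return sum(x[i] * m[i][j] * x[j] for i in range(n) for j in range(n))


def reward_pv(dv: Sequence[float], p_pv: Sequence[float], q_pv: Sequence[float],
              q_w: Matrix, r_w: Matrix, j_w: Matrix) -> float:
    """Eq. (27): r_PV = -dv Q dv^T - p R p^T - q J q^T."""
    return (-quadratic_form(dv, q_w)
            - quadratic_form(p_pv, r_w)
            - quadratic_form(q_pv, j_w))


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
H3 = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
r = reward_pv([0.01, 0.0, 0.02], [0.1, 0.2, 0.0], [0.0, 0.1, 0.0], I3, H3, I3)
```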

The distributed PV users' voltage regulation actions are executed from the given action space, the immediate reward r_PV is computed from Eq. (27), and the state is updated; the experience tuple (S_PV, a_PV, r_PV, S′_PV) is stored in the experience pool D_DDPG. A mini-batch of experience is then sampled at random from D_DDPG, the critic and actor networks are updated (the actor by the policy gradient), and the actor and critic target networks are updated by soft update. Once a voltage regulation strategy that meets the regulation requirements is obtained, it is passed to the embedded firefly algorithm, whose fast convergence is used to further optimize the strategy with respect to the economic benefits of the DSO and the distributed PV user cluster.
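The soft update mentioned above is usually Polyak averaging of the parameters, θ′ ← τθ + (1 − τ)θ′ with a small τ. A sketch over flat parameter lists (the function name and list representation are illustrative; in practice it is applied to every tensor of the actor and critic target networks):

```python
from typing import List, Sequence


def soft_update(target: Sequence[float], online: Sequence[float], tau: float) -> List[float]:
    """theta' <- tau * theta + (1 - tau) * theta' (Polyak averaging)."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must lie in (0, 1]")
    return [tau * w + (1.0 - tau) * wt for wt, w in zip(target, online)]


new_target = soft_update(target=[0.0, 1.0], online=[1.0, 3.0], tau=0.5)
```

With τ = 1 this degenerates to the hard copy often used for the DQN target network; small τ (for example 0.005) makes the target track the online network slowly, which stabilizes the critic targets.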

Step 3: Firefly algorithm optimization

The firefly optimization algorithm searches for the collaborative-benefit-optimal strategy under voltage regulation control. The objective is to maximize the combined voltage regulation benefit of the distribution network aggregator and the distributed PV user cluster, subject to each party's hardware constraints:

$$E = \mu E_{D,t} + (1-\mu) I_{u,i,t} \tag{28}$$

where μ is the weight given to the DSO in the combined benefit of the DSO and the distributed PV users; typically μ = 0.5, i.e., both parties' benefits are weighted equally.

Constraints: Eqs. (9)–(16) and (18)–(23).

1) Initialize the firefly population size, problem dimension and number of iterations; randomly generate the initial firefly positions; and define the fitness function that serves as each firefly's brightness.

2) Each firefly moves according to its attraction to the others, which depends on both brightness and distance. A random perturbation is added during each position update to improve the algorithm's ability to escape local optima.

3) Update brightness: as the fireflies move closer, recompute the fitness (brightness) of every firefly at its new position.

4) Check the termination condition: when the number of iterations reaches its limit or the fitness deviation falls within an acceptable threshold, exit the iteration and output the optimal solution, which is the result that maximizes the economic benefit of the upper and lower layers while meeting the voltage regulation requirements. Otherwise, continue iterating.
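Steps 1) to 4) can be sketched with the standard firefly update, in which firefly i moves toward a brighter firefly j with attractiveness β0·exp(−γr²) plus a uniform random perturbation. In the method the fitness would be E from Eq. (28) evaluated over the compensation prices; here it is any callable. The parameter values, the decaying perturbation, the clipping to bounds, and reading the "fitness deviation" as the brightness spread of the population are all assumptions of this sketch.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def firefly_maximize(fitness: Callable[[Sequence[float]], float],
                     bounds: Sequence[Tuple[float, float]],
                     n_fireflies: int = 20, n_iter: int = 100,
                     beta0: float = 1.0, gamma: float = 0.1,
                     alpha0: float = 0.2, alpha_decay: float = 0.97,
                     tol: float = 1e-9, seed: int = 0) -> Tuple[List[float], float]:
    """Return (best position, best brightness) found by a basic firefly search."""
    rng = random.Random(seed)
    dim = len(bounds)
    widths = [hi - lo for lo, hi in bounds]
    # 1) Random initial positions; brightness is the fitness value.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    light = [fitness(x) for x in pos]
    best = max(range(n_fireflies), key=lambda k: light[k])
    best_x, best_f = pos[best][:], light[best]
    alpha = alpha0
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] <= light[i]:
                    continue
                # 2) Move i toward brighter j; attraction decays with distance.
                r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                beta = beta0 * math.exp(-gamma * r2)
                for d in range(dim):
                    noise = alpha * widths[d] * (rng.random() - 0.5)  # random perturbation
                    lo, hi = bounds[d]
                    step = beta * (pos[j][d] - pos[i][d]) + noise
                    pos[i][d] = min(hi, max(lo, pos[i][d] + step))
                # 3) Recompute brightness at the new position.
                light[i] = fitness(pos[i])
                if light[i] > best_f:
                    best_x, best_f = pos[i][:], light[i]
        alpha *= alpha_decay
        # 4) Stop early once the population's brightness spread is within tol.
        if max(light) - min(light) < tol:
            break
    return best_x, best_f


def toy(v: Sequence[float]) -> float:
    """Toy concave objective with its maximum 0 at (1, -2)."""
    return -((v[0] - 1.0) ** 2 + (v[1] + 2.0) ** 2)


x, f = firefly_maximize(toy, [(-5.0, 5.0), (-5.0, 5.0)])
```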

Step 4: Optimal Q-value iteration and solution

Starting from the initial state s_DSO of the DQN agent determined in Step 1, the maximum Q value is iterated further using the benefit-maximizing strategy, fed back by the lower layer, that satisfies the voltage regulation requirements.

A specific voltage regulation action is selected from the DQN action space using the ε-greedy policy, the immediate reward r_DSO is computed from Eq. (25), and the DQN state is updated. The experience tuple (S_DSO, a_DSO, r_DSO, S′_DSO) is stored in the experience pool D_DQN, experience is sampled at random from the pool, and the Q network and the target Q network are updated. Iterating in this way finally yields the optimal voltage regulation strategy that maximizes the economic benefits of the upper and lower layers.
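Step 4 relies on two standard DQN ingredients: ε-greedy action selection and the target y = r + γ·max_a′ Q_target(s′, a′). A minimal sketch over a list of Q values for the discrete compensation actions (names are hypothetical):

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Explore with probability epsilon, otherwise pick the action with the highest Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


def dqn_target(reward: float, next_q_target: Sequence[float],
               gamma: float, done: bool) -> float:
    """y = r + gamma * max_a' Q_target(s', a'), or y = r at a terminal state."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


rng = random.Random(0)
greedy_action = epsilon_greedy([0.1, 0.5, 0.3], epsilon=0.0, rng=rng)
y = dqn_target(-0.02, [0.1, 0.5, 0.3], gamma=0.9, done=False)
```

The Q network is then regressed toward y, and the target network is refreshed periodically (or by soft update) so that y does not chase its own updates.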

Embodiment 2

The invention provides a distributed photovoltaic multi-agent cluster voltage regulation system, comprising:

Embodiment 3

The invention provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the distributed photovoltaic multi-agent cluster voltage regulation method are implemented.

Embodiment 4

Those skilled in the art will appreciate that embodiments of the present invention may be provided as methods, systems, or computer program products. Thus, the invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The invention is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the invention and do not limit its scope of protection. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that, after reading this disclosure, they may still make various changes, modifications or equivalent substitutions to the specific embodiments of the invention, and such changes, modifications or equivalent substitutions all fall within the scope of protection of the pending claims.

Claims

1. A voltage regulation and control method for a distributed photovoltaic multi-agent cluster, characterized by comprising the following steps:

constructing, based on a system framework of a DSO and multi-agent photovoltaic users, a multi-agent collaborative optimization structure under a deep reinforcement learning framework;

calculating the voltage support degree and the photovoltaic utilization rate based on the multi-agent collaborative optimization structure;

jointly characterizing the voltage regulation and accommodation performance of the photovoltaics with the two indices of voltage support degree and photovoltaic utilization rate, and constructing a photovoltaic grading model;

establishing, based on the photovoltaic grading model, an upper-layer and lower-layer collaborative optimization model between the DSO and photovoltaic users;

establishing, based on the upper-layer and lower-layer collaborative optimization model between the DSO and photovoltaic users, a reactive voltage control optimization model based on cooperation between a distribution network aggregator and distributed photovoltaic users;

and solving the reactive voltage control optimization model based on cooperation between the distribution network aggregator and the distributed photovoltaic users to obtain an optimal voltage regulation strategy that maximizes the collaborative benefit of the upper and lower layers.

2. The distributed photovoltaic multi-agent cluster voltage regulation method according to claim 1, wherein the voltage support degree is calculated as follows:

where γ_i,t denotes the voltage support degree of node i at time t; Φ is the set of nodes whose voltage is out of limits; n_ol is the number of out-of-limit nodes; and S_Q,i,j,t is the reactive power–voltage sensitivity coefficient between nodes at time t, representing the change in the voltage of node i per unit change in the reactive power of node j;

where ΔU_i,t and ΔQ_i,t are the changes in the voltage magnitude and the reactive power of node i at time t, respectively;

the photovoltaic utilization rate is calculated as follows:

$$P_a(t) = P_{PV}(t) - P_{S.PV}(t)$$

where R_PV is the photovoltaic utilization rate; P_PV(t) is the theoretical photovoltaic generation at time t; P_a(t) is the photovoltaic energy actually utilized at time t; and P_S.PV(t) is the curtailed energy at time t.

3. The distributed photovoltaic multi-agent cluster voltage regulation method according to claim 2, wherein jointly characterizing the voltage regulation and accommodation performance of the photovoltaics with the two indices of voltage support degree and photovoltaic utilization rate and constructing the photovoltaic grading model specifically comprises:

the DSO side takes the voltage support degree and the photovoltaic utilization rate as two-dimensional parameters, performs similarity clustering by the K-means clustering method, and assigns photovoltaic users with similar voltage regulation performance to the same grade, expressed as follows:

where, because the voltage support degree and the photovoltaic utilization rate differ in order of magnitude, the two types of parameters are normalized to values between 0 and 1; k_i is the weight of each parameter, with k₁ + k₂ = 1; and the grading results are stored; in addition, the similarity index d_ij,t is:

the lower the similarity index, the more readily the corresponding distributed photovoltaic users are assigned to the same grade; photovoltaics in the same grade have similar voltage support degree and photovoltaic utilization rate, and the DSO can apply the same voltage regulation compensation price to the photovoltaics in a grade so as to allocate reactive power regulation for voltage regulation reasonably.

4. The distributed photovoltaic multi-agent cluster voltage regulation method according to claim 1, wherein the upper-layer and lower-layer collaborative optimization model between the DSO and photovoltaic users is specifically as follows: the upper-layer DSO leads voltage regulation of the distribution network as the party providing the compensation price; the lower-layer photovoltaic users, incentivized by the price, participate in the voltage regulation ancillary service market;

the DSO, as the distribution network operator, maintains voltage stability while participating in voltage regulation and keeps the photovoltaic utilization rate as high as possible, i.e., keeps the total voltage regulation cost as low as possible, expressed as follows:

where w_q,k,t is the voltage regulation compensation price of photovoltaic users in grade k at time t; K is the total number of grades in the user grading strategy; m_k is the number of photovoltaic users in grade k; U_i,t is the voltage magnitude of node i at time t; U_i,ref is the reference voltage magnitude of node i; R_i,PV is the photovoltaic utilization rate of the distributed photovoltaic connected at node i; α is the cost coefficient for maintaining voltage stability; and β is the curtailment cost coefficient;

the constraints are as follows:

$$P_{i,t} = P_{PV,i,t} - P_{load,i,t}$$

$$U_{min,t} \le U_t \le U_{max,t}$$

$$P_{L,min,t} \le P_{L,t} \le P_{L,max,t}$$

where P_i,t and Q_i,t are the active and reactive power injected at node i at time t; θ_ij,t is the voltage phase angle difference between nodes i and j at time t; G_ij and B_ij are the conductance and susceptance of branch ij, respectively; j ∈ i denotes all nodes adjacent to node i; P_ij,t is the active power flowing from node i to node j at time t; P_load,i,t is the user load at node i at time t; U_t is the voltage magnitude of each node after voltage regulation at time t, and U_max,t and U_min,t are the upper and lower permissible voltage limits at time t; P_L,t is the branch power flow of the network at time t, and P_L,max,t and P_L,min,t are the upper and lower branch power flow limits at time t.

5. The method according to claim 4, wherein the constraints further comprise:

$$w_{q,min,i,t} \le w_{q,i,t} \le w_{q,max,i,t}$$

where w_q,max,i,t and w_q,min,i,t are the upper and lower limits of the voltage regulation compensation price at time t; Q_max and Q_min are the upper and lower reactive output limits of the grid-connected photovoltaic inverter; S_inv is the rated power of the grid-connected photovoltaic inverter; P_PV is the photovoltaic active output in this state; Q′_max and Q′_min are the upper and lower reactive output limits of the grid-connected photovoltaic inverter in this state; P′_PV is the range by which the photovoltaic active power can be curtailed in this state; and P_PVmax is the maximum photovoltaic active output in this state.

6. The method for regulating and controlling the voltage of the distributed photovoltaic multi-main-body cluster according to claim 4, wherein the reactive voltage control optimization model based on cooperation of a distribution network aggregator and a distributed photovoltaic user is specifically as follows:

photovoltaic user benefits mainly consider the benefits obtained by participating in voltage regulation and the cost required in the regulation process, as follows:

I _u,i,t ＝I _u,gro,i,t -C _i,t

I _u,gro,i,t ＝w _q,i,t ΔQ _i,t

wherein I is _u,i,t The total voltage regulation income of the user at the node i at the moment t is obtained; i _u,gro,i,t Compensation income obtained for the user due to participation in voltage regulation; c (C) _i,t Is the required cost in the pressure regulating process;

for the required cost C in the pressure regulating process _i,t The reactive power adjustment amount is classified into the following two types according to the magnitude of the reactive power adjustment amount:

1) Only reactive power regulation, namely reactive power regulation quantity does not exceed the reactive power regulation upper limit, and photovoltaic active power normally outputs power, only the cost occupied by the capacity of the converter is considered:

wherein c _p The photovoltaic online electricity price is that of photovoltaic; xi is the utilization rate of the capacity occupied by reactive power regulation of the photovoltaic converter, and the higher the utilization rate is under the condition of occupying the same capacity, the higher the reactive power regulation cost is;

2) Active and reactive combined regulation, i.e. the reactive regulation quantity exceeds the reactive regulation upper limit, the photovoltaic active has the function of reducing to release more reactive capacity, and the cost also comprises the electricity selling benefits of reducing part of the active power:

The corresponding photovoltaic active power curtailment and reactive power absorption are constrained by the unit's own regulating capacity:

$0 \le \Delta P_{i,t} \le P_{PV,i,t} - P_{load,i,t}$
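The two cost cases and the curtailment constraint can be illustrated with a small classifier. This is a sketch under an assumption not stated in the claims: the reactive regulation upper limit is taken from the inverter capability circle, so the minimum curtailment that frees enough reactive capacity is $\Delta P = P_{PV} - \sqrt{S_{inv}^2 - \Delta Q^2}$. The function name and argument order are hypothetical.

```python
import math


def classify_regulation(
    delta_q: float, s_inv: float, p_pv: float, p_load: float
) -> tuple[int, float]:
    """Return (case, delta_p): case 1 = reactive only, case 2 = active + reactive.

    Assumption: reactive headroom is sqrt(S_inv^2 - P_PV^2) (capability circle).
    """
    if not 0.0 <= p_pv <= s_inv:
        raise ValueError("P_PV must lie in [0, S_inv]")
    q_avail = math.sqrt(s_inv**2 - p_pv**2)
    demand = abs(delta_q)  # injection or absorption both consume capacity
    if demand <= q_avail:
        return 1, 0.0  # case 1: no active curtailment needed
    if demand > s_inv:
        raise ValueError("reactive demand exceeds inverter rating")
    # Case 2: smallest active reduction that frees enough reactive capacity.
    delta_p = p_pv - math.sqrt(s_inv**2 - demand**2)
    if not 0.0 <= delta_p <= p_pv - p_load:
        raise ValueError("curtailment violates 0 <= dP <= P_PV - P_load")
    return 2, delta_p
```

With $S_{inv} = 100$, $P_{PV} = 80$ and $P_{load} = 10$, a 50 kvar request stays in case 1, while an 80 kvar request falls into case 2 with a 20 kW curtailment, which satisfies $0 \le 20 \le 70$.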

7. The method for regulating and controlling the voltage of a distributed photovoltaic multi-entity cluster according to claim 4, wherein solving the reactive voltage control optimization model based on cooperation between the distribution network aggregator and distributed photovoltaic users specifically comprises the following steps:

Voltage control is realized with the DQN and DDPG deep reinforcement learning algorithms. First, for DQN, the voltage fluctuation index serves as the reward function and the specific actions of the voltage regulator form the action space; each new state obtained is stored in the experience pool $D_{DQN}$. The DDPG algorithm is handled similarly: an action set is built over a continuous action space covering photovoltaic active and reactive output, the node voltage thresholds and the power adjustment serve as the reward function, new states are obtained iteratively, and the experience data are stored in the experience pool $D_{DDPG}$. The outer loop then passes the voltage regulation strategies and results obtained by DQN and DDPG as parameters to the inner layer. The inner layer takes the combined benefit of the DSO and the distributed photovoltaic user cluster as its objective function and uses a firefly algorithm to optimize the voltage-regulation compensation price for the given regulation strategy, finally obtaining the optimal voltage regulation strategy that maximizes the cooperative benefit of the upper and lower layers.
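The inner-layer step can be sketched as a one-dimensional firefly search over the compensation price. This is a minimal, generic firefly algorithm, not the patent's implementation. The objective passed in stands for the combined DSO and user benefit for a fixed regulation strategy, which the claims do not specify; the toy quadratic in the usage note is purely a placeholder. Parameter values and names are assumptions.

```python
from __future__ import annotations

import math
import random
from typing import Callable


def firefly_maximize(
    objective: Callable[[float], float],
    lower: float,
    upper: float,
    n_fireflies: int = 25,
    n_iter: int = 100,
    beta0: float = 1.0,
    gamma: float = 1.0,
    alpha: float = 0.1,
    alpha_decay: float = 0.97,
    seed: int = 0,
) -> tuple[float, float]:
    """Maximize objective over [lower, upper]; return (best_x, best_value)."""
    rng = random.Random(seed)
    span = upper - lower
    xs = [rng.uniform(lower, upper) for _ in range(n_fireflies)]
    light = [objective(x) for x in xs]  # brightness = objective value
    best_i = max(range(n_fireflies), key=light.__getitem__)
    best_x, best_val = xs[best_i], light[best_i]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:
                    # Attractiveness decays with normalized squared distance.
                    r = (xs[i] - xs[j]) / span
                    beta = beta0 * math.exp(-gamma * r * r)
                    step = beta * (xs[j] - xs[i]) + alpha * span * (rng.random() - 0.5)
                    xs[i] = min(max(xs[i] + step, lower), upper)  # keep price in bounds
                    light[i] = objective(xs[i])
                    if light[i] > best_val:
                        best_x, best_val = xs[i], light[i]
        alpha *= alpha_decay  # shrink the random walk as the swarm converges
    return best_x, best_val
```

Usage with a placeholder benefit peaking at a price of 0.3 within assumed bounds $[0.1, 0.5]$: `firefly_maximize(lambda w: -(w - 0.3) ** 2, 0.1, 0.5)`. In the bilevel scheme, the outer DQN/DDPG loop would supply the strategy that fixes this objective.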

8. A distributed photovoltaic multi-entity cluster voltage regulation and control system, characterized by comprising:

a multi-agent collaborative optimization structure construction module, configured to construct a multi-agent collaborative optimization structure of a deep reinforcement learning framework based on a DSO and a system framework of multi-entity photovoltaic users;

a calculation module, configured to calculate the voltage support and the photovoltaic utilization rate based on the multi-agent collaborative optimization structure;

a photovoltaic grading model construction module, configured to represent the voltage regulation and accommodation performance of the photovoltaics comprehensively by two indices, voltage support and photovoltaic utilization rate, and to construct a photovoltaic grading model;

an upper-lower layer collaborative optimization model construction module, configured to establish, based on the photovoltaic grading model, an upper-layer and lower-layer collaborative optimization model between the DSO and the photovoltaic users;

9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the distributed photovoltaic multi-entity cluster voltage regulation method according to any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the distributed photovoltaic multi-entity cluster voltage regulation method according to any one of claims 1 to 7.