Detailed Description
As shown in fig. 1, a method for adding control branches under multiple scenes in the execution process of demand response is provided, which comprises the following steps:
s1, acquiring multi-mode data, and processing the multi-mode data by using a tensor decomposition method and a support vector machine to obtain a scene prediction sequence and a scene type identifier, wherein the multi-mode data comprises real-time load data, historical load data, environment parameter data and user behavior data;
s2, reading a scene prediction sequence and a scene type identifier, and processing by adopting a fuzzy comprehensive evaluation method and a graph neural network by combining real-time load data, historical load data and prestored system constraint data to obtain a shunt response capacity matrix and a control constraint matrix;
S3, processing by adopting a multi-objective optimization method based on scene type identification, scene prediction sequences, shunt response capacity matrixes, control constraint matrixes and user behavior data to obtain an optimization objective function set, a weight coefficient matrix and a constraint condition set;
and S4, processing by adopting a genetic algorithm and a dynamic programming method based on the optimized objective function set, the weight coefficient matrix and the constraint condition set to obtain a control strategy matrix and an execution time sequence vector.
As shown in fig. 2, according to an aspect of the present application, step S1 is further:
S11, acquiring multi-modal data from a data acquisition system, wherein the multi-modal data comprises real-time load data, historical load data, environmental parameter data and user behavior data, mapping the multi-modal data into intervals [0,1] by adopting a minimum-maximum normalization method to obtain normalized multi-modal data, smoothing abnormal values exceeding 3 times of standard deviation by adopting a moving average method based on the normalized multi-modal data to obtain smoothed multi-modal data, supplementing missing values in a data sequence by adopting a piecewise linear interpolation method based on the smoothed multi-modal data to obtain multi-modal data after supplementing the missing values, and recombining the multi-modal data after supplementing the missing values according to data type dimension, time sequence dimension and sampling frequency dimension to obtain preprocessed data tensor with dimension of m multiplied by n multiplied by k, wherein m is positive integer representing data type number, n is positive integer representing time sequence length, and k is positive integer representing sampling frequency;
s12, respectively calculating projection matrixes along a data type dimension, a time sequence dimension and a sampling frequency dimension based on the preprocessed data tensor, carrying out singular value decomposition on each projection matrix, selecting singular values and corresponding feature vectors thereof according to the principle that the accumulated contribution rate is more than 95 percent, respectively combining the selected feature vectors to form a data type dimension factor matrix, a time sequence dimension factor matrix and a sampling frequency dimension factor matrix;
S13, respectively calculating projection values of the feature core tensor in the directions of a data type dimension factor matrix, a time sequence dimension factor matrix and a sampling frequency dimension factor matrix, carrying out weighted summation on the projection values according to the corresponding singular values to obtain a weighted summation result, selecting the first p feature components with the accumulated contribution rate reaching 90% from the weighted summation result by adopting a principal component analysis method, and combining the p feature components to form a scene feature vector, wherein p is a positive integer;
S14, inputting scene feature vectors into a pre-trained support vector machine classifier to classify the scene to obtain scene type identifiers at the current moment, inputting the scene feature vectors at the latest N moments into a long-period memory network in time sequence to predict scene evolution trend at the next M moments to obtain scene prediction sequences containing M predicted scene types, wherein N and M are positive integers.
According to the embodiment, by constructing a three-dimensional tensor decomposition and deep learning hybrid architecture, accurate identification and prediction of a demand response scene are realized. The method comprises the steps of adopting a dual mapping mechanism of combining minimum-maximum normalization with an anti-hyperbolic tangent function, enabling load data, environment parameters and user behavior data of different dimensions to be represented in a unified feature space, simultaneously keeping nonlinear features of the data, carrying out multi-scale analysis on the data by introducing an adaptive wavelet decomposition network, carrying out probability estimation on missing data by combining a Markov chain Monte Carlo method, improving data quality, enabling the data missing rate in a real electric power system to be reduced to be lower than 3% from 15% in the original state, particularly introducing a time-frequency-phase three-dimensional feature extraction mechanism in the decomposition process, projecting original data into a plurality of feature subspaces through tensor product operation, dynamically adjusting feature weights through an adaptive feature selection model, and enabling the extraction precision of scene features to be improved by 25% and the accuracy of scene classification to be higher than 95%. In the actual demand response project, the embodiment can complete scene recognition in millisecond level, and provides a reliable scene feature basis for the formulation of a subsequent control strategy.
According to one aspect of the present application, step S11 is further:
S111, acquiring real-time load data, historical load data, environment parameter data and user behavior data from a data acquisition system, calculating the mean value and standard deviation of each type of data, and then normalizing to obtain normalized data;
S112, constructing a multi-layer wavelet decomposition network based on a standardized data set, and carrying out frequency layering on the data sequence to obtain a data sequence after frequency layering; based on the data sequence after frequency layering, self-adaptively adjusting threshold parameters in each layer of decomposition, separating high-frequency noise characteristics from data characteristics to obtain denoised data characteristics;
S113, constructing a conditional probability model of a data time sequence based on the denoising data set, and calculating a characteristic probability distribution according to the time dependency relationship of the data;
S114, extracting three-dimensional features of time-frequency-phase through tensor decomposition based on the complete data set, and reconstructing the three-dimensional features according to three dimensions of data type, time sequence and sampling frequency to form a preprocessed data tensor.
In one embodiment of the application, the normalization method is that X_norm (i, j) = (X (i, j) -mu_j)/sigma_j+tanh (X (i, j)), wherein X (i, j) is an original data matrix, i is a sample index, j is a feature index, mu_j is a mean value of the j features, sigma_j is a standard deviation of the j features, tanh is a hyperbolic tangent function, and X_norm (i, j) is a normalized data matrix.
The wavelet denoising method comprises the steps of W (a, b) = Σ (X (t) & ψ ((t-b)/a)) dt+alpha ·Σ|W (a, b) |, wherein W (a, b) is a wavelet transformation coefficient, X (t) is an input signal, ψ is a wavelet basis function, a is a scale parameter, b is a translation parameter, alpha is an adaptive threshold coefficient, alpha = log (N) sigma/sqrt (2·log (N)), N is a data length, sigma is a noise standard deviation, and sigma = mean (|W|)/0.6745 is estimated through an MAD method.
The data complement method comprises the steps of P (X_obss|X_obs) = pi (x_i|x_pa (i)). Exp (-lambda·|X-X_prev| 2), wherein X_obss are missing data, X_obss are observation data, x_i are ith variables, x_pa (i) are father node sets of x_i, lambda is a smoothing factor, X_prev is complete data of the previous moment, pi is a product operator, and the missing values are updated through iterative sampling until convergence.
Zhang Liangchong the construction method comprises the steps of T=X× 1 U1 ×2 U2 ×3 U3 + β·||▽T||1, wherein T is a reconstructed three-dimensional tensor, X is an original data tensor, U 1、U2、U3 is a three-dimensional basis matrix, X < i > represents a tensor product of an ith mode, beta is a regularization parameter, T is a gradient operator of the tensor, and I 1 are L1 norms. The specific extraction formula of the three-dimensional feature is F_time (t) = Σw_k.x (t-k) ·exp (- λ.k), F_freq (ω) = |FFT (x (t))|·H (ω), and F_phase (Φ) = arg (Hilbert (x (t))). The specific process of tensor decomposition is that the original tensor is converted into a standardized tensor, the standardized tensor is converted into a projection tensor, the projection tensor is converted into a decomposition tensor, and finally the decomposition tensor is converted into a core tensor.
According to the embodiment, the characteristic unified processing of the multi-source heterogeneous data is realized through the data preprocessing methods such as standardization, denoising, time-space reconstruction and the like. The method comprises the steps of mapping multi-source data such as power load and environmental parameters to a unified interval, eliminating the difference of different dimension data, carrying out multi-scale analysis on the data through a wavelet decomposition network, adaptively adjusting threshold parameters, effectively identifying and filtering sudden fluctuation and sensor noise in the load data, then adopting a Markov chain Monte Carlo method to treat the problem of data loss caused by communication faults, offline equipment and the like, improving the integrity of the data, and finally recombining the processed data into preprocessed data tensors according to the three dimensions of data types, time sequences and sampling frequencies. The embodiment reduces the noise level of the original data by 85 percent, improves the data integrity to 98 percent, and provides a high-quality data basis for the subsequent scene feature extraction.
According to one aspect of the present application, step S12 is further:
S121, constructing an adaptive projection network based on the preprocessed data tensor, and respectively calculating the feature space of the data type dimension, the time sequence dimension and the sampling frequency dimension;
S122, based on the projection feature matrix, constructing a block diagonalization matrix to reduce the correlation among features;
S123, constructing a self-adaptive feature selection model based on a feature component matrix, and calculating the accumulated contribution rate of each feature, dynamically adjusting a feature selection threshold based on the accumulated contribution rate to obtain an adjusted feature selection threshold, selecting a most representative feature vector based on the adjusted feature selection threshold, and combining to form a data type dimension factor matrix, a time sequence dimension factor matrix and a sampling frequency dimension factor matrix;
s124, constructing a multi-mode tensor projection network based on the data type dimension factor matrix, the time sequence dimension factor matrix and the sampling frequency dimension factor matrix, and projecting the preprocessed data tensor into a feature space formed by the factor matrix through tensor product operation based on the multi-mode tensor projection network to generate a feature core tensor.
In one embodiment of the application, the adaptive projection algorithm is that P (k) =argmin|X (k) -P (k) ·X (k) | 2 + γ·tr(P(k)·L·P(k)T, wherein P (k) is a projection matrix of the kth dimension, X (k) is a matrix expansion of the kth dimension, L is a Laplacian matrix, L=D-W, W is a similarity matrix, D is a degree matrix, gamma is a regularization coefficient, and tr (·) is a trace of the matrix.
The block diagonalization algorithm comprises Y=U.Sigma.V T + λ·∑||Y_i - B_i·D_i·B_iT||2, wherein Y is a projection feature matrix, U, V is a left singular vector matrix and a right singular vector matrix, sigma is a singular value matrix, Y_i is an ith block, B_i is a block base matrix, D_i is a diagonal matrix, lambda is a weighing coefficient, and B_i and D_i are solved through alternate optimization.
The adaptive feature selection algorithm comprises w (i) =exp (- |f_i| 2/theta (t)). (ρ_i/Σρ_j), wherein w (i) is the weight of the ith feature, f_i is a feature vector, theta (t) is an adaptive temperature parameter, theta (t) =theta_0.exp (-t/tau), ρ_i is a feature contribution rate, t is the iteration number, and tau is a cooling coefficient.
The multi-mode tensor projection algorithm :T_core = X ×1 (U1·M1) ×2 (U2·M2) ×3 (U3·M3);, wherein t_core is a feature core tensor, X is an original tensor, u_k is a factor matrix of a kth dimension, m_k is a modulation matrix of the kth dimension, m_k=diag (s_k), s_k is an adaptive scale vector, and the gradient descent optimization is performed.
Projection fusion is performed, wherein F is a scene feature vector, P_k is a kth projection direction, T_core (k) is a kth mode expansion of a core tensor, alpha_k is an adaptive weight, alpha_k=softmax (e_k/tau), e_k is a feature entropy of the kth direction, tau is a temperature parameter, F_prev is a feature vector of the last moment, and beta is a time sequence smoothing coefficient.
According to the method, the device and the system, the efficient conversion from the preprocessing data tensor to the characteristic core tensor is realized through self-adaptive projection and sparse singular value decomposition. In a demand response scene, the potential association among different types of data is captured by constructing a self-adaptive projection network to respectively calculate the feature space of three dimensions of data type, time sequence and sampling frequency, the correlation among features is reduced by a block diagonalization matrix, the most representative feature component is extracted, a self-adaptive feature selection model is utilized to dynamically adjust a feature selection threshold according to the accumulated contribution rate, the representativeness of the selected features is ensured, and finally an original tensor is projected to the feature space through a multi-modal tensor projection network to obtain a feature core tensor capable of accurately representing the current demand response scene. According to the embodiment, the feature dimension is reduced by 80%, and meanwhile, the information quantity of more than 95% is maintained, so that the efficiency of the subsequent scene classification is improved.
According to one aspect of the present application, step S14 is further:
s141, constructing a multi-head attention network based on scene feature vectors, and extracting space-time dependency relations among features;
S142, constructing a layered bidirectional neural network based on the feature dependency matrix, and encoding and decoding feature sequences with different time scales to form a time sequence encoding vector;
s143, constructing a Gaussian mixture process model based on a time sequence coding vector, and calculating probability distribution of scene prediction, wherein M is a positive integer based on the probability distribution of the scene prediction, and outputting a scene prediction sequence containing M prediction moments;
s144, calculating the credibility of the prediction result through a Bayesian inference network based on the scene prediction sequence, and generating a scene type identifier and a corresponding confidence score.
In one embodiment of the application, the multi-head attention algorithm is H (i) =softmax (Q_i.K_i T/sqrt(d))·V_i + ω·∑exp(-||t_i - t_j||2/σ2). H_j, where H (i) is the output of the ith head, Q_i, K_i, V_i are query, key, value matrix, d is the feature dimension, t_i is the timestamp, ω is the time decay coefficient, and σ is the Gaussian kernel width.
The bidirectional coding algorithm comprises E (t) = BiLSTM (F (t)) +alpha.CNN (F (t))+ (1-alpha). Transformer (F (t)), wherein E (t) is a time sequence coding vector, F (t) is an input characteristic sequence, biLSTM is a bidirectional long-short-term memory network, CNN is a convolutional neural network, transformer is a converter network, and alpha is an adaptive mixing coefficient.
The Gaussian mixture prediction algorithm is that p (y|x) = Σpi_k.N (mu_k (x), Σ_k (x)), wherein y is a predicted value, x is an input characteristic, pi_k is the weight of a kth component, mu_k (x) and Σ_k (x) are the conditional mean and covariance, the parameters are parameterized through a neural network, and the weight pi_k is generated through a SoftFlow network.
The Bayesian inference algorithm comprises p (c|F) =p (f|c) ·p (c)/Σp (f|c ') ·p (c') ·exp (-lambda·D_KL (q|p)), wherein c is a scene type, F is a feature vector, p (c) is a priori probability, p (f|c) is a likelihood function, q is a variational posterior distribution, D_KL is KL divergence, and lambda is a regularization coefficient.
According to the method, accurate identification and prediction of the demand response scene are achieved through the multi-head attention network and the Gaussian mixture process model. In the scene feature processing process, the characteristic dependency relationship of different time scales is captured by utilizing a multi-head attention mechanism, typical scene features such as high temperature, holidays and the like are accurately identified, a scene evolution rule is effectively extracted by encoding and decoding feature sequences of different time scales through a layered bidirectional neural network, probability distribution of scene prediction is calculated by utilizing a Gaussian mixture process model, future scene change trend is accurately predicted, and finally the credibility of a prediction result is estimated through a Bayesian inference network. According to the embodiment, the scene recognition accuracy is improved to 94%, the scene prediction accuracy reaches 88%, and a reliable scene prediction basis is provided for the establishment of a control strategy.
As shown in fig. 3, according to an aspect of the present application, step S2 is further:
S21, extracting the load change characteristic of each shunt by adopting a sliding window method based on real-time load data, historical load data and prestored system constraint data, calculating the adjustment upper limit and lower limit ranges of each shunt by adopting a data statistics method based on the load change characteristic of each shunt, determining the available state marks of the shunt by adopting a threshold value judgment method based on the adjustment upper limit and lower limit ranges of each shunt, and forming a shunt state vector based on the load change characteristic of each shunt, the adjustment upper limit and lower limit ranges of each shunt and the available state marks, wherein the dimension of the shunt state vector is q multiplied by 1, and q is a positive integer to represent the number of the shunts;
s22, calculating membership values of each shunt on r evaluation indexes such as power adjustment capacity, response speed and reliability evaluation by adopting a fuzzy membership function based on shunt state vectors, scene prediction sequences and scene type identifiers, wherein r is a positive integer;
S23, constructing an adjacency matrix of the graph neural network based on the shunt response capacity matrix and the historical load data, taking each shunt as a node in the graph, and calculating the association strength between the shunts by adopting a Pearson correlation coefficient as the weight of an edge;
s24, based on system constraint data, extracting S types of constraint conditions including power grid operation constraint, equipment operation constraint, user constraint and the like, wherein S is a positive integer, calculating quantitative indexes of the constraint conditions based on a shunt correlation matrix, and combining the quantitative indexes of all the constraint conditions with a preset threshold value to form a control constraint matrix.
In one embodiment of the application, the constraint data is specifically defined as system constraint data = { power grid operation constraint { voltage limit, power limit, line capacity }, device operation constraint { start-stop time, ramp rate, operation duration }, user comfort constraint { temperature range, response time, adjustment amplitude }.
The embodiment combines fuzzy comprehensive evaluation and a graph neural network to establish an accurate evaluation system of shunt response capability. The method comprises the steps of constructing a dynamic fuzzy membership function, mapping multidimensional indexes such as shunt regulation capacity, response speed and reliability to a unified evaluation space, realizing self-adaptive adjustment of evaluation index weights through fusion of an entropy weight method and an analytic hierarchy process, so that the accuracy of shunt evaluation is improved to 92%, and particularly in shunt association analysis, adopting a graph attention network, accurately capturing complex association relations among the shunts through dynamic update of node characteristics and a message transmission mechanism, so that potential influence chains which are difficult to find by a traditional method can be identified in actual projects, and the identification accuracy of the association relations reaches 88%. Meanwhile, by introducing a graph embedding algorithm to project high-dimensional features into a low-dimensional space, the computational complexity is reduced, so that evaluation can be completed in the second level even in a large power system comprising thousands of branches. The shunt response capability assessment system established by the embodiment can effectively guide the optimization of the control strategy, so that the execution success rate of the demand response is improved by more than 30%.
According to one aspect of the present application, step S22 is further:
S221, constructing a multi-layer evaluation index system based on the shunt state vector, the scene prediction sequence and the scene type identifier, and calculating quantization indexes of each shunt in the adjustment capability, the response speed and the reliability dimension to form an evaluation index matrix;
S222, constructing a dynamic fuzzy membership function based on the evaluation index matrix, and realizing the mapping from the index to the fuzzy set through the self-adaptive parameter adjustment to generate a fuzzy evaluation matrix;
s223, acquiring a fuzzy evaluation matrix, constructing an entropy weight-based analytic hierarchy process model, dynamically calculating weight coefficients of evaluation indexes, and outputting weight vectors;
S224, calculating the comprehensive score of each shunt by an improved fuzzy comprehensive evaluation algorithm based on the fuzzy evaluation matrix and the weight vector to form a shunt response capacity matrix.
In one embodiment of the application, the multidimensional evaluation index is calculated by I (I, j) =ζ.R (I, j) + (1- ζ) ·H (I, j) ·exp (- μ.t), wherein I (I, j) is the score of the shunt I on the index j, R (I, j) is the real-time response capability index, H (I, j) is the history representation index, ζ is the balance coefficient, t is the time decay parameter, μ is the forgetting factor, and H (I, j) is updated by sliding the time window.
The dynamic fuzzy membership function is that mu (x) =1/(1+ ((x-c)/a) 2b) +eta.Xi mu/Xi t, wherein mu (x) is a membership value, x is an input value, c is a center point, a and b are shape parameters, eta is a dynamic adjustment coefficient, xi mu/Xi t is a membership change rate, and the parameters are optimized through gradient descent.
The entropy weight analytic hierarchy process comprises the steps of w (j) = (1-H (j)). V (j)/(1-H (k)). V (k)), wherein w (j) is the comprehensive weight of a j-th index, H (j) is information entropy, H (j) = -sigma p_ij.ln (p_ij), v (j) is the weight obtained by the analytic hierarchy process, p_ij is the normalized evaluation value, and the specific calculation process of the analytic hierarchy weight v (j) comprises the steps of v (j) = (IIa_ij) 1/n/ ∑(∏a_kj)1/n.
The improved fuzzy comprehensive evaluation algorithm comprises S=R @ W+beta- (R @ W) 2, wherein S is a comprehensive scoring matrix, R is a fuzzy evaluation matrix, W is a weight vector, a fuzzy complex operator, beta is a nonlinear adjustment coefficient, and a quadratic term is used for capturing interaction among indexes.
According to the embodiment, through the dynamic fuzzy evaluation and entropy weight hierarchical analysis method, the accurate quantitative evaluation of the shunt response capability is realized. In a demand response scene, a multi-layer evaluation index system is firstly established, performance indexes of branches in the dimensions of adjustment capability, response speed, reliability and the like are quantized, mapping from indexes to fuzzy sets is realized by constructing a dynamic fuzzy membership function and self-adapting adjustment parameters, uncertainty of the branch response capability is accurately described, a hierarchical analysis model based on entropy weight is utilized to dynamically calculate weight coefficients of all evaluation indexes, an evaluation result is enabled to better meet the requirements of an actual demand response scene, and finally comprehensive scores of all branches are calculated through an improved fuzzy comprehensive evaluation algorithm to form a branch response capability matrix. The embodiment enables the accuracy rate of the shunt response capability evaluation to reach 92%, and provides a reliable shunt selection basis for the optimization of a subsequent control strategy.
According to one aspect of the present application, step S23 is further:
S231, reading the shunt response capacity matrix and the historical load data, constructing a shunt state sequence of a dynamic time window, extracting space-time characteristic relations among shunts through a deep neural network, and generating a characteristic association matrix.
S232, reading the characteristic association matrix, establishing a multi-layer diagram attention network, calculating the association strength among the branch nodes, and outputting the association strength matrix.
S233, reading the correlation intensity matrix, constructing a graph convolution neural network, updating node characteristics through a message transmission mechanism, and generating a node characteristic matrix.
S234, reading a node characteristic matrix, and projecting high-dimensional characteristics into a low-dimensional space through a graph embedding algorithm to form a shunt correlation matrix with the dimension of q multiplied by q, wherein q is a positive integer.
In one embodiment of the application, the spatio-temporal feature extraction algorithm is F (i, t) =LSTM (X (i, t)) +phi-FFT (X (i, t)) + (1-phi) -DWT (X (i, t)), wherein F (i, t) is the feature of the shunt i at time t, X (i, t) is the original load sequence, LSTM is the long and short term memory network, FFT is the fast Fourier transform, DWT is the discrete wavelet transform, and phi is the feature fusion coefficient.
The graph attention calculation is that A (i, j) = LeakyReLU (W T [ h_i|h_j ]) exp (-d_ij/sigma), wherein A (i, j) is the attention coefficient of the nodes i to j, W is a parameter matrix, h_i, h_j are node characteristics, d_ij is node distance, sigma is a distance scale parameter, and|represents characteristic splicing operation.
The message transmission mechanism is h '_i=sigma (sigma (A_ij.W.h_j)/sqrt (d_i.d_j) +lambda.M_i), wherein h' _i is updated node characteristics, A_ij is attention coefficient, W is transformation matrix, d_i and d_j are node degrees, lambda is memory coefficient, and M_i is node memory vector.
The graph embedding algorithm comprises Z=GCN (H) +alpha-GAT (H) +psi ·|Z·Z T - S||2, wherein Z is an embedding matrix, H is a node characteristic matrix, GCN is a graph convolution network, GAT is a graph annotation force network, S is a similarity matrix, alpha is a network fusion coefficient, and psi is a structure retention coefficient.
According to the embodiment, through the combination of the graph attention network and the graph convolution neural network, accurate modeling of complex association relations among branches is achieved. In the process of demand response additional control, a dynamic time window is firstly utilized to extract a branching state sequence, a space-time characteristic relation among branches is captured through a deep neural network, a multi-layer graph attention network is adopted to calculate the association strength among branching nodes, chain reactions possibly caused by load transfer are accurately identified, node characteristics are updated through a message transmission mechanism of a graph convolution neural network, cascading effect among the branches is effectively modeled, and finally a graph embedding algorithm is utilized to project high-dimensional characteristics into a low-dimensional space to form a branching association matrix. The embodiment ensures that the recognition accuracy of the path association relation reaches 89%, effectively avoids the overshoot phenomenon caused by load transfer, and improves the system stability by 35%.
As shown in fig. 4, according to an aspect of the present application, step S3 is further:
S31, constructing n optimization objective functions including a load reduction objective function, a user comfort objective function, a device life objective function and the like based on scene type identification and a shunt response capacity matrix, wherein n is a positive integer;
S32, constructing a judgment matrix by adopting a hierarchical analysis method based on a scene prediction sequence and user behavior data, and calculating the relative importance of each optimization target in an optimization target function set;
And S33, based on the control constraint matrix and the shunt correlation matrix, extracting boundary values and threshold values of various constraint conditions, and constructing a mathematical expression of the constraint conditions, and based on the mathematical expression, carrying out normalization processing on the constraint conditions to obtain a constraint condition set containing m constraint condition expressions, wherein m is a positive integer.
According to the embodiment, through the multi-objective optimization architecture, dynamic balance of multiple objectives such as load reduction, user comfort level, equipment service life and the like is achieved. Specifically, a target decomposition network is adopted to decompose a complex optimization target into a plurality of quantifiable sub-targets, dimension reduction mapping of a target space is realized through a variation self-encoder, the optimization difficulty is reduced, in particular, a nonlinear coupling term is introduced in the target fusion process, and the interaction effect among targets is captured through a dynamic weight adjustment mechanism, so that the optimization result meets the actual requirements. In the processing of constraint conditions, a judgment matrix is constructed by adopting an analytic hierarchy process, the rationality of weight distribution is ensured by combining consistency test, and the constraint violation rate is reduced by 75%. According to the embodiment, on the premise that the comfort level of a user is not reduced, the load reduction amount is improved by 20%, the service life of equipment is prolonged by 15%, and the effect of multi-objective collaborative optimization is achieved.
According to one aspect of the present application, step S31 is further:
s311, constructing a multi-layer target decomposition network based on scene type identification and a shunt response capacity matrix, decomposing an overall optimization target into sub-targets comprising load adjustment, user comfort and equipment operation, and generating a sub-target function set;
S312, based on the sub-objective function set, realizing dimension reduction mapping of a target space through a variation self-encoder, and outputting a target feature matrix;
S313, constructing a target fusion network based on the target feature matrix, and realizing multi-target self-adaptive combination through a dynamic weight adjustment mechanism to form an optimized target function set.
In one embodiment of the application, the target decomposition algorithm is f_i=g_i (x) +Σh_ij (x) ·r_ij, wherein f_i is an ith sub-target function, g_i (x) is a main target item, h_ij (x) is an interaction item, r_ij is an association coefficient, x is a decision variable, and r_ij is determined by an adaptive weighting method.
The variance target map is q (z|x) =n (μ (x), σ 2 (x)). Exp (- β·kl (q||p)), where q (z|x) is the coding distribution, μ (x) and σ 2 (x) are the mean and variance networks, KL is the KL divergence, p is the prior distribution, β is the variance coefficient, and sampling is performed by the re-parameterization technique.
The target fusion network comprises F (x) = Σw_i (x)/(f_i (x) +gamma ·Σc_ij min (f_i (x), f_j (x)), wherein F (x) is a fusion target function, w_i (x) is an adaptive weight, f_i (x) is a sub-target function, c_ij is a target correlation coefficient, gamma is a coupling coefficient, and weight is optimized through gradient descent.
According to the method, accurate quantification and dynamic optimization of multiple targets of demand response are achieved through the multi-layer target decomposition and target fusion network. The method comprises the steps of firstly constructing a multi-layer target decomposition network, decomposing overall targets such as load adjustment, user comfort level and equipment operation into quantifiable sub-targets, realizing dimension reduction mapping of a target space by utilizing a variation self-encoder, reducing optimization complexity, and realizing self-adaptive combination of multiple targets through a target fusion network and a dynamic weight adjustment mechanism. In the actual demand response project, the embodiment increases the quantification accuracy of the optimization target to 90%, shortens the dynamic adjustment response time of the target weight to millisecond level, and enables the system to rapidly adjust the optimization strategy according to the real-time scene characteristics.
As shown in fig. 5, according to an aspect of the present application, step S4 is further:
S41, acquiring an optimized objective function set and a constraint condition set, and generating an initial population by adopting a genetic algorithm based on non-dominant ranking; performing constraint condition verification on each individual of the initial population, generating a new individual through crossover and mutation operation, and performing t-generation iterative optimization to obtain a candidate strategy set containing u feasible control strategies, wherein u is a positive integer, and t is a positive integer;
S42, scoring each candidate strategy by adopting a pareto optimal principle based on a candidate strategy set and a weight coefficient matrix, and calculating the comprehensive scores of the strategies on different targets;
S43, based on a strategy scoring vector and a strategy sorting sequence, extracting strategy selection experience from prestored historical execution data by adopting a reinforcement learning method, and based on the strategy selection experience, carrying out combined optimization on v strategies with the front sorting to obtain a control strategy matrix with the dimension of q multiplied by w, wherein v is a positive integer and is smaller than u, q is a positive integer and represents the branching number, and w is a positive integer and represents the control action type number;
S44, based on the control strategy matrix and the shunt incidence matrix, a dynamic programming algorithm is adopted to construct a state transition equation, based on the state transition equation, the minimum time interval between adjacent control actions is calculated, and the execution time point of each control action is optimized, so that an execution time sequence vector with the dimension of q multiplied by 1 is obtained.
The embodiment combines a non-dominant ordering genetic algorithm and a multi-agent reinforcement learning method, and realizes the global optimization and dynamic adjustment of the control strategy. The initial population is generated through a genetic algorithm based on non-dominant ranking, and the obtained candidate control strategy set is ensured to have higher feasibility and diversity through multi-generation iterative optimization. And scoring and sequencing the candidate strategies by adopting the pareto optimal principle, so as to ensure that the selected control strategy has better comprehensive performance on a plurality of optimization targets. And the strategy selection experience is extracted from the historical execution data through the reinforcement learning method, so that the strategy with the top ranking is further optimized, and the intelligence and the adaptability of the control strategy are improved. And a state transition equation is constructed by adopting a dynamic programming algorithm, the minimum time interval between adjacent control actions is calculated, the execution time point of each control action is optimized, and the execution of the control strategy is ensured to have high efficiency and accuracy. The finally obtained control strategy matrix and execution time sequence vector can effectively improve the response capability and stability of the system under multiple scenes, and ensure the reliability and safety of the system operation. The embodiment can complete policy optimization and adjustment in millisecond level, meets the real-time requirement of demand response, and simultaneously improves the execution success rate of the control policy to more than 95%.
According to one aspect of the present application, step S41 is further:
S411, reading the optimized objective function set and the constraint condition set, constructing a hybrid coded chromosome structure, and generating an initial population through a non-dominant sorting algorithm to form an initial strategy set.
S412, reading the initial strategy set, establishing a multi-layer constraint verification network, performing feasibility verification on individuals in the population, and outputting the feasible strategy set.
S413, reading a feasible strategy set, constructing a self-adaptive evolution operator, and carrying out population evolution by dynamically adjusting the crossover probability and the variation probability to generate an evolution strategy set.
S414, reading the evolution strategy set, and performing multi-objective optimization screening on strategies by utilizing the pareto dominant relationship, so as to form a candidate strategy set containing u feasible control strategies, wherein u is a positive integer.
In one embodiment of the application, the hybrid coding initialization algorithm is P (0) = { x_i|x_i=α·b_i+ (1- α) ·R_i }, where P (0) is the initial population, x_i is the ith individual, B_i is the binary coding portion, R_i is the real coding portion, α is the coding weight, α=sigmoid (ρ·t), ρ is the adaptation rate, and t is the evolutionary algebra.
The constraint verification algorithm is V (x) = Σw_k.max (0, g_k (x)) 2 + Σθ_j·|h_j (x) |+μ·exp (- ΣΔ_i·d_i (x)), where V (x) is a violation degree function, g_k (x) is an inequality constraint, h_j (x) is an equality constraint, d_i (x) is a distance measure, w_k, θ_j, Δ_i are adaptive weights, and μ is a penalty factor.
The adaptive evolution operator comprises p_c=p_c0- (1-exp (-eta- (f_max-f ')/f_max)), p_m=p_m0- (1+sin (pi.t/T)), wherein p_c is cross probability, p_m is mutation probability, p_c0 and p_m0 are basic probability, f_max is population maximum fitness, f' is larger fitness of crossing individuals, eta is an adjustment coefficient, T is current algebra, and T is maximum algebra.
The multi-objective screening algorithm comprises D (x) = sigma lambda_i.f_i (x) +beta sigma w_ij sigma |f_i (x) -f_j (x) |exp (-gamma-d_ij), wherein D (x) is a dominant metric, f_i (x) is an ith objective function, lambda_i is a target weight, w_ij is a weight between targets, d_ij is a target space distance, beta is an equalization coefficient, and gamma is a distance attenuation coefficient.
According to the embodiment, through a non-dominant ordering genetic algorithm of hybrid coding, the global optimization of a demand response control strategy is realized. In a demand response scene, an initial population is generated through a chromosome structure of mixed codes, different types of control decisions are effectively represented, a multi-layer constraint verification network is adopted to conduct feasibility verification on individuals in the population, the fact that the strategy meets system operation constraints is guaranteed, population evolution is conducted through dynamic adjustment of crossover probability and variation probability of a self-adaptive evolution operator, algorithm searching efficiency is improved, and finally multi-objective optimization screening is conducted on strategies by utilizing Parritor dominant relations, so that a candidate strategy set is formed. The embodiment improves the strategy generation efficiency by 65%, the feasibility of the strategy reaches 96%, meanwhile, the diversity of the strategy set is ensured, and a high-quality initial solution is provided for subsequent strategy optimization.
According to one aspect of the present application, step S42 is further:
S421, reading the candidate strategy set and the weight coefficient matrix, constructing a multi-criterion decision network, calculating the evaluation scores of strategies under different targets, and generating a strategy evaluation matrix.
S422, reading the strategy evaluation matrix, establishing a strategy ordering model based on the dominant relation, determining the layering level of the strategy through non-dominant ordering, and outputting a layering sequence matrix.
S423, reading the layered sequence matrix, and evaluating the diversity of the same-layer strategies by using a crowdedness computation model to form strategy scoring vectors and strategy sequencing sequences.
In one embodiment of the application, the multi-criterion evaluation algorithm is E (x) = Σv_k.C_k (x) +ω ΣΣρ_ij.min (C_i (x), C_j (x)) · exp (- τ t), wherein E (x) is the evaluation score, C_k (x) is the kth criterion function, v_k is the criterion weight, ρ_ij is the criterion correlation coefficient, ω is the co-coefficient, τ is the time decay rate, and t is the time interval.
The hierarchical ordering algorithm comprises the steps of R (i) = (1-epsilon) & N (i) +epsilon.S (i) +phi.exp (-phi.V (i)), wherein R (i) is an ordering value of an ith layer, N (i) is a non-dominant ordering value, S (i) is an intensity value, S (i) = |{ j|i > j } |/|P|, V (i) is a degree of default, epsilon is a weighing coefficient, phi is a feasibility weight, and phi is a penalty coefficient.
The crowdedness assessment algorithm is that C (i) = Σ (f_m u(i) - f_ml(i))/(f_mmax - f_mmin) +kappa-exp (-mu.d (i)), wherein C (i) is the crowdedness of individual i, f_m u(i) and f_m l(i) are the upper and lower neighbor values on target m, f_m max and f_m min are the extremum of target m, d (i) is the distance to nearest neighbor, kappa is the concentration coefficient, and mu is the distance influencing factor.
The embodiment realizes the accurate scoring and sequencing of the demand response control strategy through the multi-criterion decision network and the crowding degree calculation model. The method comprises the steps of firstly constructing a multi-criterion decision network, calculating evaluation scores of strategies under different targets, establishing a strategy ordering model based on a dominant relation, determining layering levels of the strategies through non-dominant ordering, and evaluating diversity of the strategies at the same layer by utilizing a crowdedness calculation model to form strategy scoring vectors and ordering sequences. According to the embodiment, the accuracy of strategy scoring is improved to 93%, the uniformity of strategy sequencing is improved by 40%, and the finally selected control strategy is ensured to meet the multi-objective optimization requirement and has good diversity.
According to one aspect of the present application, step S43 is further:
S431, reading a strategy grading vector and a strategy ordering sequence, constructing a deep reinforcement learning network, extracting strategy selection features from historical execution data, and generating a strategy feature vector.
S432, reading the strategy feature vectors, establishing a strategy evaluation model, calculating expected benefits of each strategy under different scenes, and outputting a strategy benefit matrix.
S433, reading a strategy profit matrix, and carrying out combination optimization on the strategy by a multi-agent reinforcement learning method to generate an optimized strategy set.
S434, reading an optimization strategy set, and extracting an optimal strategy combination by using a strategy distillation algorithm to form a control strategy matrix with the dimension of q multiplied by w, wherein q and w are positive integers.
In one embodiment of the application, the policy feature extraction algorithm is Q (s, a) = (1- ζ) & V(s) +ζ.A (s, a) +λ Σw_k.H_k (s, a), wherein Q (s, a) is a state action value function, V(s) is a state value function, A (s, a) is a dominance function, H_k (s, a) is a history feature function, ζ is a double architecture coefficient, w_k is feature weight, and λ is a history influence factor.
The strategy evaluation model comprises r (s, a) = Σpi-i q-i (s, a) +eta-KL (pi|pi_old) +v|Σc jk min (q_j (s, a), q_k (s, a)), wherein r (s, a) is a reward function, pi_i is strategy distribution, q_i (s, a) is a sub-strategy value function, KL is a strategy difference measure, c_jk is a strategy correlation coefficient, eta is a conservation degree coefficient, and v is a synergy coefficient.
The multi-agent optimization algorithm is characterized by L (theta) =E [ Σγ t.r_t ] +alpha ·ΣA_ij log (p_i (a_j|s))+beta.H (pi), wherein L (theta) is an objective function, theta is a policy parameter, gamma is a discount factor, r_t is a reward for time t, A_ij is an attention coefficient between agents, p_i (a_j|s) is a conditional action probability, H (pi) is a policy entropy, alpha is a cooperation coefficient, and beta is an exploration coefficient.
The strategy distillation algorithm is that L_d=KL (pi_s pi_t) +omega. F_s-f_t pi 2 +phi.CE (y_s, y_t), wherein L_d is distillation loss, pi_s and pi_t are student and teacher strategies, f_s and f_t are characteristic representations, y_s and y_t are prediction outputs, KL is KL divergence, CE is cross entropy, omega is characteristic matching weight, phi is output matching weight.
According to the embodiment, the dynamic optimization of the demand response control strategy is realized through the deep reinforcement learning and the multi-agent optimization method. The method comprises the steps of firstly extracting strategy selection features from historical execution data through a deep reinforcement learning network, establishing a strategy evaluation model to calculate expected benefits of each strategy under different scenes, carrying out combination optimization on the strategies through a multi-agent reinforcement learning method, and finally extracting optimal strategy combinations through a strategy distillation algorithm to form a control strategy matrix. In practical application, the convergence rate of policy optimization is improved by 55%, the success rate of policy execution reaches 94%, and the accuracy and reliability of demand response control are effectively improved.
According to one aspect of the present application, step S44 is further:
S441, reading the control strategy matrix and the shunt correlation matrix, constructing a time sequence dependency graph network, identifying a precursor successor relationship among control actions, and generating a dependency relationship matrix.
S442, reading the dependency relation matrix, establishing a conflict detection model, and identifying time sequence conflicts through a constraint propagation algorithm to form a conflict constraint set.
S443, combining the conflict constraint set, optimizing the execution sequence of the control action by using a heuristic search algorithm, and generating an execution time sequence vector with the dimension of q multiplied by 1, wherein q is a positive integer.
In one embodiment of the application, the timing dependency graph construction algorithm is D (i, j) = alpha.T (i, j) + (1-alpha) ·C (i, j) +beta·exp (-delta deltat_ij), wherein D (i, j) is the dependency strength of the nodes i to j, T (i, j) is the timing dependency, T (i, j) = cos (v_i, v_j), C (i, j) is the control dependency, C (i, j) = |K_i n K_j|/|K_i U K_j|, v_i, v_j are the control vectors, K_i, K_j are the influence sets, deltat_ij is the time interval, alpha is the balance coefficient, beta is the time weight, and delta is the attenuation rate.
The collision detection algorithm comprises P (c|i, j) =sigma (W| [ h_i|h_j ] +b) ·exp (-mu||t_i-t_j||) and lambda sigma w_k.F_k (i, j), wherein P (c|i, j) is the collision probability between actions i, j, h_i, h_j is the action feature vector, W is the weight matrix, b is the bias vector, sigma is the sigmoid function, t_i, t_j is the execution time point, F_k (i, j) is the kth collision feature function, w_k is the feature weight, mu is the time influence coefficient, and lambda is the feature adjustment coefficient.
Heuristic search algorithm S (x) =g (x) +h (x) +ω Σmax (0, v_k (x)) 2, where S (x) is the search scoring function, g (x) is the executed path cost, g (x) = Σc_ij·y_ij, h (x) is the heuristic estimate, h (x) = Σθ_i· (t_max-t_i), v_k (x) is the kth violation metric, c_ij is the transition cost, y_ij is the path indicating variable, t_max is the maximum completion time, t_i is the current execution time, θ_i is the time weight, ω is the penalty coefficient.
In another embodiment of the application, dynamic scheduling correction is performed, wherein R (t) = Ση_i.e_i (t) +γ ΣΣρ_ij.min (e_i (t), e_j (t)) · exp (-phi·Δt), wherein R (t) is the scheduling correction value of time t, e_i (t) is the ith execution bias, η_i is the bias weight, ρ_ij is the task correlation coefficient, Δt is the time difference, γ is the co-coefficient, and phi is the time decay rate.
The method comprises the steps of performing sequence optimization, namely L (pi) =E [ Σr_t.gamma t ] -beta.KL (pi|pi_ref) +lambda.H (pi), wherein L (pi) is an optimization target, pi is an execution strategy, r_t is a reward of time t, gamma is a discount factor, pi_ref is a reference strategy, H (pi) is a strategy entropy, beta is a conservation coefficient, lambda is an exploration coefficient, and optimizing parameters through a strategy gradient method.
Constraint propagation is performed by C (x_t) = Σw_i·g_i (x_t) +μ·Σh_j (x_t) +v·Σc·kl f_k (x_t) ·f_l (x_t), wherein C (x_t) is a constraint function at time t, g_i (x_t) is a hard constraint term, h_j (x_t) is a soft constraint term, f_k (x_t) is a feature function, w_i is a constraint weight, c_kl is a feature interaction coefficient, μ is a soft constraint coefficient, and v is an interaction weight.
According to the embodiment, the optimal scheduling of the demand response control action is realized through the time sequence dependency graph and the heuristic search algorithm. In the execution process, a time sequence dependency graph network is firstly constructed, a precursor subsequent relation among control actions is identified, a conflict detection model is established, time sequence conflicts are identified through a constraint propagation algorithm, the execution sequence of the control actions is optimized through a heuristic search algorithm, and an execution time sequence vector is generated. In practical application, the embodiment reduces the time sequence conflict rate of the control action by 82%, improves the execution efficiency by 58%, ensures the stability of the control sequence, and effectively avoids system fluctuation caused by improper execution sequence.
In one embodiment of the present application, a method for adding control branches in multiple scenarios during execution of demand response includes the following steps:
And step 1, when the demand response needs to be executed, creating a demand response event.
And 2, selecting whether to participate in the current demand response event by the user, if not, directly ending, and if so, performing the next step.
And 3, judging whether the terminal is in a power-on state, if the terminal is in a power-on state, directly ending the power-on, and if the terminal is in a power-off state, performing the next step.
And step 4, checking whether the user edits the execution strategy or not at the current event, if the user edits the execution strategy manually, strictly installing the execution strategy edited by the user, and if the execution strategy is not edited at the current event, performing the next step of judgment.
And 5, judging whether a user edits a default scheme, wherein the default scheme is generally available, and initializing three default schemes for the user by the system, wherein the three default schemes comprise a temperature adjustment scheme, a load adjustment ratio scheme and a soft shutdown scheme. The default solution allows the user to configure themselves, including creating a default solution and removing the solution from the default solution. If the default scheme exists, the third-party system is synchronized to carry out the scheme selection after responding to the capability, otherwise, the case without the default scheme is started to carry out the selection shunt.
And 6, under the condition of default schemes, counting the response capacity of each default scheme, arranging the response capacity from small to large, comparing the scheme response capacity with a load gap at the running time, selecting a default scheme with the first larger than the load gap, and if the response capacity of all schemes is smaller than the current time gap, selecting the default scheme with the largest response capacity. In addition, because the climbing time of the temperature regulating scheme is long, the temperature regulating scheme can be used only before an event starts, and the default scheme is selected in the execution period, so that the temperature regulating scheme cannot be included.
And 7, selecting branches from the selected schemes in sequence and sending control commands, wherein if the scheme is a wheel-stop scheme, one-wheel control can only select one multi-split line, and the scheme is not limited. In addition, the selection of the branches is performed according to the sequence of the branches in the scheme, otherwise, the event can not be selected any more if the controlled branches are used, and the execution of the branches can not be selected from the controlled branches unless the available branch response capacity is smaller than the load gap at the moment.
And 8, if the event is ended, the control is ended, otherwise, the next control time is waited for to control again.
Step 9, if the default scheme is not available in step 5, the step is entered. All branches in the system are ranked from large to small according to response capacity, the branch with the largest response capacity is selected firstly, whether the load gap at the moment is met is judged, if yes, a new branch is not added, otherwise, the branch with the smallest response capacity is added, whether the load gap at the moment is met is judged again, if yes, a new branch is not added, otherwise, the branch with the largest response capacity is added from the rest branches, judgment is carried out, and the cycle is carried out until the load gap at the moment is met or all branches are added and then stopped. After determining that the additional shunt is added, a control command is sent to control, whether an event is ended or not is judged, if so, the control is stopped, and if not, the next control time is up to enter the step again to control.
According to the embodiment, through the automatic control algorithm flexibly adapting to the user-defined strategy, the full honour and efficient realization of the personalized requirements of the user are ensured. The method has high compatibility, can seamlessly access the user-defined strategy template, and is operated by adopting the strategy edited by the user preferentially. Under the condition that the user actively participates in policy customization, the system can accurately execute user intention, and the demand response activity is ensured to be more fit with the actual demand of the user. When the user does not edit the strategy, the preset intelligent execution mode can be automatically started, so that the response efficiency is improved, the decision burden of the user in the demand response process is reduced, and the execution effect of the demand response is improved. Not only the participation and control feeling of the user are enhanced, but also the intelligent level of the system is improved. The system defaults to integrate three high-efficiency strategies, namely a temperature regulation strategy, a load regulation ratio strategy and a soft shutdown strategy. The strategies can be seamlessly interfaced with third party systems (such as air conditioning control systems, intelligent lighting systems and the like), so that the devices can actively participate in demand response, and energy consumption optimization is achieved. The user can also customize the editing strategy scheme according to the own electricity utilization characteristics and requirements. Through a simple operation, the user can set the edited scheme as a default execution scheme. When the demand response event is triggered, the system automatically adopts the customized schemes to operate, so that the demand response activity is ensured to be in line with the habit of the user and can be efficiently executed. The direct automatic execution function without manual intervention brings great convenience to the user. Even if the user does not edit any execution strategy and does not set a default scheme, the system can still operate independently to automatically execute the demand response task. The design of the function fully considers the time value and the operation convenience of the user, so that the user can complete the whole process of demand response under the condition of no manual operation. The full-automatic execution mode not only reduces the workload of users and improves the satisfaction of the users, but also ensures the efficient and stable execution of the demand response activities, and provides powerful guarantee for the safe operation of the power system.
In the invention, a complete multi-scene additional control shunt decision system is constructed by organically combining tensor decomposition, graph neural network, reinforcement learning and other methods in the demand response execution process. Firstly, extracting scene characteristics from multi-source heterogeneous data through a tensor decomposition method, accurately identifying different demand response scenes such as high temperature, holidays, peak staggering electricity consumption and the like, secondly, analyzing association relations among branches through a graph neural network, accurately evaluating response capacities of the branches under different scenes, effectively avoiding chain reactions caused by load transfer, thirdly, balancing a plurality of targets such as load reduction, user comfort and the like through a multi-target optimization method, dynamically adjusting optimization weights according to real-time scene characteristics, and finally, generating a control strategy sequence through a genetic algorithm, and ensuring rationality of an execution sequence through a time sequence dependency graph. Practical application shows that the invention improves the average load reduction rate of the demand response project to 18%, the user comfort satisfaction reaches 92%, the control strategy execution success rate is improved to 95%, and the accuracy and the effectiveness of the demand response are improved.
The preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to the specific details of the above embodiments, and various equivalent changes can be made to the technical solution of the present invention within the scope of the technical concept of the present invention, and all the equivalent changes belong to the protection scope of the present invention.