
CN119539145A - A method and system for predicting carbon emissions in the power industry based on XGBoost algorithm - Google Patents

Publication number
CN119539145A
Authority
CN
China
Prior art keywords
model
data
prediction model
value
objective function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411492988.8A
Other languages
Chinese (zh)
Inventor
于万水
刘超
张浩田
徐鹏
苗博
陈文静
迟永宁
易俊
温杰
马娜
袁秋洁
何飞
李淑珍
曲睿婷
王群
于亮亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Electric Power Research Institute Co Ltd CEPRI
Information and Telecommunication Branch of State Grid Liaoning Electric Power Co Ltd
State Grid Corp of China SGCC
Original Assignee
China Electric Power Research Institute Co Ltd CEPRI
Information and Telecommunication Branch of State Grid Liaoning Electric Power Co Ltd
State Grid Corp of China SGCC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Electric Power Research Institute Co Ltd CEPRI, Information and Telecommunication Branch of State Grid Liaoning Electric Power Co Ltd, State Grid Corp of China SGCC filed Critical China Electric Power Research Institute Co Ltd CEPRI
Priority to CN202411492988.8A priority Critical patent/CN119539145A/en
Publication of CN119539145A publication Critical patent/CN119539145A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Game Theory and Decision Science (AREA)
  • Software Systems (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and system for predicting carbon emissions in the power industry based on the XGBoost algorithm, and belongs to the technical field of carbon emission prediction. The method comprises: acquiring current data related to carbon emissions in the power industry; inputting the current data into a prediction model; and solving the prediction model to predict the carbon emissions of the power industry in a future period, wherein the training samples of the prediction model are a training set and a test set generated from historical power-industry data related to carbon emissions. Because the invention uses the XGBoost algorithm to construct the prediction model, the predicted carbon emissions are more accurate.

Description

XGBoost algorithm-based carbon emission prediction method and system for power industry
Technical Field
The invention relates to the technical field of carbon emission prediction, in particular to a power industry carbon emission prediction method and system based on XGBoost algorithm.
Background
With the development of human society, problems such as the shortage of traditional fossil energy and serious environmental pollution have become increasingly prominent, and "carbon peaking and carbon neutrality" has become a main path for relieving the energy crisis. The power industry is one of the main sources of energy consumption and carbon emissions and bears a major share of the emission reduction task. Developing a high-precision carbon emission prediction method for this energy-intensive industry has therefore become an important basis for formulating emission reduction policies, evaluating emission reduction effects, and optimizing the energy structure.
Current state of application of carbon emission prediction models:
The accuracy of carbon emission prediction is determined mainly by the richness of the data set for the prediction object and the accuracy of the prediction model adopted. Regarding the prediction object, because relatively abundant data are available at the provincial and regional level, most existing carbon emission prediction studies target large regions, and studies of specific industries are comparatively few. The power industry is a large energy consumer, its emissions differ widely across time and space, and its carbon emission data are difficult to obtain. Regarding the prediction model, the models mainly used include the STIRPAT model, the LEAP model, and the BP neural network model, which predict the future carbon emission trend of an entire region by comprehensively analyzing economic activity, energy use, technological progress, and policy influence. The current state of research is summarized as follows:
Regional carbon emission prediction based on the STIRPAT model:
The STIRPAT-based regional carbon emission prediction method analyzes the driving factors of environmental impact, using statistical methods to study the influence of Population, Affluence, and Technology on the environment and thereby predict carbon emissions for large regions. However, because the model introduces many variables and nonlinear relations exist among most of them, the model is complex and computationally expensive. In addition, the model assumes by default that all factors are independent and ignores the complex interactions among variables, so its prediction accuracy is relatively low.
Regional carbon emission prediction based on the LEAP model:
The LEAP-based regional carbon emission prediction method considers multiple factors together through system dynamics and integrated modeling, evaluates the long-term impact of energy policy, and provides a scientific basis for decision making. However, the prediction results of the LEAP model depend heavily on the input data, and insufficient or low-quality data may reduce its accuracy. Furthermore, uncertainty in future technological progress and policy changes may lead to prediction bias, and the model has difficulty responding quickly to market changes or unexpected events.
Regional carbon emission prediction based on the BP neural network:
The BP-neural-network-based regional carbon emission prediction method is built on the back-propagation algorithm: it minimizes the output error by adjusting weights and biases, continuously tuning the parameters to approach the objective. For large data sets and complex networks, training can be very time-consuming. Furthermore, gradient descent may become trapped in local minima, degrading the overall performance of the model. In addition, hyperparameter tuning is complex because many hyperparameters must be adjusted, and in multi-layer networks in particular the gradients may gradually vanish during back-propagation, impairing training.
The existing problems are as follows:
1) Existing carbon emission prediction methods have insufficient spatial precision. Most estimate carbon emissions for a large geographical area and cannot account in detail for local factors such as differences in urban development, industrial structure, and traffic patterns, so the coarse spatial resolution may bias the prediction results.
2) Existing carbon emission prediction methods cover short time scales. They usually predict emission trends only for the next few years, and this short-term view limits a comprehensive understanding of long-term climate change impacts, so the long-term effects of factors such as economic growth, technological progress, and policy change on emissions cannot be captured effectively.
3) Few carbon emission prediction methods target the power industry. Most existing methods predict large-area carbon emissions from a macroscopic view and are not designed specifically for the power industry, so the rapid development of renewable energy and changes in the power market cannot be fully considered.
Disclosure of Invention
To address the above problems, the invention provides a method for predicting carbon emissions in the power industry based on the XGBoost algorithm, comprising:
acquiring current data related to carbon emissions in the power industry;
inputting the current data related to carbon emissions in the power industry into a prediction model, and solving the prediction model to predict the carbon emissions of the power industry in a future period;
wherein the training samples of the prediction model are a training set and a test set generated from historical power-industry data related to carbon emissions.
Optionally, the historical power-industry data related to carbon emissions include:
carbon emission data, economic activity data, energy usage data, and policy information data.
Optionally, the method further comprises:
generating a data set based on the historical data, and dividing the data set into a training set and a test set according to a preset proportion;
training the prediction model based on the training set and the test set.
Optionally, generating a data set based on the historical data includes:
cleaning the historical data to filter out abnormal values and fill in missing values, thereby obtaining cleaned data;
screening the cleaned data to obtain the key variables that influence carbon emissions, and normalizing the key variable data to generate the data set.
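The cleaning, normalization, and splitting steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the median-absolute-deviation outlier rule, the mean fill, min-max scaling, the chronological split, the 0.8 ratio, and all function names and sample values are hypothetical choices that the text does not specify. Variable screening is omitted.

```python
from statistics import mean, median


def clean(values, k=5.0):
    """Fill missing values (None) and outliers with the mean of the valid values.

    Assumption: outliers are flagged by a median-absolute-deviation rule
    with threshold k; the source text does not name a specific rule.
    """
    valid = [v for v in values if v is not None]
    med = median(valid)
    mad = median([abs(v - med) for v in valid])

    def is_ok(v):
        return v is not None and (mad == 0 or abs(v - med) <= k * mad)

    fill = mean([v for v in values if is_ok(v)])
    return [v if is_ok(v) else fill for v in values]


def min_max(values):
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split(rows, train_ratio=0.8):
    """Split chronologically ordered rows by a preset proportion."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


# Hypothetical emission series with one gap and one abnormal reading.
raw = [100.0, 102.0, None, 98.0, 1000.0, 101.0]
cleaned = clean(raw)
scaled = min_max(cleaned)
train, test = split(scaled)
```

The split keeps the original order because the data are a time series; a shuffled split would be an alternative if the samples were treated as independent.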
Optionally, training the prediction model based on the training set and the test set includes:
training the XGBoost algorithm with the training set to generate an initial prediction model, testing the performance of the initial prediction model with the test set, and adjusting the model parameters according to the test results to obtain the prediction model.
Optionally, training the XGBoost algorithm with the training set to generate an initial prediction model includes:
selecting the average value of carbon emissions in the power industry as the initial predicted value of carbon emissions to define an initial model, iteratively training the initial model based on the XGBoost algorithm to generate an iterative model, establishing an objective function based on the iterative model, and generating the initial prediction model based on the objective function.
Optionally, during iterative training, one decision tree is added in each iteration and the latest predicted value is updated.
Optionally, the output of the iterative model is as follows:
$$\hat{y}_i = \sum_{k=1}^{t} f_k(x_i)$$
where $\hat{y}_i$ is the predicted value of carbon emissions, $t$ is the number of weak classifiers, and $f_k(x_i)$ is the predicted value of the $k$-th weak classifier on the sample $x_i$.
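The additive scheme described above (start from the average emission, then add one tree per iteration) can be illustrated with a small sketch. It assumes a squared-error loss, a single feature, and one-split trees (stumps) fitted to the residuals; these simplifications, the function names, and the sample data are hypothetical and stand in for the full XGBoost procedure.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a weak learner) to the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        wl, wr = mean(left), mean(right)
        sse = sum((r - wl) ** 2 for r in left) + sum((r - wr) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, wl, wr)
    if best is None:  # all feature values identical: fall back to a constant
        const = mean(residuals)
        return lambda x: const
    _, threshold, wl, wr = best
    return lambda x: wl if x <= threshold else wr


def boost(xs, ys, n_trees=20):
    """Start from the mean of ys and add one tree per iteration."""
    base = mean(ys)  # initial predicted value
    trees = []
    preds = [base] * len(ys)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + tree(x) for p, x in zip(preds, xs)]  # update latest prediction
    return lambda x: base + sum(tree(x) for tree in trees)


# Hypothetical data: emissions step up once the feature exceeds 3.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [10.0, 10.0, 10.0, 20.0, 20.0, 20.0]
model = boost(xs, ys, n_trees=5)
```

With zero trees the model returns the mean of the training targets, which corresponds to the initial model described in the text.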
Optionally, the objective function is as follows:
$$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n} \left[ L\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) + \mathrm{constant}$$
where $\mathrm{Obj}^{(t)}$ is the objective function, that is, the function to be optimized in the prediction model; $n$ is the number of samples; $y_i$ is the actual value; $\hat{y}_i^{(t-1)}$ is the model's predicted value after the previous iteration; $L$ is the loss function, which measures the difference between the actual value $y_i$ and the predicted value $\hat{y}_i^{(t-1)}$; $g_i$ is the first derivative of the loss function; $f_t(x_i)$ is the predicted value of the model in the current $t$-th iteration; $h_i$ is the second derivative of the loss function; $\Omega(f_t)$ is the regularization term, which in the decision tree model keeps the model simple and effective by adjusting the number and the weights of the leaf nodes of the tree; and constant is a constant term.
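The role of the first and second derivatives in the objective can be checked numerically. The sketch below assumes a squared-error loss (the text does not fix a particular loss function); the names `actual`, `prev_pred`, and `step` are hypothetical and stand for the observed emission, the prediction after the previous iteration, and the new tree's output. For this loss the second-order expansion is exact.

```python
def squared_loss(actual, pred):
    """Squared-error loss for one sample (an assumed choice of L)."""
    return (actual - pred) ** 2


def grad_hess(actual, prev_pred):
    """First and second derivatives of the squared loss w.r.t. the prediction."""
    g = 2.0 * (prev_pred - actual)
    h = 2.0
    return g, h


def taylor_loss(actual, prev_pred, step):
    """Second-order expansion of the loss around the previous prediction."""
    g, h = grad_hess(actual, prev_pred)
    return squared_loss(actual, prev_pred) + g * step + 0.5 * h * step ** 2


# Hypothetical sample: actual 120, previous prediction 100, new tree adds 15.
exact = squared_loss(120.0, 100.0 + 15.0)
approx = taylor_loss(120.0, 100.0, 15.0)
```

For losses that are not quadratic the two values would differ slightly, which is why the objective is written as an approximation.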
Optionally, the objective function is converted, and the initial prediction model is established based on the converted objective function;
wherein the converted objective function is as follows:
$$\mathrm{Obj} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda}$$
where $T$ is the number of leaf nodes, that is, the number of terms summed in the current iteration and the upper bound of the summation; $G_j^2$ is the square of the sum of the first derivatives of the loss function over the samples in leaf $j$; $H_j$ is the sum of the second derivatives of the loss function over the samples in leaf $j$; and $\lambda$ is the regularization coefficient, which also avoids numerical instability when $H_j$ is too small.
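The conversion step can be worked through as a short derivation. It is a sketch under assumptions not stated in the text: the regularization term is taken as $\Omega(f_t) = \frac{1}{2}\lambda\sum_j w_j^2$ (any penalty on the number of leaves is left out), $w_j$ denotes the weight of leaf $j$, $I_j$ denotes the set of samples falling in leaf $j$, and terms that do not depend on $f_t$ are dropped.

```latex
\begin{aligned}
\mathrm{Obj}^{(t)}
  &\approx \sum_{i=1}^{n}\left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \right]
     + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \\
  &= \sum_{j=1}^{T}\left[ G_j w_j + \tfrac{1}{2}\,(H_j + \lambda)\, w_j^2 \right],
     \qquad G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i \\
w_j^{*} &= -\frac{G_j}{H_j + \lambda}
     \qquad \text{(setting } \partial\,\mathrm{Obj}^{(t)} / \partial w_j = 0 \text{)} \\
\mathrm{Obj}^{*} &= -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda}
\end{aligned}
```

Substituting the optimal leaf weight back into the per-leaf quadratic is what removes $w_j$ and leaves an expression that depends only on the tree structure.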
Optionally, solving the prediction model comprises solving it by computing the change in information gain before and after node splitting, so as to predict the carbon emissions of the power industry in a future period;
the information gain is calculated as follows:
$$\mathrm{Gain} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right]$$
where Gain is the information gain, that is, the reduction produced by the split in the objective function Obj, the function to be optimized in the prediction model; $G_L$ is the gradient sum of the left child node, the accumulated value of the gradients on the left subtree after the current node is split; $G_R$ is the gradient sum of the right child node, the accumulated value of the gradients on the right subtree after the current node is split; $H_L$ is the Hessian sum of the left child node, the sum of its second derivatives, used to adjust the update step length; $H_R$ is the Hessian sum of the right child node, the sum of its second derivatives, used for curvature control in the update; and $\lambda$ is the regularization parameter applied during tree growth, which keeps the model simple by penalizing the complexity of split nodes and so avoids overfitting.
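How the gain before and after a split selects a split point can be sketched as below. This is an illustrative implementation under assumptions: a single numeric feature, candidate thresholds at midpoints between consecutive distinct values, no leaf-count penalty, and hypothetical function names and sample values. The gradients in the usage example come from an assumed squared-error loss with every previous prediction equal to 15.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0):
    """Gain of a split from the gradient and Hessian sums of its two children."""
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right))


def best_split(xs, grads, hessians, lam=1.0):
    """Scan sorted feature values and return (threshold, gain) of the best split."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    g_total, h_total = sum(grads), sum(hessians)
    g_left = h_left = 0.0
    best_gain, best_threshold = 0.0, None
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        g_left += grads[i]
        h_left += hessians[i]
        if xs[i] == xs[nxt]:
            continue  # cannot split between equal feature values
        gain = split_gain(g_left, h_left, g_total - g_left, h_total - h_left, lam)
        if gain > best_gain:
            best_gain, best_threshold = gain, (xs[i] + xs[nxt]) / 2
    return best_threshold, best_gain


# Hypothetical samples: actual values [10, 10, 20, 20], previous prediction 15,
# squared loss, so g = 2 * (pred - actual) and h = 2 for every sample.
xs = [1.0, 2.0, 3.0, 4.0]
grads = [10.0, 10.0, -10.0, -10.0]
hessians = [2.0, 2.0, 2.0, 2.0]
threshold, gain = best_split(xs, grads, hessians, lam=1.0)
```

A threshold of `None` means no candidate split produced a positive gain, in which case the node would be left as a leaf.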
In another aspect, the invention further provides a system for predicting carbon emissions in the power industry based on the XGBoost algorithm, comprising:
a data acquisition unit for acquiring current data related to carbon emissions in the power industry;
a solving unit for inputting the current data related to carbon emissions in the power industry into a prediction model and solving the prediction model to predict the carbon emissions of the power industry in a future period;
wherein the training samples of the prediction model are a training set and a test set generated from historical power-industry data related to carbon emissions.
Optionally, the historical power-industry data related to carbon emissions include:
carbon emission data, economic activity data, energy usage data, and policy information data.
Optionally, the solving unit is further configured to:
generate a data set based on the historical data, and divide the data set into a training set and a test set according to a preset proportion;
train the prediction model based on the training set and the test set.
Optionally, generating a data set based on the historical data includes:
cleaning the historical data to filter out abnormal values and fill in missing values, thereby obtaining cleaned data;
screening the cleaned data to obtain the key variables that influence carbon emissions, and normalizing the key variable data to generate the data set.
Optionally, training the prediction model based on the training set and the test set includes:
training the XGBoost algorithm with the training set to generate an initial prediction model, testing the performance of the initial prediction model with the test set, and adjusting the model parameters according to the test results to obtain the prediction model.
Optionally, training the XGBoost algorithm with the training set to generate an initial prediction model includes:
selecting the average value of carbon emissions in the power industry as the initial predicted value of carbon emissions to define an initial model, iteratively training the initial model based on the XGBoost algorithm to generate an iterative model, establishing an objective function based on the iterative model, and generating the initial prediction model based on the objective function.
Optionally, during iterative training, one decision tree is added in each iteration and the latest predicted value is updated.
Optionally, the output of the iterative model is as follows:
$$\hat{y}_i = \sum_{k=1}^{t} f_k(x_i)$$
where $\hat{y}_i$ is the predicted value of carbon emissions, $t$ is the number of weak classifiers, and $f_k(x_i)$ is the predicted value of the $k$-th weak classifier on the sample $x_i$.
Optionally, the objective function is as follows:
$$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n} \left[ L\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) + \mathrm{constant}$$
where $\mathrm{Obj}^{(t)}$ is the objective function, that is, the function to be optimized in the prediction model; $n$ is the number of samples; $y_i$ is the actual value; $\hat{y}_i^{(t-1)}$ is the model's predicted value after the previous iteration; $L$ is the loss function, which measures the difference between the actual value $y_i$ and the predicted value $\hat{y}_i^{(t-1)}$; $g_i$ is the first derivative of the loss function; $f_t(x_i)$ is the predicted value of the model in the current $t$-th iteration; $h_i$ is the second derivative of the loss function; $\Omega(f_t)$ is the regularization term, which in the decision tree model keeps the model simple and effective by adjusting the number and the weights of the leaf nodes of the tree; and constant is a constant term.
Optionally, the objective function is converted, and the initial prediction model is established based on the converted objective function;
wherein the converted objective function is as follows:
$$\mathrm{Obj} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda}$$
where $T$ is the number of leaf nodes, that is, the number of terms summed in the current iteration and the upper bound of the summation; $G_j^2$ is the square of the sum of the first derivatives of the loss function over the samples in leaf $j$; $H_j$ is the sum of the second derivatives of the loss function over the samples in leaf $j$; and $\lambda$ is the regularization coefficient, which also avoids numerical instability when $H_j$ is too small.
Optionally, the solving unit solving the prediction model comprises solving it by computing the change in information gain before and after node splitting, so as to predict the carbon emissions of the power industry in a future period;
the information gain is calculated as follows:
$$\mathrm{Gain} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right]$$
where Gain is the information gain, that is, the reduction produced by the split in the objective function Obj, the function to be optimized in the prediction model; $G_L$ is the gradient sum of the left child node, the accumulated value of the gradients on the left subtree after the current node is split; $G_R$ is the gradient sum of the right child node, the accumulated value of the gradients on the right subtree after the current node is split; $H_L$ is the Hessian sum of the left child node, the sum of its second derivatives, used to adjust the update step length; $H_R$ is the Hessian sum of the right child node, the sum of its second derivatives, used for curvature control in the update; and $\lambda$ is the regularization parameter applied during tree growth, which keeps the model simple by penalizing the complexity of split nodes and so avoids overfitting.
In yet another aspect, the present invention also provides a computing device comprising one or more processors;
A processor for executing one or more programs;
the method as described above is implemented when the one or more programs are executed by the one or more processors.
In yet another aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed, implements a method as described above.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a method for predicting carbon emissions in the power industry based on the XGBoost algorithm, comprising: acquiring current data related to carbon emissions in the power industry; inputting the current data into a prediction model; and solving the prediction model to predict the carbon emissions of the power industry in a future period, wherein the training samples of the prediction model are a training set and a test set generated from historical power-industry data related to carbon emissions. Because the invention uses the XGBoost algorithm to construct the prediction model, the predicted carbon emissions are more accurate.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a decision tree employing XGBoost algorithm in the method of the present invention;
FIG. 3 is a diagram showing an example of power industry input data required for the method of the present invention;
FIG. 4 is a block diagram of the system of the present invention.
Detailed Description
The exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, however, the present invention may be embodied in many different forms and is not limited to the examples described herein, which are provided to fully and completely disclose the present invention and fully convey the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to be limiting of the invention. In the drawings, like elements/components are referred to by like reference numerals.
Unless otherwise indicated, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. In addition, it will be understood that terms defined in commonly used dictionaries should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
Example 1:
The invention provides a method for predicting carbon emissions in the power industry based on the XGBoost algorithm, as shown in FIG. 1, comprising the following steps:
Step 1: acquiring current data related to carbon emissions in the power industry;
Step 2: inputting the current data related to carbon emissions in the power industry into a prediction model, and solving the prediction model to predict the carbon emissions of the power industry in a future period;
wherein the training samples of the prediction model are a training set and a test set generated from historical power-industry data related to carbon emissions.
The historical power-industry data related to carbon emissions include:
carbon emission data, economic activity data, energy usage data, and policy information data.
The method further comprises:
generating a data set based on the historical data, and dividing the data set into a training set and a test set according to a preset proportion;
training the prediction model based on the training set and the test set.
Generating a data set based on the historical data comprises:
cleaning the historical data to filter out abnormal values and fill in missing values, thereby obtaining cleaned data;
screening the cleaned data to obtain the key variables that influence carbon emissions, and normalizing the key variable data to generate the data set.
Training the prediction model based on the training set and the test set comprises:
training the XGBoost algorithm with the training set to generate an initial prediction model, testing the performance of the initial prediction model with the test set, and adjusting the model parameters according to the test results to obtain the prediction model.
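The test-and-adjust loop described above can be sketched generically. The sketch assumes root-mean-square error as the performance measure and a simple search over candidate parameter settings; neither is specified in the text. `fit_stand_in` is a hypothetical placeholder for XGBoost training, and the parameter name `slope` and the sample data are invented for illustration only.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted values."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def tune(fit, candidates, train, test):
    """Return (best_params, best_rmse) over the candidate parameter settings.

    fit(params, train) must return a function mapping a feature x to a prediction.
    """
    best = None
    for params in candidates:
        model = fit(params, train)
        score = rmse([y for _, y in test], [model(x) for x, _ in test])
        if best is None or score < best[1]:
            best = (params, score)
    return best


def fit_stand_in(params, train):
    """Hypothetical stand-in for model training: predicts slope * x."""
    slope = params["slope"]
    return lambda x: slope * x


# Hypothetical (feature, emission) pairs.
train = [(1.0, 2.1), (2.0, 3.9)]
test = [(3.0, 6.0), (4.0, 8.0)]
candidates = [{"slope": 1.5}, {"slope": 2.0}, {"slope": 2.5}]
best_params, best_rmse = tune(fit_stand_in, candidates, train, test)
```

In practice a separate validation split is often used for this kind of tuning so that the test set remains an unbiased check; the sketch follows the text and tunes against the test set directly.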
Training the XGBoost algorithm with the training set to generate an initial prediction model comprises:
selecting the average value of carbon emissions in the power industry as the initial predicted value of carbon emissions to define an initial model, iteratively training the initial model based on the XGBoost algorithm to generate an iterative model, establishing an objective function based on the iterative model, and generating the initial prediction model based on the objective function.
During iterative training, one decision tree is added in each iteration and the latest predicted value is updated.
The output of the iterative model is as follows:
$$\hat{y}_i = \sum_{k=1}^{t} f_k(x_i)$$
where $\hat{y}_i$ is the predicted value of carbon emissions, $t$ is the number of weak classifiers, and $f_k(x_i)$ is the predicted value of the $k$-th weak classifier on the sample $x_i$.
The objective function is as follows:
$$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n} \left[ L\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) + \mathrm{constant}$$
where $\mathrm{Obj}^{(t)}$ is the objective function, that is, the function to be optimized in the prediction model; $n$ is the number of samples; $y_i$ is the actual value; $\hat{y}_i^{(t-1)}$ is the model's predicted value after the previous iteration; $L$ is the loss function, which measures the difference between the actual value $y_i$ and the predicted value $\hat{y}_i^{(t-1)}$; $g_i$ is the first derivative of the loss function; $f_t(x_i)$ is the predicted value of the model in the current $t$-th iteration; $h_i$ is the second derivative of the loss function; $\Omega(f_t)$ is the regularization term, which in the decision tree model keeps the model simple and effective by adjusting the number and the weights of the leaf nodes of the tree; and constant is a constant term.
The objective function is converted, and the initial prediction model is established based on the converted objective function;
wherein the converted objective function is as follows:
$$\mathrm{Obj} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda}$$
where $T$ is the number of leaf nodes, that is, the number of terms summed in the current iteration and the upper bound of the summation; $G_j^2$ is the square of the sum of the first derivatives of the loss function over the samples in leaf $j$; $H_j$ is the sum of the second derivatives of the loss function over the samples in leaf $j$; and $\lambda$ is the regularization coefficient, which also avoids numerical instability when $H_j$ is too small.
Solving the prediction model comprises solving it by computing the change in information gain before and after node splitting, so as to predict the carbon emissions of the power industry in a future period;
the information gain is calculated as follows:
$$\mathrm{Gain} = \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right]$$
where Gain is the information gain, that is, the reduction produced by the split in the objective function Obj, the function to be optimized in the prediction model; $G_L$ is the gradient sum of the left child node, the accumulated value of the gradients on the left subtree after the current node is split; $G_R$ is the gradient sum of the right child node, the accumulated value of the gradients on the right subtree after the current node is split; $H_L$ is the Hessian sum of the left child node, the sum of its second derivatives, used to adjust the update step length; $H_R$ is the Hessian sum of the right child node, the sum of its second derivatives, used for curvature control in the update; and $\lambda$ is the regularization parameter applied during tree growth, which keeps the model simple by penalizing the complexity of split nodes and so avoids overfitting.
The method for establishing the prediction model specifically comprises the following steps:
The method comprehensively considers the influence of economic activity, energy use and policy changes on the power industry. The XGBoost algorithm is used to analyze long-time-scale historical data of the power industry, identify the key multidimensional factors influencing its carbon emissions, and model the influence weight of each variable, so as to construct a time-series prediction model of power industry carbon emissions and predict future carbon emissions with higher spatial precision and over a longer time scale. The method mainly comprises the following steps:
Step one: defining the initial model and the iterative model.
The average value of the existing carbon emission related data of the power industry is selected as the initial predicted value of carbon emissions. Iterative training is then performed: in each iteration a new decision tree is added to the model and the latest predicted value $\hat{y}_i^{(t)}$ is updated. The specific iterative process is shown in formula (1):
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i) \tag{1}$$
Wherein: $\hat{y}_i^{(t)}$ is the predicted value of carbon emissions;
$x_i$ is the feature vector of the carbon emission sample;
$\hat{y}_i^{(0)}$ is the initial predicted value of carbon emissions;
$f_t(x_i)$ is the predicted value of the $t$-th weak classifier for sample $x_i$.
Finally, the predicted value of the model for a specific sample is obtained by accumulating the predictions of all decision trees, and the model output is:
$$\hat{y}_i = \sum_{k=1}^{t} f_k(x_i)$$
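The additive update of formula (1) can be illustrated with a minimal sketch. It is not the XGBoost implementation: it assumes a single feature, one-split trees (stumps) fitted to the residuals, which corresponds to the square-loss case without regularization, and it keeps the initial mean as an explicit base term. The function names and the toy data are hypothetical.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def boost(xs, ys, rounds=3):
    """Return the initial value, the trees f_1..f_t and the final training predictions."""
    base = mean(ys)                       # initial predicted value: the sample mean
    preds = [base] * len(ys)
    trees = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)   # new decision tree f_t
        trees.append(tree)
        preds = [p + tree(x) for p, x in zip(preds, xs)]  # update of formula (1)
    return base, trees, preds


def predict(base, trees, x):
    """Accumulate the initial value and the outputs of all trees."""
    return base + sum(tree(x) for tree in trees)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical toy data: one feature, six samples.
xs = [1, 2, 3, 4, 5, 6]
ys = [10.0, 12.0, 11.0, 20.0, 22.0, 21.0]
base, trees, preds = boost(xs, ys, rounds=3)
print(mse(ys, [base] * len(ys)), mse(ys, preds))
```

Each round adds one tree and lowers the training error, which is the behaviour the iterative model relies on.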
Step two: setting the objective function.
In the XGBoost-based power industry carbon emission prediction process, the objective function consists of the model error and the structural error. The model error, also called the loss function, is determined by the difference between the true value of the power industry data and the predicted value of the model; the structural error controls the complexity of the model and prevents overfitting. The objective function can be expressed by formula (2):
Substituting formula (1) into formula (2), the objective function may be further expressed as:
Assume that the model error (loss function) is a square loss function and that the final number of fitting iterations is $t$, so that the prediction model comprises $t$ decision trees. The error of the model then consists of three parts: the sum of the errors of the selected $n$ data samples over the $t$ decision trees, the structural error of the $t$-th decision tree, and the structural error of the previous $t-1$ decision trees. Since the structure of the first $t-1$ decision trees is known, their structural error can be regarded as a constant, and the objective function $Obj^{(t)}$ can be further expressed as:
wherein the variable $f_t$ can be expressed as:
With formula (4), the objective function $Obj^{(t)}$ can be expressed as:
Wherein $g_i$ and $h_i$ are respectively the first and second derivatives of $L(y_i, \hat{y}_i^{(t-1)})$ with respect to $\hat{y}_i^{(t-1)}$. When the model error (loss function) is assumed to be the square loss $L = (y_i - \hat{y}_i^{(t-1)})^2$, these are $g_i = 2(\hat{y}_i^{(t-1)} - y_i)$ and $h_i = 2$.
Further, since $L(y_i, \hat{y}_i^{(t-1)})$ represents the sum of errors between the data samples and the prediction after the $(t-1)$-th decision tree, it can be regarded as a constant term. The objective function can thus be expressed as a function of $f_t(x)$; and since $f_t(x)$ is a function of the decision tree node output $w$, the objective function can be converted entirely into a function of $w$, expressed as:
Let $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$, where $I_j$ is the set of samples assigned to node $j$. The objective function may then be converted to:
where $j$ denotes the $j$-th node and $i$ the $i$-th sample. Since the objective function is a quadratic function of the single variable $w$ (i.e. the leaf node output), optimizing it requires solving for the optimal $w$; setting the derivative with respect to $w$ to zero gives:
Substituting the optimal solution $w^*$ back, the objective function may be further transformed into:
Thus, once the first and second derivatives of the loss function and the distribution of the sample data are known, the objective function of the binary decision tree can be solved.
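The derivation in step two can be summarized as the following sketch. It assumes the standard XGBoost formulation rather than the exact formulas of the original filing: the complexity term $\Omega(f_t) = \gamma T + \frac{1}{2}\lambda\sum_j w_j^2$, the symbol $\gamma$, the leaf sample set $I_j$, and the reading of $y_i$ as the observed value and $\hat{y}_i^{(t-1)}$ as the prediction after $t-1$ trees are assumptions not stated in the text.

```latex
\begin{aligned}
Obj &= \sum_{i=1}^{n} L\bigl(y_i, \hat{y}_i\bigr) + \sum_{k=1}^{t} \Omega(f_k) \\
Obj^{(t)} &= \sum_{i=1}^{n} L\bigl(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\bigr) + \Omega(f_t) + \text{constant} \\
&\approx \sum_{i=1}^{n} \Bigl[ L\bigl(y_i, \hat{y}_i^{(t-1)}\bigr) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \Bigr] + \Omega(f_t) + \text{constant} \\
\Omega(f_t) &= \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2} \\
Obj^{(t)} &\approx \sum_{j=1}^{T} \Bigl[ G_j w_j + \tfrac{1}{2} (H_j + \lambda) w_j^{2} \Bigr] + \gamma T,
\qquad G_j = \sum_{i \in I_j} g_i,\quad H_j = \sum_{i \in I_j} h_i \\
w_j^{*} &= -\frac{G_j}{H_j + \lambda} \\
Obj^{*} &= -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^{2}}{H_j + \lambda} + \gamma T
\end{aligned}
```

The constant terms are dropped between the third and fifth lines because they do not depend on $w_j$ and so do not affect the minimizer.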
Further, based on the objective function obtained above, whether a feature node should be split can be analyzed by computing the change in information gain (Gain) before and after splitting. If the information gain is greater than 0, the node is split; otherwise it is not split. The information gain can be expressed as follows:
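The split decision can be sketched as follows, assuming the standard XGBoost gain $\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$. The per-split penalty `gamma`, the function names and the toy numbers are assumptions for illustration; the gradients use the square loss $(y - \hat{y})^2$, for which $g = 2(\hat{y} - y)$ and $h = 2$.

```python
def leaf_weight(G, H, lam):
    """Optimal leaf output w* = -G / (H + lambda)."""
    return -G / (H + lam)


def leaf_score(G, H, lam):
    """Structure score G^2 / (H + lambda) of one leaf."""
    return G * G / (H + lam)


def split_gain(GL, HL, GR, HR, lam, gamma=0.0):
    """Gain of splitting a node into left and right children (gamma is an assumed penalty)."""
    before = leaf_score(GL + GR, HL + HR, lam)
    after = leaf_score(GL, HL, lam) + leaf_score(GR, HR, lam)
    return 0.5 * (after - before) - gamma


# Hypothetical toy node: six samples, all currently predicted as 16.0.
ys = [10.0, 12.0, 11.0, 20.0, 22.0, 21.0]
y_hat = 16.0
g = [2.0 * (y_hat - y) for y in ys]   # first derivatives of the square loss
h = [2.0] * len(ys)                   # second derivatives of the square loss

GL, HL = sum(g[:3]), sum(h[:3])       # samples sent to the left child
GR, HR = sum(g[3:]), sum(h[3:])       # samples sent to the right child
gain = split_gain(GL, HL, GR, HR, lam=1.0)
should_split = gain > 0               # split only when the gain is positive
print(gain, should_split, leaf_weight(GL, HL, 1.0))
```

With `lam=0.0` the left leaf weight reduces to the mean residual of the left samples, which shows how $\lambda$ shrinks the leaf outputs toward zero.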
It should be noted that, in XGBoost-based power industry carbon emission prediction, node splitting can be based either on a greedy strategy or on an approximate strategy. Greedy node splitting exhaustively enumerates every feature and every possible value of it, computes the gain before and after splitting, and selects the feature value with the largest gain as the split point; it is accurate but computationally expensive. To improve computational efficiency, the solving process generally adopts approximate node splitting based on feature bucketing: the values of each feature are assigned to buckets according to quantiles, and only the bucket boundary values are considered as candidate split points.
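The difference between the two strategies can be sketched by comparing their candidate split points. This is a simplification that uses plain, unweighted quantiles from the Python standard library; the library's own approximate algorithm differs in detail. The function names and the feature column are hypothetical.

```python
from statistics import quantiles


def greedy_candidates(values):
    """Greedy strategy: every distinct value except the largest is a candidate threshold."""
    return sorted(set(values))[:-1]


def bucket_candidates(values, n_buckets=4):
    """Approximate strategy: only the quantile boundaries between buckets are candidates."""
    cuts = quantiles(values, n=n_buckets, method="inclusive")
    return sorted(set(cuts))


# Hypothetical feature column with nine distinct values.
feature = list(range(1, 10))
print(len(greedy_candidates(feature)), bucket_candidates(feature, n_buckets=4))
```

The gain then only has to be evaluated at the bucket boundaries, so the cost per feature depends on the number of buckets rather than on the number of distinct values.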
By combining the initial model, the iterative model and the objective function, inputting the existing data related to power industry carbon emissions, and fitting and solving repeatedly, future power industry carbon emissions can be predicted with higher spatial precision and over a longer time scale based on the XGBoost algorithm.
A specific case is described below, including the following steps:
Fig. 1 is a flowchart of an implementation of the method for predicting carbon emissions in the power industry based on the XGBoost algorithm and multidimensional data set analysis according to the present invention, which specifically includes the following steps:
S101, carbon emission and multidimensional data collection. Key factors affecting carbon emissions include historical carbon emissions, economic indicators (e.g., GDP, industrial output value), energy use (e.g., consumption of coal, natural gas and renewable energy), demographic data, and policies and regulations. This example uses reliable data from sources such as government statistical bureaus, academic research and industry reports. As raw data, it selects multi-dimensional monthly data from January 1990 to December 2020 published on the National Bureau of Statistics website, including thermal power generation, percentage growth in power generation, cumulative power generation, current-period power generation, asset investment in the power industry, and the financial budget.
S102, preprocessing carbon emission data. Preprocessing generally comprises the following key steps: 1) identifying missing items in the data set, and either deleting records containing missing values or filling them by interpolation, mean filling or another suitable method; 2) analyzing long-time-scale historical data of the power industry and selecting the key multidimensional factors influencing power industry carbon emissions as feature variables; and 3) normalizing or standardizing the input power industry carbon emission data.
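Two of the preprocessing steps in S102, mean filling and normalization, can be sketched as below. The sketch assumes missing values are represented as `None` and uses min-max scaling as the normalization; the source does not fix either choice, and the function names and sample column are hypothetical.

```python
from statistics import mean


def fill_missing_with_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max_normalize(column):
    """Scale values linearly to the range [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Hypothetical monthly values with one missing entry.
raw = [2.0, None, 6.0, 4.0]
filled = fill_missing_with_mean(raw)
scaled = min_max_normalize(filled)
print(filled, scaled)
```

Tree-based models are largely insensitive to monotonic rescaling of a feature, so the normalization step mainly matters for comparability of the inputs and of the target.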
S103, dividing carbon emission data. The entire data set must be divided into a training set and a test set so that the model can be evaluated on unseen data and overfitting can be detected. Common ratios are 70%-90% for training and 10%-30% for testing, adjustable according to the size and characteristics of the particular data set. In this example, 10% of the data is used as the test set. The data set is divided using a tool function such as train_test_split, so that the training set and the test set have similar feature distributions. In this embodiment, the XGBoost algorithm is used to train the model: the model parameters are adjusted over repeated iterations to minimize the loss function, and cross-validation is used during training to evaluate the generalization ability of the model and select the best hyperparameter settings.
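A dependency-free sketch of the 90%/10% division is shown below. It is a stand-in for a tool function such as train_test_split and makes one assumption the text does not state: the split is chronological, with the most recent months held out, which is a common choice for time-series data. The function name and the month index are hypothetical.

```python
def chronological_split(rows, test_fraction=0.1):
    """Hold out the last `test_fraction` of the rows as the test set, preserving order."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[:-n_test], rows[-n_test:]


# Hypothetical index of the 372 months from January 1990 to December 2020.
months = list(range(372))
train, test = chronological_split(months, test_fraction=0.1)
print(len(train), len(test))
```

With scikit-learn, a similar ordered split can be obtained from `train_test_split` by passing `shuffle=False`; the default shuffles the rows before splitting.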
S104, evaluating reliability on the carbon emission test set. After training, the final performance of the model is evaluated on the test set, using indexes such as root mean square error (RMSE) and $R^2$ to verify the predictive ability of the model. The evaluation indexes selected in this example are mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE).
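The four evaluation indexes named in S104 can be computed as in this minimal sketch. The function name and the sample values are hypothetical, and MAPE is reported in percent and assumes that no actual value is zero.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for paired observations."""
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": sqrt(mse), "MAE": mae, "MAPE": mape}


# Hypothetical test-set values.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

RMSE and MAE are in the units of the target, while MAPE is scale-free, which makes it easier to compare across indicators of different magnitude.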
S105, predicting future power industry carbon emissions. After training is completed, the trained XGBoost model is used to predict future data: new data is input and the corresponding prediction results are obtained.
S106, analyzing the predicted future power industry carbon emissions. Time-series analysis of the prediction results identifies the trend of carbon emissions. The result obtained in this example is that carbon emissions no longer show a significant upward trend and their growth is markedly moderated, which also indicates that the carbon peaking and carbon neutrality goals are gradually being achieved.
FIG. 2 is a schematic diagram of a decision tree of the XGBoost algorithm used in the power industry carbon emission prediction method of the present invention. Unlike traditional machine learning algorithms, the XGBoost algorithm is distinguished mainly by the number of sub-decision-tree models introduced during computation, which is also the main reason it can achieve carbon emission prediction at higher spatial resolution and over a longer time scale. When XGBoost is used for data prediction, the accuracy of the prediction result is improved gradually through continued iteration based on the gradient boosting algorithm. During training, each decision tree node obtained in a calculation is split into two child nodes, and each child node corresponds to a new decision tree model. Each node in XGBoost is therefore part of a decision tree model rather than a separate decision tree. This approach makes effective use of the existing decision tree models, speeds up model training and improves model accuracy.
Fig. 3 shows an example of the power industry input data required for power industry carbon emission prediction using the present invention. The data used by the model come from the National Bureau of Statistics website. To meet the requirements of model building, the input data must include not only carbon emissions but also other relevant dimensions that may influence carbon emissions. Multi-dimensional monthly data from January 1990 to December 2020 on the National Bureau of Statistics website are therefore selected as raw data, including thermal power generation (HLFD), percentage growth in power generation (ZZBFL), cumulative power generation (FDLJZ), current-period power generation (FDDQZ), power industry asset investment (ZCTZ), financial budget (CZYS) and the like. These multidimensional data give the model comprehensive inputs, which helps improve the accuracy of carbon emission prediction and analysis.
The invention comprehensively considers the influence of economic activity, energy use and policy changes on the power industry. By using the XGBoost algorithm to analyze long-time-scale historical data of the power industry, it can predict future power industry carbon emissions with higher spatial precision and over a longer time scale, providing more refined analysis and decision support for the low-carbon transition of the power industry.
Example 2:
The invention also provides a power industry carbon emission prediction system 200 based on XGBoost algorithm, as shown in fig. 4, comprising:
a data acquisition unit 201 for acquiring current data related to carbon emissions in the power industry;
a solving unit 202 for inputting the current data related to carbon emissions of the power industry into a prediction model and solving the prediction model to predict the carbon emissions of the power industry in a future period;
wherein the training samples of the prediction model are a training set and a test set generated based on historical data of the power industry related to carbon emissions.
Wherein the historical data of the power industry related to carbon emissions includes:
carbon emission data, economic activity-related data, energy usage data, and policy information data.
Wherein, the solving unit 202 is further configured to:
generating a data set based on the historical data, and dividing the data set into a training set and a testing set according to a preset proportion;
The predictive model is trained based on the training set and the test set.
Wherein generating a dataset based on the historical data comprises:
cleaning the historical data to filter abnormal values and complement missing values, so as to obtain cleaning data;
and screening the cleaning data to obtain key variable data influencing carbon emission, and carrying out normalization processing on the key variable data influencing carbon emission to generate a data set.
Wherein training the prediction model based on the training set and the test set comprises:
Training the XGBoost algorithm with the training set to generate an initial prediction model, testing the performance of the initial prediction model with the test set, and adjusting the model parameters of the prediction model according to the test result to obtain the prediction model.
Wherein training the XGBoost algorithm with the training set to generate an initial prediction model comprises:
Selecting the average value of power industry carbon emissions as the initial predicted value of carbon emissions to define an initial model, iteratively training the initial model based on the XGBoost algorithm to generate an iterative model, establishing an objective function based on the iterative model, and generating the initial prediction model based on the objective function.
In the iterative training process, a decision tree is added in each iteration and the latest predicted value is updated.
The output of the iterative model is as follows:
$$\hat{y}_i = \sum_{k=1}^{t} f_k(x_i)$$
Wherein $\hat{y}_i$ is the predicted value of carbon emissions, $t$ is the number of weak classifiers, and $f_k(x_i)$ is the predicted value of the $k$-th weak classifier for sample $x_i$.
Wherein the objective function is as follows:
Wherein $Obj^{(t)}$ is the objective function, i.e. the function to be optimized in the prediction model, evaluated as the objective function value for the bj-th node; $y_i$ is the model predicted value; $y_i^{t-1}$ is the actual value or the value at the previous moment; $L$ is the loss function, which measures the difference between the model predicted value $y_i$ and the actual value or previous-moment value $y_i^{t-1}$; $g_i$ is the first derivative of the loss function; $f_t(x)$ is the predicted value of the model in the current $t$-th iteration; $h_i$ is the second derivative of the loss function; $\frac{1}{2} h_i f_t^2(x_i)$ is the second-order gradient correction term; $\Omega(f_t)$ is the regularization term, which in the decision tree model makes the model simpler and more effective by adjusting the number and weights of the leaf nodes of the tree; and constant is a constant term.
Converting the objective function, and establishing an initial prediction model based on the converted objective function;
wherein the transformed objective function is as follows:
Wherein $T$ is the total dimension or the number of leaf nodes, i.e. the number of terms summed in this round of iteration and the upper bound of the summation in the formula; $G_j^2$ is the square of the sum of the first derivatives of the loss function; $H_j$ is the second derivative of the loss function; and $\lambda$ is the regularization term, which avoids numerical instability when $H_j$ is too small.
The solving unit 202 solves the prediction model by solving for the change in information gain (Gain) before and after node splitting, so as to predict the carbon emissions of the power industry in a future period;
The calculation formula of the information gain change is as follows:
Wherein Gain is the information gain; $Obj^{(t)}$ is the objective function, i.e. the function to be optimized in the prediction model, evaluated as the objective function value for the bj-th node; $G_L$ is the gradient sum of the left child node, i.e. the accumulated gradient on the left subtree after the current node is split; $G_R$ is the gradient sum of the right child node, i.e. the accumulated gradient on the right subtree after the current node is split; $H_L$ is the Hessian sum of the left child node, i.e. the sum of second derivatives, used to adjust the update step size; $\lambda$ is the regularization term, used to adjust the regularization parameter during tree growth, which keeps the model simple and avoids overfitting by penalizing the complexity of split nodes; and $H_R$ is the Hessian sum of the right child node, i.e. the sum of second derivatives, used for curvature control during the update.
According to the invention, XGBoost algorithm is adopted to construct a model for prediction, and the predicted carbon emission is more accurate.
Example 3:
Based on the same inventive concept, the invention also provides a computer device comprising a processor and a memory, the memory storing a computer program comprising program instructions, and the processor executing the program instructions stored in the computer storage medium. The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. It is the computational core and control core of the terminal, adapted to implement one or more instructions, and in particular to load and execute one or more instructions in a computer storage medium so as to implement the corresponding method flow or functions and thereby the steps of the method in the embodiments described above.
Example 4:
Based on the same inventive concept, the present invention also provides a storage medium, in particular, a computer readable storage medium (Memory), which is a Memory device in a computer device, for storing programs and data. It is understood that the computer readable storage medium herein may include both built-in storage media in a computer device and extended storage media supported by the computer device. The computer-readable storage medium provides a storage space storing an operating system of the terminal. Also stored in the memory space are one or more instructions, which may be one or more computer programs (including program code), adapted to be loaded and executed by the processor. The computer readable storage medium herein may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory. One or more instructions stored in a computer-readable storage medium may be loaded and executed by a processor to implement the steps of the methods in the above-described embodiments.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the invention can be implemented in various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (24)

1. A method for predicting carbon emissions in the power industry based on the XGBoost algorithm, characterized by comprising: obtaining current data related to carbon emissions in the power industry; inputting the current data related to carbon emissions of the power industry into a prediction model, and solving the prediction model to predict the carbon emissions of the power industry in a future period; wherein the training samples of the prediction model are a training set and a test set generated based on historical data of the power industry related to carbon emissions.
2. The method according to claim 1, characterized in that the historical data of the power industry related to carbon emissions includes: carbon emission data, economic activity related data, energy usage data and policy information data.
3. The method according to claim 1, characterized in that the method further comprises: generating a data set based on the historical data, and dividing the data set into a training set and a test set according to a preset ratio; and training the prediction model based on the training set and the test set.
4. The method according to claim 3, characterized in that generating a data set based on the historical data comprises: cleaning the historical data to filter out abnormal values and fill in missing values, so as to obtain cleaned data; and screening the cleaned data to obtain key variable data affecting carbon emissions, and normalizing the key variable data affecting carbon emissions to generate the data set.
5. The method according to claim 3, characterized in that training the prediction model based on the training set and the test set comprises: training the XGBoost algorithm with the training set to generate an initial prediction model, testing the performance of the initial prediction model with the test set, and adjusting the model parameters of the prediction model according to the test results to obtain the prediction model.
6. The method according to claim 5, characterized in that training the XGBoost algorithm with the training set to generate an initial prediction model comprises: selecting the average value of carbon emissions in the power industry as the initial predicted value of carbon emissions to define an initial model, iteratively training the initial model based on the XGBoost algorithm to generate an iterative model, establishing an objective function based on the iterative model, and generating the initial prediction model based on the objective function.
7. The method according to claim 6, characterized in that, during the iterative training process, a decision tree is added in each iteration and the latest predicted value is updated.
8. The method according to claim 6, characterized in that the output of the iterative model is as follows:
$$\hat{y}_i = \sum_{k=1}^{t} f_k(x_i)$$
where $\hat{y}_i$ is the predicted value of carbon emissions, $t$ is the number of weak classifiers, and $f_k(x_i)$ is the predicted value of the $k$-th weak classifier for sample $x_i$.
9. The method according to claim 6, characterized in that the objective function is as follows: where $Obj^{(t)}$ is the objective function, i.e. the function to be optimized in the prediction model, evaluated as the objective function value for the bj-th node; $y_i$ is the model predicted value; $y_i^{t-1}$ is the actual value or the value at the previous moment; $L$ is the loss function, which measures the difference between the model predicted value $y_i$ and the actual value or previous-moment value $y_i^{t-1}$; $g_i$ is the first derivative of the loss function; $f_t(x)$ is the predicted value of the model in the current $t$-th iteration; $h_i$ is the second derivative of the loss function; $\frac{1}{2} h_i f_t^2(x_i)$ is the second-order gradient correction term; $\Omega(f_t)$ is the regularization term, which in the decision tree model makes the model simpler and more effective by adjusting the number and weights of the leaf nodes of the tree; and constant is a constant term.
10. The method according to claim 6, characterized in that the method further comprises: transforming the objective function, and establishing the initial prediction model based on the transformed objective function; wherein the transformed objective function is as follows: where $Obj^{(t)}$ is the objective function, i.e. the function to be optimized in the prediction model, evaluated as the objective function value for the bj-th node; $T$ is the total dimension or the number of leaf nodes, i.e. the number of terms summed in this round of iteration and the upper bound of the summation in the formula; $G_j^2$ is the square of the sum of the first derivatives of the loss function; $H_j$ is the second derivative of the loss function; and $\lambda$ is the regularization term, which avoids numerical instability when $H_j$ is too small.
11. The method according to claim 1, characterized in that solving the prediction model comprises: solving the prediction model by solving for the change in information gain (Gain) before and after node splitting, so as to predict the carbon emissions of the power industry in a future period; wherein the calculation formula of the information gain change is as follows: where Gain is the information gain; $Obj^{(t)}$ is the objective function, i.e. the function to be optimized in the prediction model, evaluated as the objective function value for the bj-th node; $G_L$ is the gradient sum of the left child node, i.e. the accumulated gradient on the left subtree after the current node is split; $G_R$ is the gradient sum of the right child node, i.e. the accumulated gradient on the right subtree after the current node is split; $H_L$ is the Hessian sum of the left child node, i.e. the sum of second derivatives, used to adjust the update step size; $\lambda$ is the regularization term, used to adjust the regularization parameter during tree growth, which keeps the model simple and avoids overfitting by penalizing the complexity of split nodes; and $H_R$ is the Hessian sum of the right child node, i.e. the sum of second derivatives, used for curvature control during the update.
12. A system for predicting carbon emissions in the power industry based on the XGBoost algorithm, characterized by comprising: a data acquisition unit for acquiring current data related to carbon emissions in the power industry; and a solving unit for inputting the current data related to carbon emissions of the power industry into a prediction model and solving the prediction model to predict the carbon emissions of the power industry in a future period; wherein the training samples of the prediction model are a training set and a test set generated based on historical data of the power industry related to carbon emissions.
13. The system according to claim 11, characterized in that the historical data of the power industry related to carbon emissions includes: carbon emission data, economic activity related data, energy usage data and policy information data.
14. The system according to claim 12, characterized in that the solving unit is further used for: generating a data set based on the historical data, and dividing the data set into a training set and a test set according to a preset ratio; and training the prediction model based on the training set and the test set.
15. The system according to claim 14, characterized in that generating a data set based on the historical data comprises: cleaning the historical data to filter out abnormal values and fill in missing values, so as to obtain cleaned data; and screening the cleaned data to obtain key variable data affecting carbon emissions, and normalizing the key variable data affecting carbon emissions to generate the data set.
16. The system according to claim 14, characterized in that training the prediction model based on the training set and the test set comprises: training the XGBoost algorithm with the training set to generate an initial prediction model, testing the performance of the initial prediction model with the test set, and adjusting the model parameters of the prediction model according to the test results to obtain the prediction model.
17. The system according to claim 16, characterized in that training the XGBoost algorithm with the training set to generate an initial prediction model comprises: selecting the average value of carbon emissions in the power industry as the initial predicted value of carbon emissions to define an initial model, iteratively training the initial model based on the XGBoost algorithm to generate an iterative model, establishing an objective function based on the iterative model, and generating the initial prediction model based on the objective function.
18. The system according to claim 17, characterized in that, during the iterative training process, a decision tree is added in each iteration and the latest predicted value is updated.
19. The system according to claim 17, characterized in that the output of the iterative model is as follows:
$$\hat{y}_i = \sum_{k=1}^{t} f_k(x_i)$$
where $\hat{y}_i$ is the predicted value of carbon emissions, $t$ is the number of weak classifiers, and $f_k(x_i)$ is the predicted value of the $k$-th weak classifier for sample $x_i$.
20. The system according to claim 17, characterized in that the objective function is as follows: where $Obj^{(t)}$ is the objective function, i.e. the function to be optimized in the prediction model, evaluated as the objective function value for the bj-th node; $y_i$ is the model predicted value; $y_i^{t-1}$ is the actual value or the value at the previous moment; $L$ is the loss function, which measures the difference between the model predicted value $y_i$ and the actual value or previous-moment value $y_i^{t-1}$; $g_i$ is the first derivative of the loss function; $f_t(x)$ is the predicted value of the model in the current $t$-th iteration; $h_i$ is the second derivative of the loss function; $\frac{1}{2} h_i f_t^2(x_i)$ is the second-order gradient correction term; $\Omega(f_t)$ is the regularization term, which in the decision tree model makes the model simpler and more effective by adjusting the number and weights of the leaf nodes of the tree; and constant is a constant term.
21. 
The system according to claim 17, characterized in that the objective function is converted, and an initial prediction model is established based on the converted objective function: 其中,转换后的目标函数如下:Among them, the converted objective function is as follows: 其中,为目标函数,是预测模型中要优化的函数,针对第bj个节点的目标函数值;T为总维度或叶节点的数量,表示该轮迭代中求和的项数,在公式中为求和上界;为损失函数一阶导数和的平方;Hj为损失函数的二阶导数;λ为正则化项,可以使得在Hj过小时避免数值不稳定。in, is the objective function, which is the function to be optimized in the prediction model, and is the objective function value for the bjth node; T is the total dimension or the number of leaf nodes, which indicates the number of items summed in this round of iteration, and is the upper bound of the sum in the formula; is the square of the sum of the first-order derivatives of the loss function; Hj is the second-order derivative of the loss function; λ is the regularization term, which can avoid numerical instability when Hj is too small. 22.根据权利要求12所述的系统,其特征在于,所述求解单元,对所述预测模型进行求解,包括:通过求解节点分裂前后信息增益Gain变化,以求解预测模型,以预测出所述电力行业在未来时段的碳排放;22. 
The system according to claim 12, characterized in that the solving unit solves the prediction model, comprising: solving the prediction model by solving the change of information gain Gain before and after the node splitting, so as to predict the carbon emissions of the power industry in the future period; 其中,信息增益Gain变化的计算公式如下:Among them, the calculation formula for the change of information gain Gain is as follows: 其中,Gain为信息增益;为目标函数,是预测模型中要优化的函数,针对第bj个节点的目标函数值;GL为左子节点的梯度和,表示当前节点分裂后,左子树上的梯度的累加值;GR为右子节点的梯度和,表示当前节点分裂后,右子树上的梯度的累加值;HL为左子节点的Hessian和,表示二阶导数的和,用于调整更新步长,λ为正则化项,用来调整树的生长过程中的正则化参数,通过惩罚分裂节点的复杂度,使模型保持简洁,避免过拟合;HR为右子节点的Hessian和,表示二阶导数的和,用于更新过程中的曲率控制。Among them, Gain is information gain; is the objective function, which is the function to be optimized in the prediction model, and is the objective function value for the bjth node; GL is the gradient sum of the left child node, which represents the accumulated value of the gradient on the left subtree after the current node is split; GR is the gradient sum of the right child node, which represents the accumulated value of the gradient on the right subtree after the current node is split; HL is the Hessian sum of the left child node, which represents the sum of the second-order derivatives and is used to adjust the update step size; λ is the regularization term, which is used to adjust the regularization parameter in the growth process of the tree, and by penalizing the complexity of the split node, the model is kept simple and overfitting is avoided; HR is the Hessian sum of the right child node, which represents the sum of the second-order derivatives and is used for curvature control in the update process. 23.一种计算机设备,其特征在于,包括:23. 
A computer device, comprising: 一个或多个处理器;one or more processors; 处理器,用于执行一个或多个程序;a processor for executing one or more programs; 当所述一个或多个程序被所述一个或多个处理器执行时,实现如权利要求1-11中任一所述的方法。When the one or more programs are executed by the one or more processors, the method according to any one of claims 1 to 11 is implemented. 24.一种计算机可读存储介质,其特征在于,其上存有计算机程序,所述计算机程序被执行时,实现如权利要求1-11中任一所述的方法。24. A computer-readable storage medium, characterized in that a computer program is stored thereon, and when the computer program is executed, the method according to any one of claims 1 to 11 is implemented.
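
The split evaluation described in claims 11 and 22 can be illustrated with a short sketch. The claim formulas are not reproduced in this text, so the code assumes the standard XGBoost split gain, 0.5 * [G_L^2/(H_L+λ) + G_R^2/(H_R+λ) - (G_L+G_R)^2/(H_L+H_R+λ)] - γ, together with a squared-error loss. The function names, the γ parameter and the toy data are hypothetical and are not taken from the patent.

```python
import numpy as np

def leaf_score(G, H, lam):
    """Structure score G^2 / (H + lambda) for a node with gradient sum G and Hessian sum H."""
    return G * G / (H + lam)

def split_gain(g, h, left_mask, lam=1.0, gamma=0.0):
    """Standard XGBoost split gain (an assumed form, not the patent's own formula)."""
    G_L, H_L = g[left_mask].sum(), h[left_mask].sum()
    G_R, H_R = g[~left_mask].sum(), h[~left_mask].sum()
    children = leaf_score(G_L, H_L, lam) + leaf_score(G_R, H_R, lam)
    parent = leaf_score(G_L + G_R, H_L + H_R, lam)
    return 0.5 * (children - parent) - gamma

def best_split(x, y, pred, lam=1.0, gamma=0.0):
    """Scan midpoints of one feature and return the (threshold, gain) with the highest gain."""
    g = pred - y             # first derivative of 0.5 * (pred - y)^2
    h = np.ones_like(y)      # second derivative of the same loss
    best = (None, -np.inf)
    xs = np.unique(x)        # sorted distinct feature values
    for lo, hi in zip(xs[:-1], xs[1:]):
        thr = (lo + hi) / 2.0
        gain = split_gain(g, h, x <= thr, lam, gamma)
        if gain > best[1]:
            best = (thr, gain)
    return best

# Toy data (hypothetical): one normalized driver and an emissions target.
x = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
y = np.array([1.0, 1.0, 1.0, 3.0, 3.0, 3.0])
pred = np.full_like(y, y.mean())   # initial prediction = mean, as in claim 17
threshold, gain = best_split(x, y, pred)
print(threshold, gain)             # the best split falls between 0.3 and 0.7
```

A larger λ shrinks every structure score and therefore the gain, which is one way to read the claim's statement that λ penalizes split complexity; γ, where used, acts as a minimum gain a split must exceed.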
CN202411492988.8A 2024-10-24 2024-10-24 A method and system for predicting carbon emissions in the power industry based on XGBoost algorithm Pending CN119539145A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411492988.8A CN119539145A (en) 2024-10-24 2024-10-24 A method and system for predicting carbon emissions in the power industry based on XGBoost algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411492988.8A CN119539145A (en) 2024-10-24 2024-10-24 A method and system for predicting carbon emissions in the power industry based on XGBoost algorithm

Publications (1)

Publication Number Publication Date
CN119539145A (en) 2025-02-28

Family

ID=94702457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411492988.8A Pending CN119539145A (en) 2024-10-24 2024-10-24 A method and system for predicting carbon emissions in the power industry based on XGBoost algorithm

Country Status (1)

Country Link
CN (1) CN119539145A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120087979A (en) * 2025-05-06 2025-06-03 Shandong University of Science and Technology A method, device and medium for predicting carbon emissions based on capturing relationships between variables

Similar Documents

Publication Publication Date Title
Zhang et al. Load probability density forecasting by transforming and combining quantile forecasts
CN107220851A (en) Electricity sales amount Forecasting Methodology and device based on X13 seasonal adjustments and Cox regression
US11366806B2 (en) Automated feature generation for machine learning application
CN108694470A (en) A kind of data predication method and device based on artificial intelligence
CN118229119B (en) Short-term load forecasting method, system and storage medium integrating time series decomposition and machine learning model
CN114169434A (en) Load prediction method
CN119539145A (en) A method and system for predicting carbon emissions in the power industry based on XGBoost algorithm
CN119298177B (en) Photovoltaic energy storage scheduling method and system considering source-load uncertainty
Fan et al. Multi-objective LSTM ensemble model for household short-term load forecasting
CN114970345A (en) Method, device, device and readable storage medium for constructing short-term load forecasting model
CN115034473A (en) A kind of electricity price prediction method, system and device
Liu et al. Research and application of short-term load forecasting based on CEEMDAN-LSTM modeling
CN118609709A (en) Parameter inversion and optimization method for reservoir numerical simulation based on intelligent agent
CN120106307A (en) Intelligent prediction system of carbon emissions based on neural network
CN119250848A (en) A real estate valuation method based on Bayesian optimization and machine learning
CN113111588A (en) NOx emission concentration prediction method and device for a gas turbine
CN118113279A (en) Power load prediction low-code slice construction method and system based on deep learning model
Wang et al. LightGBM-BES-BiLSTM carbon price prediction based on environmental impact factors
Tronci et al. Physics Informed Machine Learning Part I: Different Strategies to Incorporate Physics into Engineering Problems
Zhao et al. A hybrid framework for short-term load forecasting based on optimized InMetra Boost and BiLSTM
CN118378761B (en) Power grid data purification method, system, equipment and medium
CN120262409B (en) New energy power generation prediction method and system based on improved LSTM model
Huang et al. On Digital Economy Scales Prediction Technology based on QPSO and LSTM Model
CN120072139A (en) Grouting material strength prediction method and system based on causal inference and machine learning
Keisler et al. WindDragon: automated deep learning for regional wind power forecasting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination