CN119539145A - A method and system for predicting carbon emissions in the power industry based on XGBoost algorithm - Google Patents
- Publication number
- CN119539145A (application CN202411492988.8)
- Authority
- CN
- China
- Prior art keywords
- model
- data
- prediction model
- value
- objective function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Tourism & Hospitality (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Game Theory and Decision Science (AREA)
- Software Systems (AREA)
- Educational Administration (AREA)
- Primary Health Care (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Water Supply & Treatment (AREA)
- Public Health (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a power industry carbon emission prediction method and system based on the XGBoost algorithm, and belongs to the technical field of carbon emission prediction. The method comprises: obtaining current data related to carbon emission in the power industry; inputting the data into a prediction model; and solving the prediction model to predict the carbon emission of the power industry in a future period, wherein the training samples of the prediction model are a training set and a test set generated from historical data of the power industry related to carbon emission. The invention uses the XGBoost algorithm to construct the prediction model, so that the predicted carbon emission is more accurate.
Description
Technical Field
The invention relates to the technical field of carbon emission prediction, in particular to a power industry carbon emission prediction method and system based on XGBoost algorithm.
Background
With the development of human society, problems such as the shortage of traditional fossil energy and serious environmental pollution have become more prominent, and "carbon peaking and carbon neutrality" has become a main path for relieving the energy crisis. The power industry is one of the main sources of energy consumption and carbon emission and bears an important share of the emission reduction task. Therefore, developing a high-precision carbon emission prediction method for this high-energy-consumption industry has become an important basis for formulating emission reduction policy, evaluating emission reduction effects and optimizing the energy structure.
Current application status of carbon emission prediction models:
The accuracy of carbon emission prediction is mainly determined by the richness of the data set for the predicted object and the accuracy of the prediction model adopted. As for the predicted object, since the data provided by provinces and regions are relatively abundant, most existing carbon emission prediction studies target large regions, and studies for specific industries are relatively few. The power industry consumes a large amount of energy, differs greatly across temporal and spatial dimensions, and its carbon emission data are difficult to obtain. As for the prediction model, the main choices are the STIRPAT model, the LEAP model, the BP neural network model and the like, which predict the future carbon emission trend of a whole region by comprehensively analyzing economic activity, energy use, technical progress and policy influence. The current state of research is summarized as follows:
Regional carbon emission prediction method based on the STIRPAT model:
The STIRPAT-based regional carbon emission prediction method takes the drivers of environmental impact as its starting point and uses statistical methods to study the influence of Population, Affluence and Technology on the environment, so as to predict carbon emission for large regions. However, the model introduces many variables, most of which are nonlinearly related, so the model is complex and computationally expensive. Meanwhile, the model assumes by default that all factors are independent and ignores the complex interactions among the variables, so its carbon emission prediction accuracy is relatively low.
Regional carbon emission prediction method based on the LEAP model:
The LEAP model-based regional carbon emission prediction method comprehensively considers various factors through a system dynamics and integration modeling method, evaluates the long-term influence of the energy policy and provides scientific basis for decision making. However, the predictive outcome of the LEAP model is highly dependent on the input data, and insufficient data or low quality may affect its accuracy. Furthermore, future technological advances and uncertainty in policy changes may lead to predictive bias, and models have difficulty responding quickly to market changes or incidents.
Regional carbon emission prediction method based on the BP neural network:
The BP-neural-network-based regional carbon emission prediction method is a prediction model built on the back propagation algorithm, which minimizes the output error by repeatedly adjusting weights and biases so as to approach the objective function. Training can be very time consuming for large data sets and complex networks. Furthermore, gradient descent may become trapped in local minima, affecting the overall performance of the model. In addition, hyperparameter tuning is complex because many hyperparameters need to be adjusted; in multi-layer networks in particular, gradients may gradually vanish during back propagation, which degrades the training effect.
The existing problems are as follows:
1) The carbon emission prediction method is insufficient in spatial accuracy. Most of the existing carbon emission prediction methods estimate carbon emission for a large geographical area, and local factors such as city development, industrial structure and traffic pattern differences cannot be carefully considered, so that the rough spatial resolution may cause deviation of prediction results.
2) The carbon emission prediction method is short in time scale. Existing carbon emission prediction methods often focus on short-term time scales, and usually only predict emission trends in the next few years, and such short-term viewing angles limit comprehensive understanding of long-term climate change influences, so that long-term influences of factors such as economic growth, technical progress and policy change on emission cannot be effectively captured.
3) There are few carbon emission prediction methods for the power industry. Most of the existing carbon emission prediction methods predict large-area carbon emission from a macroscopic view, and are not designed specifically for specific power industries, so that rapid development of renewable energy sources and changes of power markets cannot be fully considered.
Disclosure of Invention
To address the above problems, the invention provides a power industry carbon emission prediction method based on the XGBoost algorithm, which comprises the following steps:
acquiring current data related to carbon emission in the power industry;
inputting the current data related to carbon emission in the power industry into a prediction model, and solving the prediction model to predict the carbon emission of the power industry in a future period;
wherein the training samples of the prediction model are a training set and a test set generated from historical data of the power industry related to carbon emission.
Optionally, the historical data of the power industry related to carbon emission includes:
carbon emission data, economic activity-related data, energy usage data, and policy information data.
Optionally, the method further comprises:
generating a data set based on the historical data, and dividing the data set into a training set and a test set according to a preset proportion;
training the prediction model based on the training set and the test set.
Optionally, generating a data set based on the historical data includes:
cleaning the historical data to filter abnormal values and fill in missing values, so as to obtain cleaned data;
screening the cleaned data to obtain key variable data influencing carbon emission, and normalizing the key variable data to generate the data set.
Optionally, training the prediction model based on the training set and the test set includes:
training the XGBoost algorithm with the training set to generate an initial prediction model, testing the performance of the initial prediction model with the test set, and adjusting the model parameters according to the test result to obtain the prediction model.
Optionally, training the XGBoost algorithm with the training set to generate an initial prediction model includes:
selecting the average value of the carbon emission of the power industry as the initial predicted value of carbon emission to define an initial model, performing iterative training on the initial model based on the XGBoost algorithm to generate an iterative model, establishing an objective function based on the iterative model, and generating the initial prediction model based on the objective function.
Optionally, in the iterative training process, a decision tree is added in each iteration, and the latest predicted value is updated.
Optionally, the output of the iterative model is as follows:
$$\hat{y}_i = \sum_{k=1}^{t} f_k(x_i)$$
wherein $\hat{y}_i$ is the predicted value of carbon emission, $t$ is the number of weak classifiers, and $f_k(x_i)$ is the predicted value of the k-th weak classifier on the sample $x_i$.
Optionally, the objective function is as follows:
$$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n}\left[L\left(y_i,\hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i)\right] + \Omega(f_t) + \text{constant}$$
wherein $\mathrm{Obj}^{(t)}$ is the objective function, i.e. the function to be optimized in the prediction model; $n$ is the number of data samples; $y_i$ is the actual value; $\hat{y}_i^{(t-1)}$ is the model predicted value after the previous, (t-1)-th, iteration; $L$ is the loss function, which measures the difference between the actual value $y_i$ and the predicted value $\hat{y}_i^{(t-1)}$; $g_i$ is the first derivative of the loss function; $f_t(x_i)$ is the predicted value of the model in the current t-th iteration; $h_i$ is the second derivative of the loss function; $\Omega(f_t)$ is a regularization term which, in the decision tree model, makes the model simpler and more effective by adjusting the number and the weights of the leaf nodes of the tree; and constant is a constant term.
Optionally, the objective function is converted, and the initial prediction model is established based on the converted objective function;
wherein the converted objective function is as follows:
$$\mathrm{Obj}^{(t)} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j+\lambda}$$
wherein $T$ is the number of leaf nodes, i.e. the number of terms summed in the current iteration and the upper summation bound in the formula; $G_j^{2}$ is the square of the sum of the first derivatives of the loss function over the samples in leaf $j$; $H_j$ is the sum of the second derivatives of the loss function over the same samples; and $\lambda$ is the regularization coefficient, which also avoids numerical instability when $H_j$ is too small.
Optionally, solving the prediction model comprises solving the prediction model by calculating the information gain before and after node splitting, so as to predict the carbon emission of the power industry in a future period;
The information gain is calculated as follows:
$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^{2}}{H_L+\lambda} + \frac{G_R^{2}}{H_R+\lambda} - \frac{(G_L+G_R)^{2}}{H_L+H_R+\lambda}\right]$$
wherein Gain is the information gain, i.e. the decrease in the value of the objective function $\mathrm{Obj}^{(t)}$ produced by the split; $G_L$ is the gradient sum of the left child node, the accumulated gradient on the left subtree after the current node is split; $G_R$ is the gradient sum of the right child node, the accumulated gradient on the right subtree after the current node is split; $H_L$ is the Hessian sum of the left child node, the sum of second derivatives, used to adjust the update step length; $H_R$ is the Hessian sum of the right child node, the sum of second derivatives, used for curvature control in the update; and $\lambda$ is the regularization coefficient applied during tree growth, which keeps the model simple by penalizing the complexity of split nodes and avoids overfitting.
In still another aspect, the invention further provides a power industry carbon emission prediction system based on the XGBoost algorithm, including:
a data acquisition unit, configured to acquire current data related to carbon emission in the power industry;
a solving unit, configured to input the current data related to carbon emission in the power industry into a prediction model, and to solve the prediction model so as to predict the carbon emission of the power industry in a future period;
wherein the training samples of the prediction model are a training set and a test set generated from historical data of the power industry related to carbon emission.
Optionally, the historical data of the power industry related to carbon emission includes:
carbon emission data, economic activity-related data, energy usage data, and policy information data.
Optionally, the solving unit is further configured to:
generate a data set based on the historical data, and divide the data set into a training set and a test set according to a preset proportion;
train the prediction model based on the training set and the test set.
Optionally, generating a data set based on the historical data includes:
cleaning the historical data to filter abnormal values and fill in missing values, so as to obtain cleaned data;
screening the cleaned data to obtain key variable data influencing carbon emission, and normalizing the key variable data to generate the data set.
Optionally, training the prediction model based on the training set and the test set includes:
training the XGBoost algorithm with the training set to generate an initial prediction model, testing the performance of the initial prediction model with the test set, and adjusting the model parameters according to the test result to obtain the prediction model.
Optionally, training the XGBoost algorithm with the training set to generate an initial prediction model includes:
selecting the average value of the carbon emission of the power industry as the initial predicted value of carbon emission to define an initial model, performing iterative training on the initial model based on the XGBoost algorithm to generate an iterative model, establishing an objective function based on the iterative model, and generating the initial prediction model based on the objective function.
Optionally, in the iterative training process, a decision tree is added in each iteration, and the latest predicted value is updated.
Optionally, the output of the iterative model is as follows:
$$\hat{y}_i = \sum_{k=1}^{t} f_k(x_i)$$
wherein $\hat{y}_i$ is the predicted value of carbon emission, $t$ is the number of weak classifiers, and $f_k(x_i)$ is the predicted value of the k-th weak classifier on the sample $x_i$.
Optionally, the objective function is as follows:
$$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n}\left[L\left(y_i,\hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i)\right] + \Omega(f_t) + \text{constant}$$
wherein $\mathrm{Obj}^{(t)}$ is the objective function, i.e. the function to be optimized in the prediction model; $n$ is the number of data samples; $y_i$ is the actual value; $\hat{y}_i^{(t-1)}$ is the model predicted value after the previous, (t-1)-th, iteration; $L$ is the loss function, which measures the difference between the actual value $y_i$ and the predicted value $\hat{y}_i^{(t-1)}$; $g_i$ is the first derivative of the loss function; $f_t(x_i)$ is the predicted value of the model in the current t-th iteration; $h_i$ is the second derivative of the loss function; $\Omega(f_t)$ is a regularization term which, in the decision tree model, makes the model simpler and more effective by adjusting the number and the weights of the leaf nodes of the tree; and constant is a constant term.
Optionally, the objective function is converted, and the initial prediction model is established based on the converted objective function;
wherein the converted objective function is as follows:
$$\mathrm{Obj}^{(t)} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j+\lambda}$$
wherein $T$ is the number of leaf nodes, i.e. the number of terms summed in the current iteration and the upper summation bound in the formula; $G_j^{2}$ is the square of the sum of the first derivatives of the loss function over the samples in leaf $j$; $H_j$ is the sum of the second derivatives of the loss function over the same samples; and $\lambda$ is the regularization coefficient, which also avoids numerical instability when $H_j$ is too small.
Optionally, the solving unit solves the prediction model by calculating the information gain before and after node splitting, so as to predict the carbon emission of the power industry in a future period;
The information gain is calculated as follows:
$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^{2}}{H_L+\lambda} + \frac{G_R^{2}}{H_R+\lambda} - \frac{(G_L+G_R)^{2}}{H_L+H_R+\lambda}\right]$$
wherein Gain is the information gain, i.e. the decrease in the value of the objective function $\mathrm{Obj}^{(t)}$ produced by the split; $G_L$ is the gradient sum of the left child node, the accumulated gradient on the left subtree after the current node is split; $G_R$ is the gradient sum of the right child node, the accumulated gradient on the right subtree after the current node is split; $H_L$ is the Hessian sum of the left child node, the sum of second derivatives, used to adjust the update step length; $H_R$ is the Hessian sum of the right child node, the sum of second derivatives, used for curvature control in the update; and $\lambda$ is the regularization coefficient applied during tree growth, which keeps the model simple by penalizing the complexity of split nodes and avoids overfitting.
In yet another aspect, the present invention also provides a computing device comprising one or more processors;
A processor for executing one or more programs;
the method as described above is implemented when the one or more programs are executed by the one or more processors.
In yet another aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed, implements a method as described above.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a power industry carbon emission prediction method based on XGBoost algorithm, which comprises the steps of obtaining data related to current carbon emission of a power industry, inputting the data related to the current carbon emission of the power industry into a prediction model, and solving the prediction model to predict the carbon emission of the power industry in a future period, wherein training samples of the prediction model are training sets and testing sets generated based on historical data related to the historical carbon emission of the power industry. According to the invention, XGBoost algorithm is adopted to construct a model for prediction, and the predicted carbon emission is more accurate.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a decision tree employing XGBoost algorithm in the method of the present invention;
FIG. 3 is a diagram showing an example of power industry input data required for the method of the present invention;
Fig. 4 is a block diagram of the system of the present invention.
Detailed Description
The exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, however, the present invention may be embodied in many different forms and is not limited to the examples described herein, which are provided to fully and completely disclose the present invention and fully convey the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to be limiting of the invention. In the drawings, like elements/components are referred to by like reference numerals.
Unless otherwise indicated, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. In addition, it will be understood that terms defined in commonly used dictionaries should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
Example 1:
The invention provides a power industry carbon emission prediction method based on the XGBoost algorithm, as shown in FIG. 1, comprising the following steps:
Step 1, acquiring current data related to carbon emission in the power industry;
Step 2, inputting the current data related to carbon emission in the power industry into a prediction model, and solving the prediction model to predict the carbon emission of the power industry in a future period;
wherein the training samples of the prediction model are a training set and a test set generated from historical data of the power industry related to carbon emission.
Wherein the historical data of the power industry related to carbon emission includes:
carbon emission data, economic activity-related data, energy usage data, and policy information data.
Wherein the method further comprises:
generating a data set based on the historical data, and dividing the data set into a training set and a test set according to a preset proportion;
training the prediction model based on the training set and the test set.
Wherein generating a data set based on the historical data comprises:
cleaning the historical data to filter abnormal values and fill in missing values, so as to obtain cleaned data;
screening the cleaned data to obtain key variable data influencing carbon emission, and normalizing the key variable data to generate the data set.
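The data-set generation steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the text does not say how abnormal values are detected, how missing values are filled, which normalization is used, or how the split is ordered, so the median/MAD outlier rule, median filling, min-max scaling, chronological split, the 0.8 ratio and all function names here are hypothetical choices. Variable screening is omitted.

```python
from statistics import median


def clean(values, k=5.0):
    """Replace missing entries (None) and outliers by the median of the inliers.

    Assumed rule: a point is an outlier when it lies more than k median
    absolute deviations (MAD) from the median of the observed values.
    """
    present = [v for v in values if v is not None]
    center = median(present)
    mad = median([abs(v - center) for v in present])

    def is_inlier(v):
        return v is not None and (mad == 0 or abs(v - center) <= k * mad)

    fill = median([v for v in values if is_inlier(v)])
    return [v if is_inlier(v) else fill for v in values]


def min_max(values):
    """Scale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def split(rows, train_ratio=0.8):
    """Chronological split: the earliest rows train, the latest rows test."""
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


# Hypothetical emission figures with one gap and one obvious outlier.
raw = [10.0, None, 12.0, 11.0, 500.0, 9.0]
cleaned = clean(raw)
normalized = min_max(cleaned)
train, test = split(normalized, train_ratio=0.8)
```

In this toy series the gap and the value 500.0 are both replaced by the median of the remaining points before scaling, and the last two rows are held out for testing.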
Wherein training the prediction model based on the training set and the test set comprises:
training the XGBoost algorithm with the training set to generate an initial prediction model, testing the performance of the initial prediction model with the test set, and adjusting the model parameters according to the test result to obtain the prediction model.
Training the XGBoost algorithm with the training set to generate an initial prediction model includes:
selecting the average value of the carbon emission of the power industry as the initial predicted value of carbon emission to define an initial model, performing iterative training on the initial model based on the XGBoost algorithm to generate an iterative model, establishing an objective function based on the iterative model, and generating the initial prediction model based on the objective function.
In the iterative training process, a decision tree is added in each iteration, and the latest predicted value is updated.
The output of the iterative model is as follows:
$$\hat{y}_i = \sum_{k=1}^{t} f_k(x_i)$$
wherein $\hat{y}_i$ is the predicted value of carbon emission, $t$ is the number of weak classifiers, and $f_k(x_i)$ is the predicted value of the k-th weak classifier on the sample $x_i$.
Wherein the objective function is as follows:
$$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n}\left[L\left(y_i,\hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i)\right] + \Omega(f_t) + \text{constant}$$
wherein $\mathrm{Obj}^{(t)}$ is the objective function, i.e. the function to be optimized in the prediction model; $n$ is the number of data samples; $y_i$ is the actual value; $\hat{y}_i^{(t-1)}$ is the model predicted value after the previous, (t-1)-th, iteration; $L$ is the loss function, which measures the difference between the actual value $y_i$ and the predicted value $\hat{y}_i^{(t-1)}$; $g_i$ is the first derivative of the loss function; $f_t(x_i)$ is the predicted value of the model in the current t-th iteration; $h_i$ is the second derivative of the loss function; $\Omega(f_t)$ is a regularization term which, in the decision tree model, makes the model simpler and more effective by adjusting the number and the weights of the leaf nodes of the tree; and constant is a constant term.
The objective function is converted, and the initial prediction model is established based on the converted objective function;
wherein the converted objective function is as follows:
$$\mathrm{Obj}^{(t)} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j+\lambda}$$
wherein $T$ is the number of leaf nodes, i.e. the number of terms summed in the current iteration and the upper summation bound in the formula; $G_j^{2}$ is the square of the sum of the first derivatives of the loss function over the samples in leaf $j$; $H_j$ is the sum of the second derivatives of the loss function over the same samples; and $\lambda$ is the regularization coefficient, which also avoids numerical instability when $H_j$ is too small.
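As a worked step, the converted objective can be reached by minimizing a quadratic in each leaf output. The leaf weight $w_j$ and the per-leaf form below are assumptions borrowed from the standard XGBoost derivation with an L2 penalty $\frac{1}{2}\lambda\sum_j w_j^2$; no per-leaf count penalty is included because the surrounding text defines no coefficient for one.

```latex
\begin{aligned}
\mathrm{Obj}^{(t)} &= \sum_{j=1}^{T}\left[G_j w_j + \tfrac{1}{2}\,(H_j+\lambda)\,w_j^{2}\right] \\
\frac{\partial\,\mathrm{Obj}^{(t)}}{\partial w_j} &= G_j + (H_j+\lambda)\,w_j = 0
\quad\Longrightarrow\quad w_j^{*} = -\frac{G_j}{H_j+\lambda} \\
\mathrm{Obj}^{(t)}\Big|_{w_j = w_j^{*}}
&= \sum_{j=1}^{T}\left[-\frac{G_j^{2}}{H_j+\lambda} + \frac{1}{2}\,\frac{G_j^{2}}{H_j+\lambda}\right]
= -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^{2}}{H_j+\lambda}
\end{aligned}
```

Under these assumptions each leaf contributes $-\tfrac{1}{2}G_j^{2}/(H_j+\lambda)$, so a larger $\lambda$ shrinks both the optimal leaf output and the achievable reduction in the objective.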
Solving the prediction model comprises solving the prediction model by calculating the information gain before and after node splitting, so as to predict the carbon emission of the power industry in a future period;
The information gain is calculated as follows:
$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^{2}}{H_L+\lambda} + \frac{G_R^{2}}{H_R+\lambda} - \frac{(G_L+G_R)^{2}}{H_L+H_R+\lambda}\right]$$
wherein Gain is the information gain, i.e. the decrease in the value of the objective function $\mathrm{Obj}^{(t)}$ produced by the split; $G_L$ is the gradient sum of the left child node, the accumulated gradient on the left subtree after the current node is split; $G_R$ is the gradient sum of the right child node, the accumulated gradient on the right subtree after the current node is split; $H_L$ is the Hessian sum of the left child node, the sum of second derivatives, used to adjust the update step length; $H_R$ is the Hessian sum of the right child node, the sum of second derivatives, used for curvature control in the update; and $\lambda$ is the regularization coefficient applied during tree growth, which keeps the model simple by penalizing the complexity of split nodes and avoids overfitting.
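The gain-based split search can be illustrated with a short sketch. It assumes a single numeric feature, the gain expression built only from the gradient sums, Hessian sums and $\lambda$ described above (no additional per-split penalty), and midpoints between neighbouring feature values as candidate thresholds; the function names and the toy data are hypothetical.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0):
    """Gain = 1/2 [G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - (G_L+G_R)^2/(H_L+H_R+lam)]."""
    def score(g, h):
        return g * g / (h + lam)

    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right))


def best_split(xs, grads, hess, lam=1.0):
    """Scan candidate thresholds on one feature.

    Returns (threshold, gain) of the best split, or (None, 0.0) when no
    split has positive gain.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    g_total, h_total = sum(grads), sum(hess)
    g_left = h_left = 0.0
    best_thr, best_gain = None, 0.0
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        g_left += grads[i]
        h_left += hess[i]
        if xs[i] == xs[nxt]:
            continue  # equal feature values cannot be separated
        gain = split_gain(g_left, h_left,
                          g_total - g_left, h_total - h_left, lam)
        if gain > best_gain:
            best_thr, best_gain = (xs[i] + xs[nxt]) / 2.0, gain
    return best_thr, best_gain


# Toy node: targets 10, 10, 20, 20 with every current prediction at 15.
# For the loss (y - y_hat)^2 this gives g = 2 * (15 - y) and h = 2.
xs = [1.0, 2.0, 3.0, 4.0]
grads = [10.0, 10.0, -10.0, -10.0]
hess = [2.0, 2.0, 2.0, 2.0]
threshold, gain = best_split(xs, grads, hess, lam=1.0)
```

On this toy node the search picks the threshold between the second and third samples, where the gradients change sign, because that split separates the two groups of residuals completely.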
The method for establishing the prediction model specifically comprises the following steps:
The method comprehensively considers the influence of economic activity, energy use and policy change on the power industry. The XGBoost algorithm is used to analyze historical power industry data over a long time scale, to identify the key multidimensional factors that influence carbon emission in the power industry, and to model the influence weight of each variable. On this basis a time-series prediction model of power industry carbon emission is constructed, and future carbon emission is predicted with higher spatial precision and over a longer time scale. The method mainly comprises the following steps:
Step one, defining an initial model and an iterative model.
The average value of the existing carbon-emission-related data of the power industry is selected as the initial predicted value of carbon emissions. Iterative training is then performed: in each iteration a new decision tree is added to the model and the latest predicted value $\hat{y}_i^{(t)}$ is updated. The iterative process is shown in formula (1):
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i) \tag{1}$$
where:
$\hat{y}_i^{(t)}$ is the predicted value of carbon emissions after the t-th iteration;
$x_i$ is the feature vector of the i-th carbon emission sample;
$\hat{y}_i^{(0)}$ is the initial predicted value of carbon emissions;
$f_t(x_i)$ is the predicted value of the t-th weak learner (decision tree) for sample $x_i$.
Finally, the predicted value of the model for a specific sample is obtained by accumulating the predictions of all decision trees, and the model output is:
$$\hat{y}_i = \hat{y}_i^{(0)} + \sum_{k=1}^{t} f_k(x_i)$$
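The additive update of step one can be illustrated with a small sketch. It is not the patent's implementation: it fits one-split regression stumps to residuals (plain gradient boosting for a squared loss) instead of XGBoost's regularized trees, and the names `fit_stump`, `boost` and the `learning_rate` shrinkage factor are assumptions introduced only for illustration.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = sum((r - left_value) ** 2 for r in left)
        sse += sum((r - right_value) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Start from the mean of ys, then add one tree per iteration."""
    base = mean(ys)                      # initial predicted value
    preds = [base] * len(ys)
    trees = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)  # new weak learner f_t
        trees.append(tree)
        # y_hat(t) = y_hat(t-1) + learning_rate * f_t(x)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]

    def predict(x):
        # Final output: initial value plus the accumulated tree outputs.
        return base + learning_rate * sum(tree(x) for tree in trees)

    return predict
```

Calling `boost(xs, ys)` returns a `predict` function whose value for any input is the initial mean plus the sum of all tree contributions, mirroring the accumulated model output described above.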
Step two, setting an objective function.
In the XGBoost-based prediction of power industry carbon emissions, the objective function consists of a model error and a structural error. The model error, also called the loss function, is determined by the difference between the true value of the power industry data and the value predicted by the model; the structural error controls the complexity of the model and prevents overfitting. The objective function can be expressed by formula (2):
$$Obj = \sum_{i=1}^{n} L\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{t} \Omega(f_k) \tag{2}$$
Substituting equation (1) into equation (2), the objective function may be further expressed as:
$$Obj^{(t)} = \sum_{i=1}^{n} L\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \sum_{k=1}^{t} \Omega(f_k)$$
Assume that the model error (loss function) is a squared loss and that the final number of fitting iterations is t, so that the prediction model contains t decision trees. The objective then consists of three parts: the sum of the errors of the n selected data samples under the t decision trees, the structural error of the t-th decision tree, and the structural error of the first t-1 decision trees. Since the structure of the first t-1 decision trees is known, their structural error can be regarded as a constant, and the objective function $Obj^{(t)}$ can be further expressed as:
$$Obj^{(t)} = \sum_{i=1}^{n} L\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + \text{constant}$$
wherein, the variable f t can be expressed as:
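With equation (4), the objective function $Obj^{(t)}$ can be approximated to second order as:
$$Obj^{(t)} \approx \sum_{i=1}^{n}\left[L\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t) + \text{constant}$$
where $g_i$ and $h_i$ are respectively the first and second derivatives of $L\left(y_i, \hat{y}_i^{(t-1)}\right)$ with respect to $\hat{y}_i^{(t-1)}$. When the model error (loss function) is assumed to be the squared loss $L = \left(y_i - \hat{y}_i\right)^2$, these are $g_i = 2\left(\hat{y}_i^{(t-1)} - y_i\right)$ and $h_i = 2$.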
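A minimal sketch of the first and second derivatives for a squared loss follows. It assumes the loss is written as (y - y_hat)^2 without a factor of 1/2 (with that factor the derivatives would be y_hat - y and 1); the function names are hypothetical. Because the squared loss is exactly quadratic, the second-order expansion reproduces it without error, which the last function makes easy to check.

```python
def squared_loss(y, y_hat):
    """Squared loss for one sample (assumed form, no 1/2 factor)."""
    return (y - y_hat) ** 2


def grad_hess(y, y_hat_prev):
    """First and second derivatives of the loss w.r.t. the previous prediction."""
    g = 2.0 * (y_hat_prev - y)
    h = 2.0
    return g, h


def second_order_approx(y, y_hat_prev, f):
    """Loss after adding tree output f, via the second-order expansion."""
    g, h = grad_hess(y, y_hat_prev)
    return squared_loss(y, y_hat_prev) + g * f + 0.5 * h * f * f
```

For example, with a true value of 3.0 and a previous prediction of 1.0, `grad_hess` returns a negative gradient, indicating that the next tree should push the prediction upward.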
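Further, since $L\left(y_i, \hat{y}_i^{(t-1)}\right)$ is the error between the data samples and the model built from the first t-1 decision trees, it can be regarded as a constant term and dropped. The objective function is then a function of $f_t(x)$, and $f_t(x)$ is in turn a function of the leaf node output w. Taking the structural error as $\Omega(f_t) = \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^2$, where T is the number of leaf nodes and $I_j$ is the set of samples falling in leaf j, the objective function can be converted entirely into a function of w:
$$Obj^{(t)} \approx \sum_{j=1}^{T}\left[\left(\sum_{i \in I_j} g_i\right) w_j + \frac{1}{2}\left(\sum_{i \in I_j} h_i + \lambda\right) w_j^2\right]$$
Let $G_j = \sum_{i \in I_j} g_i$ and $H_j = \sum_{i \in I_j} h_i$. The objective function may be converted to:
$$Obj^{(t)} \approx \sum_{j=1}^{T}\left[G_j w_j + \frac{1}{2}\left(H_j + \lambda\right) w_j^2\right]$$
where j denotes the j-th leaf node and i denotes the i-th sample. The objective function is a univariate quadratic function of each $w_j$ (the leaf node output), so to optimize it the optimal value of $w_j$ is obtained by setting the derivative to zero:
$$\frac{\partial Obj^{(t)}}{\partial w_j} = G_j + \left(H_j + \lambda\right) w_j = 0 \quad\Rightarrow\quad w_j^{*} = -\frac{G_j}{H_j + \lambda}$$
Substituting the optimal solution $w_j^{*}$ back, the objective function may be further transformed into:
$$Obj^{(t)} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j + \lambda}$$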
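The closed-form leaf output can be checked numerically. The sketch below assumes the per-leaf objective G*w + 0.5*(H + lambda)*w^2, i.e. an L2 penalty on leaf outputs as the only structural term; the function names are hypothetical and serve only to illustrate the algebra.

```python
def leaf_objective(w, G, H, lam):
    """Per-leaf objective as a quadratic in the leaf output w."""
    return G * w + 0.5 * (H + lam) * w * w


def optimal_leaf_weight(G, H, lam):
    """Minimizer of the quadratic: w* = -G / (H + lambda)."""
    return -G / (H + lam)


def optimal_leaf_objective(G, H, lam):
    """Objective value at w*: -0.5 * G^2 / (H + lambda)."""
    return -0.5 * G * G / (H + lam)
```

Evaluating `leaf_objective` slightly to either side of `optimal_leaf_weight(G, H, lam)` gives a larger value, which confirms that the stationary point is a minimum whenever H + lambda is positive.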
Thus, once the first and second derivatives of the loss function and the distribution of the samples over the leaf nodes are known, the objective function of a binary decision tree can be evaluated.
Further, based on the objective function obtained above, whether a feature node should be split can be analyzed by computing the change in information gain (Gain) before and after the split. If the gain is greater than 0, the node is split; otherwise the node is not split. The information gain, i.e. the reduction in the objective function produced by the split, can be expressed as follows:
$$Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{\left(G_L + G_R\right)^2}{H_L + H_R + \lambda}\right]$$
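The split test can be sketched as follows, assuming the usual XGBoost form of the gain in terms of the left and right gradient and Hessian sums. The optional `gamma` argument is the per-leaf complexity penalty of standard XGBoost; the text does not name it, so it defaults to zero here, and both function names are hypothetical.

```python
def split_gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Reduction in the objective obtained by splitting a node in two."""
    def score(G, H):
        return G * G / (H + lam)

    children = score(G_L, H_L) + score(G_R, H_R)
    parent = score(G_L + G_R, H_L + H_R)
    return 0.5 * (children - parent) - gamma


def should_split(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    """Split only when the gain is strictly positive."""
    return split_gain(G_L, H_L, G_R, H_R, lam, gamma) > 0.0
```

Note that the regularization parameter alone can make the gain negative when the two children have similar gradients, so the "gain greater than 0" rule already rejects uninformative splits even without a `gamma` penalty.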
It should be noted that, in XGBoost-based prediction of power industry carbon emissions, node splitting can follow either a greedy strategy or an approximate strategy. Greedy node splitting exhaustively enumerates every feature and every possible value of that feature, computes the gain before and after each candidate split, and selects the feature value with the largest gain as the split point; it is exact but computationally expensive. To improve computational efficiency, the solving process generally uses approximate node splitting based on feature bucketing: the values of each feature are assigned to feature buckets according to quantiles, and only the bucket boundary values are considered as candidate split points.
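The difference between the two strategies comes down to which thresholds are tried. The sketch below is an illustration under assumptions, not XGBoost's actual implementation (which uses weighted quantile sketches): it handles a single feature, uses `statistics.quantiles` to form bucket boundaries, and all function names are hypothetical.

```python
from statistics import quantiles


def gain(G_L, H_L, G_R, H_R, lam):
    """Split gain from left/right gradient and Hessian sums."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - score(G_L + G_R, H_L + H_R))


def greedy_candidates(values):
    """Greedy strategy: every distinct feature value except the largest."""
    return sorted(set(values))[:-1]


def bucket_candidates(values, n_buckets=4):
    """Approximate strategy: only quantile bucket boundaries."""
    return sorted(set(quantiles(values, n=n_buckets)))


def best_split(values, grads, hess, candidates, lam=1.0):
    """Return the (threshold, gain) of the best positive-gain split, if any."""
    best_threshold, best_gain = None, 0.0
    for threshold in candidates:
        left = [i for i, v in enumerate(values) if v <= threshold]
        right = [i for i, v in enumerate(values) if v > threshold]
        if not left or not right:
            continue
        g = gain(sum(grads[i] for i in left), sum(hess[i] for i in left),
                 sum(grads[i] for i in right), sum(hess[i] for i in right), lam)
        if g > best_gain:
            best_threshold, best_gain = threshold, g
    return best_threshold, best_gain
```

With 100 distinct values, the greedy strategy evaluates 99 thresholds while four buckets give only three, so the approximate search trades a coarser threshold for far fewer gain evaluations.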
By combining the initial model, the iterative model and the objective function, inputting the existing data related to power industry carbon emissions, and fitting and solving over multiple iterations, future carbon emissions of the power industry can be predicted at higher spatial precision and over a longer time scale based on the XGBoost algorithm.
A specific case is described below.
Fig. 1 is a flowchart of an implementation of the method for predicting carbon emissions in the power industry based on the XGBoost algorithm and multidimensional data set analysis according to the present invention, which specifically includes the following steps:
S101, carbon emission and multidimensional data collection. Key factors affecting carbon emissions include historical carbon emissions, economic indicators (e.g., GDP, industrial output), energy usage (e.g., consumption of coal, natural gas and renewable energy), demographic data, and policies and regulations. This example relies on reliable sources such as government statistical bureaus, academic research and industry reports. As raw data it selects multidimensional monthly data from January 1990 to December 2020 published on the National Bureau of Statistics website, including thermal power generation, percentage growth in power generation, cumulative power generation, current-period power generation, asset investment in the power industry, and fiscal budget.
S102, preprocessing carbon emission data. Preprocessing generally comprises the following key steps: 1) identifying missing items in the data set, and either deleting the records that contain missing values or filling them by interpolation, mean filling or another suitable method; 2) analyzing long-time-scale historical data of the power industry and selecting the key multidimensional factors influencing power industry carbon emissions as feature variables; and 3) normalizing or standardizing the input power industry carbon emission data.
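Two of the preprocessing steps above, mean filling and normalization, can be sketched in a few lines. This is an assumed minimal form: missing values are represented as `None`, normalization is min-max scaling to [0, 1], and the function names are hypothetical; the text allows other filling and scaling choices.

```python
def fill_missing_with_mean(column):
    """Replace None entries with the mean of the observed values."""
    present = [v for v in column if v is not None]
    column_mean = sum(present) / len(present)
    return [column_mean if v is None else v for v in column]


def min_max_normalize(column):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]
```

In practice the minimum and maximum would be computed on the training portion only and then reused for the test portion, so that no information from the test set leaks into the scaling.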
S103, dividing carbon emission data. The entire data set is divided into a training set and a test set so that the model can be evaluated on unseen data and overfitting can be detected. Common split ratios are 70%-90% for training and 10%-30% for testing, which can be adjusted according to the size and characteristics of the particular data set. In this example, 10% of the data is used as the test set. The data set is divided with a utility function such as train_test_split so that the training set and the test set have similar feature distributions. In this embodiment the model is trained with the XGBoost algorithm: the model parameters are adjusted over repeated iterations to minimize the loss function, and cross-validation is used during training to evaluate the generalization ability of the model and select the best hyperparameter settings.
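The text names `train_test_split`, a scikit-learn utility that shuffles rows by default. As a dependency-free illustration, the sketch below shows a chronological hold-out instead, which keeps the most recent months for testing; that choice, the function name and the 10% default are assumptions for illustration rather than something the text specifies.

```python
def chronological_split(rows, test_fraction=0.1):
    """Hold out the last `test_fraction` of time-ordered rows as the test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = max(1, int(round(len(rows) * test_fraction)))
    return rows[:-n_test], rows[-n_test:]


# Monthly records from January 1990 to December 2020 give 31 * 12 = 372 rows.
months = list(range(372))
train_rows, test_rows = chronological_split(months, test_fraction=0.1)
```

A shuffled split makes the two sets statistically similar, whereas a chronological split better reflects forecasting, where the model is always evaluated on periods later than those it was trained on.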
S104, evaluating reliability on the carbon emission test set. After training, the final performance of the model is evaluated on the test set, using indicators such as root mean square error (RMSE) and R² to confirm the predictive ability of the model. The evaluation indicators selected in this example are mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE).
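The four indicators named for this example have standard definitions, sketched below. The function name and the returned dictionary layout are hypothetical, and MAPE is only defined when no actual value is zero.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and MAPE (in percent) for paired values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the actual value, so actual values must be nonzero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": sqrt(mse), "MAE": mae, "MAPE": mape}
```

RMSE and MAE are expressed in the units of the emission data, while MAPE is scale-free, which makes it convenient for comparing models across series of different magnitude.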
S105, predicting future carbon emissions of the power industry. After model training is completed, the trained XGBoost model is used to predict future data: new data is input and the corresponding prediction results are obtained.
S106, analyzing the predicted future carbon emissions of the power industry. The trend of carbon emissions is identified by time-series analysis of the prediction results. In this example, carbon emissions no longer show a significant upward trend and emission growth eases markedly, which indicates that the carbon peaking and carbon neutrality goals are gradually being achieved.
Fig. 2 is a schematic diagram of the decision trees of the XGBoost algorithm used in the method for predicting carbon emissions in the power industry according to the present invention. Unlike traditional machine learning algorithms, XGBoost is distinguished mainly by the number of sub-decision-tree models it introduces during computation, which is also the main reason it can achieve carbon emission prediction at higher spatial resolution and over a longer time scale. When XGBoost is used for data prediction, the accuracy of the prediction is improved step by step through continued iteration of the gradient boosting algorithm. During training, each decision tree node obtained in a calculation is split into two child nodes, each of which is the root of a new subtree. Each node in XGBoost is thus part of a decision tree model rather than a separate decision tree. In this way the existing decision tree model is reused effectively, which speeds up training and improves the accuracy of the model.
Fig. 3 shows an example of the power industry input data required to predict power industry carbon emissions with the present invention. The input data of the model is taken from the National Bureau of Statistics website. According to the requirements of model building, the input data not only includes carbon emissions but also needs to cover other relevant dimensions that can influence carbon emissions. Therefore, multidimensional monthly data from January 1990 to December 2020 published on the National Bureau of Statistics website are selected as raw data, including thermal power generation (HLFD), percentage growth in power generation (ZZBFL), cumulative power generation (FDLJZ), current-period power generation (FDDQZ), power industry asset investment (ZCTZ), fiscal budget (CZYS) and the like. These multidimensional data give the model comprehensive inputs, which helps improve the accuracy of carbon emission prediction and analysis.
The invention comprehensively considers the influence of economic activity, energy use and policy change on the power industry and uses the XGBoost algorithm to analyze long-time-scale historical data of the power industry. It can thereby predict future power industry carbon emissions at higher spatial precision and over a longer time scale, providing more refined analysis and decision support for the low-carbon transition of the power industry.
Example 2:
The invention also provides a power industry carbon emission prediction system 200 based on XGBoost algorithm, as shown in fig. 4, comprising:
a data acquisition unit 201 for acquiring data related to carbon emissions currently in the power industry;
The solving unit 202 inputs the data related to the current carbon emission of the electric power industry into a prediction model, and solves the prediction model to predict the carbon emission of the electric power industry in a future period;
The training samples of the prediction model are a training set and a test set generated from historical carbon-emission-related data of the power industry.
Wherein, the historical carbon-emission-related data of the power industry includes:
carbon emission data, economic activity-related data, energy usage data, and policy information data.
Wherein, the solving unit 202 is further configured to:
generating a data set based on the historical data, and dividing the data set into a training set and a testing set according to a preset proportion;
The predictive model is trained based on the training set and the test set.
Wherein generating a dataset based on the historical data comprises:
cleaning the historical data to filter abnormal values and complement missing values, so as to obtain cleaning data;
and screening the cleaning data to obtain key variable data influencing carbon emission, and carrying out normalization processing on the key variable data influencing carbon emission to generate a data set.
Wherein training the predictive model based on the training set and the test set comprises:
Training the XGBoost algorithm with the training set to generate an initial prediction model, testing the performance of the initial prediction model with the test set, and adjusting the model parameters according to the test result to obtain the prediction model.
Wherein training the XGBoost algorithm with the training set to generate an initial prediction model includes:
Selecting the average value of power industry carbon emissions as the initial predicted value of carbon emissions to define an initial model; iteratively training the initial model based on the XGBoost algorithm to generate an iterative model; establishing an objective function based on the iterative model; and generating the initial prediction model based on the objective function.
In the iterative training process, a decision tree is added in each iteration, and the latest predicted value is updated.
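The output of the iterative model is as follows:
$$\hat{y}_i = \hat{y}_i^{(0)} + \sum_{k=1}^{t} f_k(x_i)$$
where $\hat{y}_i$ is the predicted value of carbon emissions, $\hat{y}_i^{(0)}$ is the initial predicted value, t is the number of weak learners, and $f_k(x_i)$ is the predicted value of the k-th weak learner for sample $x_i$.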
Wherein the objective function is as follows:
$$Obj^{(t)} \approx \sum_{i=1}^{n}\left[L\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t) + \text{constant}$$
where $Obj^{(t)}$ is the objective function, i.e. the function to be optimized in the prediction model; $y_i$ is the actual value; $\hat{y}_i^{(t-1)}$ is the value predicted by the model at the previous iteration; L is the loss function, which measures the difference between the actual value $y_i$ and the previous prediction $\hat{y}_i^{(t-1)}$; $g_i$ is the first derivative of the loss function; $f_t(x)$ is the prediction of the tree added in the current t-th iteration; $h_i$ is the second derivative of the loss function; $\Omega(f_t)$ is the regularization term, which in the decision tree model keeps the model simpler and more effective by constraining the number of leaf nodes of the tree and their weights; and constant is a constant term.
The objective function is transformed, and the initial prediction model is established based on the transformed objective function;
wherein the transformed objective function is as follows:
$$Obj^{(t)} = -\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j + \lambda}$$
where T is the number of leaf nodes, i.e. the number of terms summed in this iteration and the upper bound of the summation; $G_j^2$ is the square of the sum of the first derivatives of the loss function over leaf j; $H_j$ is the sum of the second derivatives of the loss function over leaf j; and $\lambda$ is the regularization term, which also avoids numerical instability when $H_j$ is too small.
The solving unit 202 solves the prediction model by evaluating the change in information gain (Gain) before and after node splitting, so as to predict the carbon emissions of the power industry in a future period;
The information gain is calculated as follows:
$$Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{\left(G_L + G_R\right)^2}{H_L + H_R + \lambda}\right]$$
where Gain is the information gain, i.e. the reduction in the objective function Obj of the prediction model produced by the split; $G_L$ is the gradient sum of the left child node, i.e. the accumulated gradients on the left subtree after the current node is split; $G_R$ is the gradient sum of the right child node, i.e. the accumulated gradients on the right subtree after the current node is split; $H_L$ is the Hessian sum of the left child node, i.e. the sum of second derivatives, used to adjust the update step size; $\lambda$ is the regularization parameter applied during tree growth, which keeps the model simple by penalizing the complexity of split nodes and so avoids overfitting; and $H_R$ is the Hessian sum of the right child node, i.e. the sum of second derivatives, used for curvature control in the update.
According to the invention, the prediction model is constructed with the XGBoost algorithm, which improves the accuracy of the predicted carbon emissions.
Example 3:
Based on the same inventive concept, the invention also provides a computer device comprising a processor and a memory, the memory storing a computer program comprising program instructions, and the processor executing the program instructions stored in the computer storage medium. The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and so on. It is the computational core and control core of the terminal, adapted to implement one or more instructions, and in particular to load and execute one or more instructions in a computer storage medium so as to implement the corresponding method flow or function and thereby carry out the steps of the method in the embodiments described above.
Example 4:
Based on the same inventive concept, the present invention also provides a storage medium, in particular, a computer readable storage medium (Memory), which is a Memory device in a computer device, for storing programs and data. It is understood that the computer readable storage medium herein may include both built-in storage media in a computer device and extended storage media supported by the computer device. The computer-readable storage medium provides a storage space storing an operating system of the terminal. Also stored in the memory space are one or more instructions, which may be one or more computer programs (including program code), adapted to be loaded and executed by the processor. The computer readable storage medium herein may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory. One or more instructions stored in a computer-readable storage medium may be loaded and executed by a processor to implement the steps of the methods in the above-described embodiments.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The scheme in the embodiments of the invention can be implemented in various computer languages, such as the object-oriented programming language Java and the interpreted scripting language JavaScript.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (24)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411492988.8A CN119539145A (en) | 2024-10-24 | 2024-10-24 | A method and system for predicting carbon emissions in the power industry based on XGBoost algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN119539145A (en) | 2025-02-28 |
Family
ID=94702457
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411492988.8A Pending CN119539145A (en) | 2024-10-24 | 2024-10-24 | A method and system for predicting carbon emissions in the power industry based on XGBoost algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119539145A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN120087979A (en) * | 2025-05-06 | 2025-06-03 | 山东科技大学 | A method, device and medium for predicting carbon emissions based on capturing relationships between variables |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||