CN120013004A - A quantitative precipitation estimation method, system, storage medium and electronic device - Google Patents
- Publication number
- CN120013004A (application CN202510094673.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- distribution
- precipitation
- probability
- precipitation estimation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S13/00—Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
- G01S13/88—Radar or analogous systems specially adapted for specific applications
- G01S13/95—Radar or analogous systems specially adapted for specific applications for meteorological use
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01W—METEOROLOGY
- G01W1/00—Meteorology
- G01W1/14—Rainfall or precipitation gauges
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides a quantitative precipitation estimation method, system, storage medium and electronic device. The method comprises: obtaining a data set; constructing a prior distribution model based on the data set; defining, by variational inference, an approximate distribution of the posterior distribution of the precipitation estimate according to the prior distribution model; determining a lower bound of the precipitation estimate from the approximate distribution using Jensen's inequality; and maximizing the lower bound to obtain the quantitative precipitation estimation result. The data set is built from a radar data set of the quantitative precipitation estimation area and a data set of real-time automatic station observations. Performing quantitative precipitation estimation in this way improves its performance in operational applications.
Description
Technical Field
The invention relates to the technical field of weather forecasting, and in particular to a quantitative precipitation estimation method, system, storage medium and electronic device.
Background
Quantitative precipitation estimation (QPE) is the basis of services such as quantitative precipitation forecasting (QPF) and heavy-precipitation nowcast warnings. It is an important component of short-term forecasting and nowcasting, and has long been both a focus and a difficulty of operational forecasting. Automatic weather stations are currently the most direct way to observe precipitation. Because automatic weather stations are unevenly distributed in space, however, their observations cannot fully reflect the distribution of precipitation.
The radar reflectivity factor is the primary factor related to precipitation, but precipitation is affected by many factors. How to screen the factors that affect the accuracy of quantitative precipitation estimation, so as to obtain the best possible precipitation estimate, remains a difficult problem.
A solution is therefore needed.
Disclosure of Invention
The invention aims to provide a quantitative precipitation estimation method that is based on a large volume of long-term observation data and uses a quantitative precipitation estimation algorithm based on nonparametric Bayesian machine learning. The algorithm combines nine elements, including the radar reflectivity factor, automatic station information, the radar-retrieved wind field and the weather type, to constrain the radar quantitative precipitation estimation algorithm. First, a radar data set of the quantitative precipitation estimation area and a data set of real-time automatic station observations are constructed, and a prior distribution model is built from these data. Then, by variational inference, an approximate distribution of the posterior distribution of the precipitation estimate is defined according to the prior distribution model; a lower bound of the precipitation estimate is determined from the approximate distribution using Jensen's inequality; and the lower bound is maximized to obtain the quantitative precipitation estimation result. Quantitative precipitation estimation is thereby carried out and its performance in operational applications is improved.
The quantitative precipitation estimation method provided by the embodiment of the invention comprises the following steps:
acquiring a data set M;
constructing a prior distribution model $p(M \mid D)$ based on the data set $M$, where $D$ is a family of models;
defining, by variational inference, an approximate distribution $q(\Theta)$ of the posterior distribution of the precipitation estimate according to the prior distribution model $p(M \mid D)$, where $\Theta$ denotes the probability model parameters;
determining a lower bound $E_q[\log p(\Theta, M)] - E_q[\log q(\Theta)]$ of the precipitation estimate from the approximate distribution $q(\Theta)$ using Jensen's inequality, where $E_q$ denotes the expectation with respect to $q(\Theta)$;
maximizing the lower bound $E_q[\log p(\Theta, M)] - E_q[\log q(\Theta)]$ of the precipitation estimate to obtain the quantitative precipitation estimation result.
Optionally, the data set $M$ comprises at least a radar data set of the quantitative precipitation estimation area and an automatic station information data set. The radar data set is 1 km × 1 km gridded radar reflectivity factor data within the quantitative precipitation estimation area; the automatic station information data set comprises the longitude and latitude coordinates and the real-time precipitation observations of each automatic station in the area.
Optionally, constructing the prior distribution model $p(M \mid D)$ based on the data set $M$ includes:
In the Dirichlet nonparametric Bayesian model, $G_0$ is a base probability distribution on the probability measure space $\Omega$ and $\alpha_0 > 0$ is the concentration parameter; a probability distribution $G$ on $\Omega$ follows the Dirichlet process with base distribution $G_0$:
$G \sim \mathrm{DP}(\alpha_0, G_0)$
where the base probability distribution $G_0$ determines the distribution of the basic constituent elements in the prior distribution model $p(M \mid D)$, and DP denotes a Dirichlet process;
Based on the Dirichlet process mixture (DPM) model, a generative probability is attached to each data point in the given quantitative precipitation estimation area and serves as the prior distribution of the data:
$m_i \sim p(m \mid \theta_i)$
where the parameter $\theta_i$ follows the probability distribution $G$; $i \in \{1, \dots, N\}$ indexes the data points, each of which has its own generative probability parameter; $N$ is the total number of data points; $p$ is a conditional probability density function; and $m$ is the precipitation estimate of the data point;
The optimal model is selected as the prior distribution model $p(M \mid D)$ by comparing the likelihood functions of different families of prior probability models $m_i \sim p(m \mid \theta_i)$:
$p(M \mid D) = \int_{\Theta} p(M \mid \Theta)\, p(\Theta \mid D)\, d\Theta$
where $D$ denotes a family of models.
Optionally, selecting the optimal model as the prior distribution model $p(M \mid D)$ by comparing the likelihood functions of different families of models includes:
Assuming the data in the data set $M = \{m_1, m_2, m_3, \dots, m_n\}$ are independent, reading the $n$ real-time automatic station precipitation observations in the quantitative precipitation estimation area, and arranging them in random order to obtain $\{F(i)\}$, where $i = 1, 2, \dots, n$;
Setting $\Omega = \Omega_{t-1}$ and, for each real-time automatic station precipitation observation $i \in \{F(1), F(2), \dots, F(n)\}$ in the random arrangement given by $\{F(i)\}$, sampling its indicator factor $\beta_i$, where $\Omega_{t-1}$ is the probability measure space at time $t-1$;
Computing the likelihood estimate $f_k(m_i)$ of the observation based on the current $K$ clusters:
$f_k(m_i) = p(m_i \mid \beta_i = k, M_{\setminus i}, \zeta)$
where $\beta_i$ is the class indicator of observation $i$ and may designate a new class; $M_{\setminus i}$ is the data set $M = \{m_1, m_2, m_3, \dots, m_n\}$ with the element of index $i$ removed; $\zeta$ is the distribution parameter; $k$ is the class index; $m_i$ is the observation, treated as a random variable; and a further index, analogous to $k$, denotes a class other than $k$;
Sampling $\beta_i$, where, in the sampling formula, the count symbol denotes the amount of data already in class $k$; $E_i$ is a preset observation data sample; $K$ is the current number of classes; $f_k$ is the probability density function of class $k$; $\delta$ is the Kronecker delta function, with $\delta(\beta_i, k) = 1$ when $\beta_i = k$ and 0 otherwise; and a corresponding density represents the classes other than class $k$;
If the condition for opening a new class is met, the number of clusters $K$ is increased by 1, $K = K + 1$;
Checking the number of observations in each cluster used to compute the likelihood functions; if a class contains 0 observations, the class is deleted and the number of clusters $K$ is reduced by 1, $K = K - 1$.
The quantitative precipitation estimation system provided by the embodiment of the invention comprises:
an acquisition module for acquiring a data set M;
a construction module for constructing a prior distribution model $p(M \mid D)$ based on the data set $M$, where $D$ is a family of models;
a definition module for defining, by variational inference, an approximate distribution $q(\Theta)$ of the posterior distribution of the precipitation estimate according to the prior distribution model $p(M \mid D)$, where $\Theta$ denotes the probability model parameters;
a determining module for determining a lower bound $E_q[\log p(\Theta, M)] - E_q[\log q(\Theta)]$ of the precipitation estimate from the approximate distribution $q(\Theta)$ using Jensen's inequality, where $E_q$ denotes the expectation with respect to $q(\Theta)$;
a maximizing module for maximizing the lower bound $E_q[\log p(\Theta, M)] - E_q[\log q(\Theta)]$ of the precipitation estimate to obtain the quantitative precipitation estimation result.
Optionally, the data set $M$ comprises at least a radar data set of the quantitative precipitation estimation area and an automatic station information data set. The radar data set is 1 km × 1 km gridded radar reflectivity factor data within the quantitative precipitation estimation area; the automatic station information data set comprises the longitude and latitude coordinates and the real-time precipitation observations of each automatic station in the area.
Optionally, constructing the prior distribution model $p(M \mid D)$ based on the data set $M$ includes:
In the Dirichlet nonparametric Bayesian model, $G_0$ is a base probability distribution on the probability measure space $\Omega$ and $\alpha_0 > 0$ is the concentration parameter; a probability distribution $G$ on $\Omega$ follows the Dirichlet process with base distribution $G_0$:
$G \sim \mathrm{DP}(\alpha_0, G_0)$
where the base probability distribution $G_0$ determines the distribution of the basic constituent elements in the prior distribution model $p(M \mid D)$, and DP denotes a Dirichlet process;
Based on the Dirichlet process mixture (DPM) model, a generative probability is attached to each data point in the given quantitative precipitation estimation area and serves as the prior distribution of the data:
$m_i \sim p(m \mid \theta_i)$
where the parameter $\theta_i$ follows the probability distribution $G$; $i \in \{1, \dots, N\}$ indexes the data points, each of which has its own generative probability parameter; $N$ is the total number of data points; $p$ is a conditional probability density function; and $m$ is the precipitation estimate of the data point;
The optimal model is selected as the prior distribution model $p(M \mid D)$ by comparing the likelihood functions of different families of prior probability models $m_i \sim p(m \mid \theta_i)$:
$p(M \mid D) = \int_{\Theta} p(M \mid \Theta)\, p(\Theta \mid D)\, d\Theta$
where $D$ denotes a family of models.
Optionally, selecting the optimal model as the prior distribution model $p(M \mid D)$ by comparing the likelihood functions of different families of models includes:
Assuming the data in the data set $M = \{m_1, m_2, m_3, \dots, m_n\}$ are independent, reading the $n$ real-time automatic station precipitation observations in the quantitative precipitation estimation area, and arranging them in random order to obtain $\{F(i)\}$, where $i = 1, 2, \dots, n$;
Setting $\Omega = \Omega_{t-1}$ and, for each real-time automatic station precipitation observation $i \in \{F(1), F(2), \dots, F(n)\}$ in the random arrangement given by $\{F(i)\}$, sampling its indicator factor $\beta_i$, where $\Omega_{t-1}$ is the probability measure space at time $t-1$;
Computing the likelihood estimate $f_k(m_i)$ of the observation based on the current $K$ clusters:
$f_k(m_i) = p(m_i \mid \beta_i = k, M_{\setminus i}, \zeta)$
where $\beta_i$ is the class indicator of observation $i$ and may designate a new class; $M_{\setminus i}$ is the data set $M = \{m_1, m_2, m_3, \dots, m_n\}$ with the element of index $i$ removed; $\zeta$ is the distribution parameter; $k$ is the class index; $m_i$ is the observation, treated as a random variable; and a further index, analogous to $k$, denotes a class other than $k$;
Sampling $\beta_i$, where, in the sampling formula, the count symbol denotes the amount of data already in class $k$; $E_i$ is a preset observation data sample; $K$ is the current number of classes; $f_k$ is the probability density function of class $k$; $\delta$ is the Kronecker delta function, with $\delta(\beta_i, k) = 1$ when $\beta_i = k$ and 0 otherwise; and a corresponding density represents the classes other than class $k$;
If the condition for opening a new class is met, the number of clusters $K$ is increased by 1, $K = K + 1$;
Checking the number of observations in each cluster used to compute the likelihood functions; if a class contains 0 observations, the class is deleted and the number of clusters $K$ is reduced by 1, $K = K - 1$.
The embodiment of the invention provides a computer readable storage medium, on which a computer program is stored, and a processor executes the computer program to implement the method of any one of the above embodiments.
The electronic device provided by the embodiment of the invention comprises a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to realize the method of any one of the above.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a schematic diagram of a quantitative precipitation estimation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a quantitative precipitation estimation system according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
The embodiment of the invention provides a quantitative precipitation estimation method, as shown in fig. 1, comprising the following steps:
S1, acquiring a data set M;
S2, constructing a prior distribution model $p(M \mid D)$ based on the data set $M$, where $D$ is a family of models;
S3, defining, by variational inference, an approximate distribution $q(\Theta)$ of the posterior distribution of the precipitation estimate according to the prior distribution model $p(M \mid D)$, where $\Theta$ denotes the probability model parameters;
S4, determining a lower bound $E_q[\log p(\Theta, M)] - E_q[\log q(\Theta)]$ of the precipitation estimate from the approximate distribution $q(\Theta)$ using Jensen's inequality, where $E_q$ denotes the expectation with respect to $q(\Theta)$;
S5, maximizing the lower bound $E_q[\log p(\Theta, M)] - E_q[\log q(\Theta)]$ of the precipitation estimate to obtain the quantitative precipitation estimation result.
The data set $M$ comprises at least a radar data set of the quantitative precipitation estimation area and an automatic station information data set. The radar data set is 1 km × 1 km gridded radar reflectivity factor data within the quantitative precipitation estimation area; the automatic station information data set comprises the longitude and latitude coordinates and the real-time precipitation observations of each automatic station in the area.
Constructing the prior distribution model $p(M \mid D)$ based on the data set $M$ includes:
In the Dirichlet nonparametric Bayesian model, $G_0$ is a base probability distribution on the probability measure space $\Omega$ and $\alpha_0 > 0$ is the concentration parameter; a probability distribution $G$ on $\Omega$ follows the Dirichlet process with base distribution $G_0$:
$G \sim \mathrm{DP}(\alpha_0, G_0)$
where the base probability distribution $G_0$ determines the distribution of the basic constituent elements in the prior distribution model $p(M \mid D)$, and DP denotes a Dirichlet process. The Dirichlet process is a stochastic process widely used in Bayesian nonparametric statistics; here it models the probability distribution of precipitation when the specific form of that distribution is unknown but some prior knowledge is available;
Based on the Dirichlet process mixture (DPM) model, a generative probability is attached to each data point in the given quantitative precipitation estimation area and serves as the prior distribution of the data:
$m_i \sim p(m \mid \theta_i)$
where the parameter $\theta_i$ follows the probability distribution $G$; $i \in \{1, \dots, N\}$ indexes the data points, each of which has its own generative probability parameter; $N$ is the total number of data points; $p$ is a conditional probability density function (PDF); and $m$ is the precipitation estimate of the data point. The expression gives the probability distribution of the precipitation estimation data conditional on the given parameter;
The optimal model is selected as the prior distribution model $p(M \mid D)$ by comparing the likelihood functions of different families of prior probability models $m_i \sim p(m \mid \theta_i)$:
$p(M \mid D) = \int_{\Theta} p(M \mid \Theta)\, p(\Theta \mid D)\, d\Theta$
where $D$ denotes a family of models.
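The generative side of the Dirichlet process mixture described above can be sketched with the Chinese restaurant process, a standard equivalent construction. This is a minimal illustration, not the patent's implementation: the function name `sample_dpm`, the uniform base distribution standing in for $G_0$, and the Gaussian form of $p(m \mid \theta_i)$ are all assumptions chosen for the example.

```python
import random

def sample_dpm(n, alpha0, base_sampler, noise_sd, rng):
    """Draw n values from a Dirichlet process mixture via the Chinese restaurant process."""
    thetas = []   # one parameter per cluster, each drawn from the base distribution G0
    counts = []   # number of points already assigned to each cluster
    labels, values = [], []
    for _ in range(n):
        # An existing cluster k is chosen with weight counts[k], a new one with weight alpha0.
        weights = counts + [alpha0]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(thetas):
            thetas.append(base_sampler(rng))
            counts.append(0)
        counts[k] += 1
        labels.append(k)
        # m_i ~ p(m | theta_i); a Gaussian centred on the cluster parameter is assumed here.
        values.append(rng.gauss(thetas[k], noise_sd))
    return labels, values, thetas

rng = random.Random(0)
# Hypothetical G0: uniform on [0, 50], e.g. a rain-rate scale in mm/h.
labels, values, thetas = sample_dpm(200, 1.0, lambda r: r.uniform(0.0, 50.0), 1.0, rng)
```

The number of clusters is not fixed in advance; it grows with the data at a rate governed by the concentration parameter `alpha0`.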
Selecting the optimal model as the prior distribution model $p(M \mid D)$ by comparing the likelihood functions of different families of models includes:
Assuming the data in the data set $M = \{m_1, m_2, m_3, \dots, m_n\}$ are independent, reading the $n$ real-time automatic station precipitation observations in the quantitative precipitation estimation area, and arranging them in random order to obtain $\{F(i)\}$, where $i = 1, 2, \dots, n$;
Setting $\Omega = \Omega_{t-1}$, carrying over the parameter set collected at time $t-1$, and, for each real-time automatic station precipitation observation $i \in \{F(1), F(2), \dots, F(n)\}$ in the random arrangement given by $\{F(i)\}$, sampling its indicator factor $\beta_i$, where $\Omega_{t-1}$ is the probability measure space at time $t-1$. That is, the observations are put in random order by a specific function and an indicator factor is sampled for each one; the indicator factor reflects the parameters collected at the current time (such as weather parameters and precipitation values), including the probability measure space at that time.
Computing the likelihood estimate $f_k(m_i)$ of the observation based on the current $K$ clusters:
$f_k(m_i) = p(m_i \mid \beta_i = k, M_{\setminus i}, \zeta)$
where $\beta_i$ is the class indicator of observation $i$ and may designate a new class; $M_{\setminus i}$ is the data set $M = \{m_1, m_2, m_3, \dots, m_n\}$ with the element of the corresponding index removed (the index is the unique subscript of each data point in the data set, i.e., $i$ in the formula); $\zeta$ is the distribution parameter; $k$ is the class index; $m_i$ is the observation, treated as a random variable; and a further index, analogous to $k$, denotes a class other than $k$. For each current cluster, the likelihood estimate of the observation is computed; in other words, the goodness of fit is evaluated on the current data set and clustering model. Different models are fitted step by step by setting the number of clusters and the sample data, and the likelihood function of each cluster is computed from the quantities in the formula, such as the clustering parameters, the observations and the random variables. By comparing the likelihood functions of the different families of models, the model with the highest likelihood is selected as the optimal model and used as the prior distribution model.
Sampling $\beta_i$, where, in the sampling formula, the count symbol denotes the amount of data already in class $k$; $E_i$ is a preset observation data sample; $K$ is the current number of classes; $f_k$ is the probability density function of class $k$; $\delta$ is the Kronecker delta function, with $\delta(\beta_i, k) = 1$ when $\beta_i = k$ and 0 otherwise; and a corresponding density represents the classes other than class $k$;
If the condition for opening a new class is met, the number of clusters $K$ is increased by 1, $K = K + 1$;
Checking the number of observations in each cluster used to compute the likelihood functions; if a class contains 0 observations, the class is deleted and the number of clusters $K$ is reduced by 1, $K = K - 1$.
Sampling is performed during clustering to decide whether the number of clusters needs to be adjusted. When the preset condition for a new class is met, the number of clusters is increased by 1 to refine the classification. If a cluster contains no data, it is deleted and the number of clusters is reduced. All clusters are checked so that each likelihood function is computed on a suitable number of observations. During sampling, the Kronecker delta function expresses whether a data point belongs to a given class. Adjusting the number of clusters dynamically improves the estimation accuracy, lets each cluster represent the data distribution effectively, and makes the clustering of the precipitation data agree better with the actual data distribution. After multiple iterations, once the number of clusters and the likelihood function reach the preset accuracy, the optimal precipitation estimation result is output.
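The cluster-deletion rule ($K = K - 1$ when a class has no observations) can be illustrated with a small helper. The function name `prune_empty_clusters` and the list-of-labels representation are hypothetical; the patent does not specify a data structure.

```python
def prune_empty_clusters(labels, num_clusters):
    """Drop classes with zero observations and relabel the remaining ones 0..K-1."""
    counts = [0] * num_clusters
    for k in labels:
        counts[k] += 1
    remap = {}
    for k, c in enumerate(counts):
        if c > 0:                  # keep only non-empty classes
            remap[k] = len(remap)  # assign the next free compact label
    return [remap[k] for k in labels], len(remap)

# Class 1 is empty, so four classes collapse to three.
new_labels, K = prune_empty_clusters([0, 2, 2, 3], 4)
```

Relabelling to a compact range keeps the class indices aligned with the per-class counts used when the likelihoods are recomputed.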
The working principle and the beneficial effects of the technical scheme are as follows:
Bayesian machine learning is an important branch of machine learning. The Bayesian method was first proposed by the British mathematician Thomas Bayes. After more than two hundred years of development it has become an important part of machine learning and is widely applied in statistical machine learning, for example in multivariate structured output prediction. The basic idea of Bayes' theorem is that, given the prior distribution and the likelihood function of a model, the posterior probability distribution $p(\Theta \mid M)$ of the model follows from Bayes' formula:
$p(\Theta \mid M) = \dfrac{p(M \mid \Theta)\, p_0(\Theta)}{p(M)}$ (1)
In formula (1), $\Theta$ denotes the probability model parameters; $M$ is the data set, which in this application comprises the radar data set and the automatic station information data set; $p_0(\Theta)$ is the prior distribution of the model; $p(M \mid \Theta)$ is the likelihood function; and $p(M)$ is a constant.
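As a small numerical illustration of Bayes' formula, the sketch below evaluates the posterior over a few candidate parameter values. The exponential likelihood, the three candidate mean rain rates and the sample observations are assumptions made up for the example, not values from the patent.

```python
import math

def posterior(thetas, prior, data, likelihood):
    """Posterior p(theta | M) over a finite set of candidate parameters."""
    # Numerator p(M | theta) * p0(theta) for each candidate, with independent observations.
    joint = [p0 * math.prod(likelihood(m, th) for m in data)
             for th, p0 in zip(thetas, prior)]
    evidence = sum(joint)          # p(M), the normalising constant
    return [j / evidence for j in joint]

def exp_pdf(m, mean):
    """Exponential density with the given mean (assumed likelihood)."""
    return math.exp(-m / mean) / mean

thetas = [1.0, 2.0, 5.0]           # hypothetical mean rain rates, mm/h
prior = [1.0 / 3.0] * 3            # uniform prior p0(theta)
data = [1.8, 2.5, 1.1, 3.0]        # hypothetical observations, mm/h
post = posterior(thetas, prior, data, exp_pdf)
```

Because $p(M)$ does not depend on $\Theta$, it only rescales the numerator so that the posterior sums to one.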
Model selection is a fundamental problem of Bayesian methods. The Dirichlet nonparametric Bayesian model can learn the number of cluster centers automatically from the characteristics of the data.
The Dirichlet nonparametric Bayesian model assumes that $G_0$ is a base probability distribution on the probability measure space $\Omega$ and that the concentration parameter satisfies $\alpha_0 > 0$; a probability distribution $G$ on $\Omega$ then follows the Dirichlet process with base distribution $G_0$:
$G \sim \mathrm{DP}(\alpha_0, G_0)$ (2)
where the base distribution $G_0$ determines the distribution of the basic elements in the model.
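One common way to draw a distribution $G$ from $\mathrm{DP}(\alpha_0, G_0)$ is the stick-breaking construction, sketched below in truncated form. The truncation level, the function name `stick_breaking` and the exponential base distribution are assumptions for illustration only.

```python
import random

def stick_breaking(alpha0, base_sampler, num_atoms, rng):
    """Truncated stick-breaking draw: G is approximated by sum_k w_k * delta(atom_k)."""
    weights, atoms = [], []
    remaining = 1.0                          # length of the stick not yet broken off
    for _ in range(num_atoms):
        b = rng.betavariate(1.0, alpha0)     # fraction of the remaining stick to take
        weights.append(remaining * b)
        atoms.append(base_sampler(rng))      # atom location drawn from G0
        remaining *= 1.0 - b
    return weights, atoms, remaining

rng = random.Random(0)
# Hypothetical G0: exponential with mean 5, e.g. a rain-rate scale.
weights, atoms, leftover = stick_breaking(2.0, lambda r: r.expovariate(0.2), 50, rng)
```

A small `alpha0` concentrates most of the weight on a few atoms, while a large `alpha0` spreads it over many, which is how the concentration parameter influences the number of clusters.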
However, the probability distribution obtained from a Dirichlet process is discrete. To cluster different groups of data that share a certain similarity, the Dirichlet process mixture (DPM) model (Antoniak, 1974) is introduced: a generative probability is attached to each data point and serves as the prior distribution of the data:
$m_i \sim p(m \mid \theta_i)$ (3)
In formula (3), the parameter $\theta_i$ follows the distribution $G$, $i \in \{1, \dots, N\}$ indexes the data points, each with its own generative probability, and $N$ is the number of data points. When $G$ follows a Dirichlet process, the model is called a Dirichlet process mixture model. The Bayesian method selects the optimal model by comparing the likelihood functions of different families of models:
$p(M \mid D) = \int_{\Theta} p(M \mid \Theta)\, p(\Theta \mid D)\, d\Theta$ (4)
In formula (4), $D$ denotes a family of models. The integration in (4) helps avoid overfitting of the model; when no informative prior is available, $p(\Theta \mid D)$ is assumed to be uniform.
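The comparison of model families through formula (4) can be sketched numerically by integrating the likelihood against a uniform $p(\Theta \mid D)$. The two candidate families (Gaussian with unit variance, exponential), the parameter range and the data below are hypothetical choices for the example.

```python
import math

def marginal_likelihood(data, likelihood, lo, hi, steps=2000):
    """Midpoint-rule estimate of p(M|D) with p(Theta|D) uniform on [lo, hi]."""
    width = (hi - lo) / steps
    prior_density = 1.0 / (hi - lo)
    total = 0.0
    for s in range(steps):
        theta = lo + (s + 0.5) * width
        total += math.prod(likelihood(m, theta) for m in data) * prior_density * width
    return total

def gauss_pdf(m, mean, sd=1.0):
    return math.exp(-0.5 * ((m - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def exp_pdf(m, mean):
    return math.exp(-m / mean) / mean

data = [4.8, 5.3, 5.1, 4.6, 5.2]   # hypothetical observations
evidence = {
    "gaussian": marginal_likelihood(data, gauss_pdf, 0.1, 20.0),
    "exponential": marginal_likelihood(data, exp_pdf, 0.1, 20.0),
}
best = max(evidence, key=evidence.get)   # family with the largest p(M|D)
```

Because the likelihood is averaged over the whole parameter range rather than evaluated at its best-fitting point, a family that fits well only for a narrow set of parameters is penalised, which is the overfitting control the text refers to.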
In this study, the observations in the set $M = \{m_1, m_2, m_3, \dots, m_n\}$ are assumed to be independent. To obtain the indicator factor of each observation in a non-parametric Bayesian model that uses a Dirichlet process as the prior distribution, Gibbs sampling is used to obtain the likelihood functions of the different model families and to select the optimal model. The steps are as follows:
(1) Data initialization: the system reads the data and randomly arranges the $n$ observations to obtain $\{F(i)\}$, where $i = 1, 2, \dots, n$.
(2) Category clustering: set $\Omega = \Omega_{t-1}$ and, for each observation $i \in \{F(1), F(2), \dots, F(n)\}$, sample its indicator factor $\beta_i$.
The likelihood estimate of each observation is computed from the existing $K$ clusters:
$$f_k(m_i) = p(m_i \mid \beta_i = k, M_{\setminus i}, \zeta) \tag{5}$$
In formulas (5) and (6), $\beta_i$ is the indicator factor of observation $i$, which may take a new category; $M_{\setminus i}$ denotes the observation data set with the datum of the corresponding index removed; and $\zeta$ is the distribution parameter.
$\beta_i$ is sampled:
In formulas (7) and (8), the count term is the amount of data already in class $k$. If $\beta_i$ is assigned to a new category, the number of clusters is increased by one: $K = K + 1$.
(3) Cluster updating: check the number of observations in each class. If a class contains no observations, delete it and reduce the number of clusters by one: $K = K - 1$.
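Steps (1) to (3) can be sketched as one Gibbs sweep. Since formulas (6) to (8) do not appear in this text, the sketch substitutes the standard collapsed Gibbs update for a Dirichlet process mixture: an existing class is chosen with weight proportional to its size times $f_k(m_i)$, and a new class with weight proportional to the concentration parameter times the prior predictive density. This substitution, the conjugate Normal model with known variance, and all names and values below are assumptions made for illustration, not the exact update of the application.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predictive(x, members, sigma2, mu0, tau2):
    """f_k(m_i): posterior predictive density of x given the other members of a class."""
    n = len(members)
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    post_mean = post_var * (mu0 / tau2 + sum(members) / sigma2)
    return normal_pdf(x, post_mean, post_var + sigma2)


def gibbs_sweep(data, labels, alpha, rng, sigma2=0.25, mu0=0.0, tau2=25.0):
    """One sweep over randomly ordered observations; returns labels renumbered 0..K-1."""
    order = list(range(len(data)))
    rng.shuffle(order)                              # step (1): random arrangement F(i)
    for i in order:                                 # step (2): resample each beta_i
        labels[i] = None                            # condition on the data without m_i
        clusters = {}
        for j, lab in enumerate(labels):
            if lab is not None:
                clusters.setdefault(lab, []).append(data[j])
        keys = list(clusters)
        weights = [len(clusters[k]) * predictive(data[i], clusters[k], sigma2, mu0, tau2)
                   for k in keys]
        weights.append(alpha * predictive(data[i], [], sigma2, mu0, tau2))  # new class
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(keys):
            labels[i] = max(keys, default=-1) + 1   # K = K + 1
        else:
            labels[i] = keys[choice]
    # step (3): classes left without data no longer appear; renumber the remaining ones
    remap = {lab: new for new, lab in enumerate(sorted(set(labels)))}
    return [remap[lab] for lab in labels]


rng = random.Random(0)
samples = [0.1, -0.2, 0.3, 5.1, 4.8, 5.3]           # made-up values with two visible groups
assignment = [0] * len(samples)
for _ in range(50):
    assignment = gibbs_sweep(samples, assignment, alpha=1.0, rng=rng)
```

Rebuilding the class membership from the labels at every step means an emptied class disappears without a separate deletion pass, which corresponds to the $K = K - 1$ rule in step (3). The quadratic cost per sweep is acceptable for a sketch but would need incremental sufficient statistics for large station networks.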
Inference is a central part of Bayesian learning. Given the prior distribution, the posterior distribution of a Bayesian model is often intractable, so an efficient inference method is required. In the present application, variational inference is used for prediction.
In quantitative precipitation estimation, given a data set $M$ and a prior distribution model $p(M \mid D)$, the variational method defines an approximate distribution $q(\Theta)$ of the posterior distribution of the precipitation estimate. Jensen's inequality then gives a lower bound for the precipitation estimate:
$$\log p(M) \ge \mathbb{E}_q[\log p(\Theta, M)] - \mathbb{E}_q[\log q(\Theta)] \tag{9}$$
By maximizing this lower bound with respect to $q(\Theta)$, the solution of the quantitative precipitation estimation is obtained.
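To make the lower bound in (9) concrete, the sketch below evaluates it in closed form for a deliberately simple model and maximizes it by gradient ascent. The model is an assumption chosen for illustration: observations are Gaussian around an unknown mean $\Theta$ with known variance, the prior on $\Theta$ is Gaussian, and $q(\Theta)$ is Gaussian with mean `mu_q` and variance `s2_q`. The data values, step size, and function names are hypothetical.

```python
import math


def elbo(data, mu_q, s2_q, sigma2=1.0, mu0=0.0, tau2=100.0):
    """E_q[log p(Theta, M)] - E_q[log q(Theta)] for q = N(mu_q, s2_q)."""
    exp_loglik = sum(-0.5 * math.log(2 * math.pi * sigma2)
                     - ((m - mu_q) ** 2 + s2_q) / (2 * sigma2) for m in data)
    exp_logprior = (-0.5 * math.log(2 * math.pi * tau2)
                    - ((mu_q - mu0) ** 2 + s2_q) / (2 * tau2))
    entropy = 0.5 * math.log(2 * math.pi * math.e * s2_q)   # equals -E_q[log q(Theta)]
    return exp_loglik + exp_logprior + entropy


def maximize_elbo(data, steps=2000, lr=0.01, **model):
    """Gradient ascent on (mu_q, log s2_q) using central finite differences."""
    mu_q, log_s2, eps = 0.0, 0.0, 1e-5
    for _ in range(steps):
        g_mu = (elbo(data, mu_q + eps, math.exp(log_s2), **model)
                - elbo(data, mu_q - eps, math.exp(log_s2), **model)) / (2 * eps)
        g_ls = (elbo(data, mu_q, math.exp(log_s2 + eps), **model)
                - elbo(data, mu_q, math.exp(log_s2 - eps), **model)) / (2 * eps)
        mu_q += lr * g_mu
        log_s2 += lr * g_ls
    return mu_q, math.exp(log_s2)


rain_rates = [2.0, 3.5, 1.2, 4.1, 2.7]              # made-up values in mm/h
mu_hat, s2_hat = maximize_elbo(rain_rates)
```

For this conjugate model the exact posterior is itself Gaussian, so the maximizer of the bound coincides with it and the bound becomes tight. In a non-conjugate setting such as a Dirichlet process mixture, the same procedure yields only the closest member of the chosen family $q$.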
To make a quantitative precipitation estimate, actual data related to precipitation must first be collected. The data set consists of a radar data set and an automatic station information data set. The radar data set contains radar reflectivity factor data within the precipitation estimation region at a spatial resolution of 1 km × 1 km. Because the radar signal is related to precipitation, these reflectivity data can be used to estimate the precipitation distribution. The automatic station information data set contains the automatic weather station data within the quantitative precipitation estimation area, recording the longitude and latitude of each station and its precipitation observations at the corresponding time points. These live observations serve as the reference for actual precipitation and are used for verification and auxiliary estimation.
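One way the two data sets could be brought together is to look up, for each automatic station, the 1 km × 1 km radar cell that contains it. The sketch below assumes a regular grid whose south-west corner lies at (`lon0`, `lat0`), rows that increase northward, and an equirectangular approximation for converting degrees to kilometres. The function names and the layout of the station tuples are illustrative and are not taken from the application.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude in km


def station_to_cell(lon, lat, lon0, lat0, cell_km=1.0):
    """Return the (row, col) index of the grid cell that contains a station."""
    dy_km = (lat - lat0) * KM_PER_DEG_LAT
    dx_km = (lon - lon0) * KM_PER_DEG_LAT * math.cos(math.radians(lat0))
    return math.floor(dy_km / cell_km), math.floor(dx_km / cell_km)


def pair_samples(grid_dbz, stations, lon0, lat0):
    """Build (reflectivity, gauge) pairs for the stations that fall inside the grid.

    grid_dbz is a list of rows of reflectivity values; each station is a
    (longitude, latitude, gauge_precipitation) tuple.
    """
    pairs = []
    for lon, lat, gauge in stations:
        row, col = station_to_cell(lon, lat, lon0, lat0)
        if 0 <= row < len(grid_dbz) and 0 <= col < len(grid_dbz[0]):
            pairs.append((grid_dbz[row][col], gauge))
    return pairs
```

The resulting pairs are the kind of joint radar and gauge observations from which a data set $M$ could be assembled. A production system would use the radar's actual map projection instead of the flat-earth approximation assumed here.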
To infer an accurate precipitation estimate from the acquired data, a prior distribution model is constructed so that prior information about the precipitation can be set from the existing data and background knowledge. The Dirichlet process (DP) is a non-parametric Bayesian model that can describe distributions with an unknown number of components. In quantitative precipitation estimation, the Dirichlet process is used to model different types of precipitation processes: through it, a mixture model can be constructed, the data divided into classes, and a probability assigned to each class. In the present application, the DPM is used to model the prior distribution of precipitation. Specifically, the data points are regarded as generated from a mixture model in which the parameters of each mixture component are unknown. The distribution is therefore defined by the Dirichlet process, which provides prior distribution information for the precipitation estimate. In the DPM, the prior distribution describes the shape of the precipitation distribution. With the prior distribution constructed, the model can be updated whenever new observation data arrive, so that it gradually approaches the actual precipitation distribution.
After the prior distribution model is established, posterior inference is carried out. Because the posterior distribution cannot be computed directly in practice, it is approximated by variational inference, an optimization method for finding approximate solutions to complex probability distributions. In the precipitation estimation problem, the goal of variational inference is to approximate the complex posterior distribution with a simpler distribution. Specifically, the present application uses variational inference to infer the posterior distribution of precipitation from the prior distribution. During inference, a set of parameters is optimized to minimize the KL divergence between the approximate posterior distribution and the true posterior distribution, so that the approximation is as close to the true posterior as possible.
To further optimize the estimation result and ensure that the model is valid for the given data, the present application introduces Jensen's inequality to calculate a lower bound. Jensen's inequality is a mathematical tool used in Bayesian inference that makes it possible to approximate an objective function by optimizing a lower bound on it. In the present application, Jensen's inequality is used to establish the lower bound of the precipitation estimate. Calculating the lower bound helps ensure that the estimate of the posterior distribution does not deviate too far from the true value, and maximizing the lower bound makes it possible to find the optimal precipitation estimate within a solution space that contains multiple candidates.
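For reference, the lower bound in formula (9) follows from Jensen's inequality in two lines, and the gap between $\log p(M)$ and the bound is exactly the KL divergence between the approximate and the true posterior. This is the standard derivation, written with the symbols used above; it assumes only that $q(\Theta)$ is positive wherever $p(\Theta, M)$ is.

```latex
\begin{aligned}
\log p(M) &= \log \int p(\Theta, M)\, d\Theta
           = \log \mathbb{E}_{q}\!\left[\frac{p(\Theta, M)}{q(\Theta)}\right] \\
          &\ge \mathbb{E}_{q}\!\left[\log \frac{p(\Theta, M)}{q(\Theta)}\right]
           = \mathbb{E}_{q}[\log p(\Theta, M)] - \mathbb{E}_{q}[\log q(\Theta)], \\
\log p(M) &- \Big(\mathbb{E}_{q}[\log p(\Theta, M)] - \mathbb{E}_{q}[\log q(\Theta)]\Big)
           = \mathrm{KL}\big(q(\Theta)\,\|\,p(\Theta \mid M)\big) \ge 0 .
\end{aligned}
```

Since $\log p(M)$ does not depend on $q$, maximizing the lower bound and minimizing the KL divergence are the same optimization problem.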
After the lower bound is calculated, it is maximized. By maximizing the lower bound, a quantitative precipitation estimate that is as accurate as possible is obtained. Specifically, for a given data set, an objective function representing the magnitude of the lower bound is derived from variational inference and Jensen's inequality. This objective function is optimized to maximize the lower bound, which finally yields the optimal precipitation estimation result. This usually requires estimating the likelihood function of the data points and selecting an optimal model for precipitation estimation. The likelihood function describes the probability of the observed data given the model parameters. In precipitation estimation, the likelihood function is used to measure the discrepancy between the precipitation data and the model predictions, and maximizing it makes the precipitation estimate more accurate. Because actual data often contain several different precipitation types, the data are grouped by clustering, which gives a better understanding of the distribution characteristics of each precipitation type and allows more accurate estimates to be obtained through sampling.
Through the above steps, the quantitative precipitation estimation result is finally obtained. The whole process combines Bayesian inference, variational inference, and Jensen's inequality to support the accuracy and reliability of the estimation result. Through the prior distribution model, variational inference, and lower-bound maximization, the model is continuously optimized to approximate the real precipitation amount, and high-precision precipitation estimation is finally achieved. The application has broad prospects in meteorology, in particular for extreme weather early warning and climate research.
The quantitative precipitation estimation method and system of the invention adopt a quantitative precipitation estimation algorithm based on non-parametric Bayesian machine learning and on long-term, massive observation data. The algorithm combines 9 elements, such as radar reflectivity factors, automatic station information, radar-retrieved wind fields, and weather types, to constrain the radar quantitative precipitation estimation algorithm. First, a radar data set of the quantitative precipitation estimation area and an automatic station live observation data set are constructed, and a prior distribution model is built from these data sets. Then, based on variational inference, an approximate distribution of the posterior distribution of the precipitation estimate is defined according to the prior distribution model. Next, based on Jensen's inequality, the lower bound of the precipitation estimate is determined according to the approximate distribution. Finally, the lower bound is maximized to obtain the quantitative precipitation estimation result, thereby improving the operational performance of quantitative precipitation estimation.
The quantitative precipitation estimation system provided by the embodiment of the invention, as shown in fig. 2, comprises:
An acquisition module 1 for acquiring a data set M;
A construction module 2, configured to construct a prior distribution model $p(M \mid D)$ based on the data set $M$, where $D$ is a model family;
A definition module 3, configured to define, based on variational inference, an approximate distribution $q(\Theta)$ of the posterior distribution of the precipitation estimate according to the prior distribution model $p(M \mid D)$, where $\Theta$ denotes the probability model parameters;
A determining module 4, configured to determine, based on Jensen's inequality, a lower bound $\mathbb{E}_q[\log p(\Theta, M)] - \mathbb{E}_q[\log q(\Theta)]$ of the precipitation estimate according to the approximate distribution $q(\Theta)$, where $\mathbb{E}_q$ denotes the expectation under the approximate distribution $q(\Theta)$ of the precipitation estimate;
A maximizing module 5, configured to maximize the lower bound $\mathbb{E}_q[\log p(\Theta, M)] - \mathbb{E}_q[\log q(\Theta)]$ of the precipitation estimate to obtain the quantitative precipitation estimation result.
The data set $M$ comprises at least a radar data set of the quantitative precipitation estimation area and an automatic station information data set. The radar data set is 1 km × 1 km radar reflectivity factor grid data within the quantitative precipitation estimation area; the automatic station information data set comprises the longitude and latitude coordinates and the live precipitation observations of each automatic station in that area.
Constructing the prior distribution model $p(M \mid D)$ based on the data set $M$ includes:
In the Dirichlet non-parametric Bayesian model, $G_0$ is a base probability distribution on the probability measure space $\Omega$ and the concentration parameter satisfies $\alpha_0 > 0$; the probability distribution $G$ on $\Omega$ follows a Dirichlet process with base probability distribution $G_0$:
$$G \sim \mathrm{DP}(\alpha_0, G_0)$$
where the base probability distribution $G_0$ determines the distribution of the basic constituent elements in the prior distribution model $p(M \mid D)$, and DP denotes a Dirichlet process;
Based on the Dirichlet process mixture model (DPM), a generation probability is attached to each data point in the quantitative precipitation estimation area and used as the prior distribution of the data:
$$m_i \sim p(m \mid \theta_i)$$
where the parameter $\theta_i$ follows the probability distribution $G$; $i \in \{1, \dots, N\}$ indexes the data points, each of which has its own generation probability parameter; $N$ is the total number of data points; $p$ is a conditional probability density function; and $m$ is the precipitation estimate at a data point;
The optimal model is selected as the prior distribution model $p(M \mid D)$ by comparing the likelihood functions of the different families of prior probability models $m_i \sim p(m \mid \theta_i)$:
$$p(M \mid D) = \int p(M \mid \Theta)\, p(\Theta \mid D)\, d\Theta$$
where $D$ denotes a model family.
Selecting the optimal model as the prior distribution model $p(M \mid D)$ by comparing the likelihood functions of different model families includes:
Assuming the data in the data set $M = \{m_1, m_2, m_3, \dots, m_n\}$ to be independent, reading the $n$ live precipitation observations of the automatic stations in the quantitative precipitation estimation area and arranging them randomly to obtain $\{F(i)\}$, where $i = 1, 2, \dots, n$;
Setting $\Omega = \Omega_{t-1}$ and sampling the indicator factor $\beta_i$ of each live precipitation observation $i \in \{F(1), F(2), \dots, F(n)\}$, where $\{F(i)\}$ is the set of $n$ randomly arranged automatic station live precipitation observations and $\Omega_{t-1}$ is the probability measure space at time $t-1$;
Computing the likelihood estimate $f_k(m_i)$ of the observed data based on the current $K$ clusters:
$$f_k(m_i) = p(m_i \mid \beta_i = k, M_{\setminus i}, \zeta)$$
where $\beta_i$ is the indicator factor, which may take a new category; $M_{\setminus i}$ is the data set $M = \{m_1, m_2, m_3, \dots, m_n\}$ with the datum of the corresponding index removed; $\zeta$ is a distribution parameter; $k$ is an observation value; $m_i$ is a random variable; and a further symbol, analogous to $k$, denotes an observation value other than $k$;
$\beta_i$ is sampled:
where the count term is the amount of data already in class $k$; $E_i$ is a preset observation data sample; $K$ is the number of observation data samples; $f_k$ is the probability density function of class $k$; $\delta$ is the Kronecker delta function, with $\delta(\beta_i, k) = 1$ when $\beta_i = k$ and 0 otherwise; and a companion density denotes the probability density function of the classes other than the $k$-th;
If $\beta_i$ is assigned to a new category, the number of clusters $K$ is increased by 1: $K = K + 1$;
Checking the number of observations in each cluster for which the likelihood function is computed; if the total number of observations in a class is 0, deleting that class and reducing the number of clusters $K$ by 1: $K = K - 1$.
An embodiment of the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of the above embodiments.
An embodiment of the invention further provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program and the processor executes the computer program to implement the method of any one of the above embodiments.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510094673.6A CN120013004A (en) | 2025-01-21 | 2025-01-21 | A quantitative precipitation estimation method, system, storage medium and electronic device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN120013004A true CN120013004A (en) | 2025-05-16 |
Family
ID=95660154
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510094673.6A Pending CN120013004A (en) | 2025-01-21 | 2025-01-21 | A quantitative precipitation estimation method, system, storage medium and electronic device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN120013004A (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170075034A1 (en) * | 2015-09-10 | 2017-03-16 | The Climate Corporation | Generating probabilistic estimates of rainfall rates from radar reflectivity measurements |
| CN109344999A (en) * | 2018-09-07 | 2019-02-15 | 华中科技大学 | A Probabilistic Prediction Method of Runoff |
| CN112612995A (en) * | 2021-03-08 | 2021-04-06 | 武汉理工大学 | Multi-source rainfall data fusion algorithm and device based on Bayesian regression |
| CN116565840A (en) * | 2023-04-20 | 2023-08-08 | 湖南大学 | A high-precision wind speed soft-sensing method for wind power prediction in wind farms |
Non-Patent Citations (4)
| Title |
|---|
| SHENGCHAO CHEN 等: "TempEE: Temporal–Spatial Parallel Transformer for Radar Echo Extrapolation Beyond Autoregression", 《IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING》, vol. 61, 4 September 2023 (2023-09-04), pages 5108914 * |
| WENYUAN LI 等: "StarNet: A Deep Learning Model for Enhancing Polarimetric Radar Quantitative Precipitation Estimation", 《 IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING》, vol. 62, 11 July 2024 (2024-07-11), pages 4106513 * |
| 陈训来 等: "基于雷暴尺度集合预报的频率匹配降水预报研究", 《气象与环境科学》, vol. 45, no. 6, 15 November 2022 (2022-11-15), pages 9 - 17 * |
| 高歆 等: "面向稀疏降水站点的套合各向异性贝叶斯地统计估计研究", 《地球信息科学学报》, vol. 24, no. 8, 18 July 2022 (2022-07-18), pages 1445 - 1458 * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN117116382B (en) | Method and system for spatial and temporal prediction of water quality of lakes affected by water diversion projects | |
| CN108428017B (en) | Wind power interval prediction method based on kernel extreme learning machine quantile regression | |
| CN114254561A (en) | Waterlogging prediction method, waterlogging prediction system and storage medium | |
| CN110705760A (en) | A photovoltaic power generation power prediction method based on deep belief network | |
| CN110597873A (en) | Precipitation data estimation method, device, equipment and storage medium | |
| CN116454875A (en) | Method and system for medium-term power probability prediction method and system of regional wind farms based on cluster division | |
| CN107886160B (en) | BP neural network interval water demand prediction method | |
| CN117526274A (en) | New energy power prediction method, electronic equipment and storage medium in extreme climate | |
| CN117078048A (en) | Digital twinning-based intelligent city resource management method and system | |
| CN117408394B (en) | Carbon emission factor prediction method and device for electric power system and electronic equipment | |
| CN117787110A (en) | Soil moisture inversion method and system based on deep learning model | |
| CN117521907A (en) | Photovoltaic power generation power interval prediction method considering photovoltaic output and meteorological elements | |
| Ferro | A probability model for verifying deterministic forecasts of extreme events | |
| CN115169089B (en) | Wind power probability prediction method and device based on kernel density estimation and copula | |
| CN119782859A (en) | A monitoring method and system for water conservancy projects | |
| CN113095579B (en) | Daily-scale rainfall forecast correction method coupled with Bernoulli-gamma-Gaussian distribution | |
| CN110942196B (en) | Predicted irradiation correction method and device | |
| CN120013004A (en) | A quantitative precipitation estimation method, system, storage medium and electronic device | |
| CN110929849B (en) | Video detection method and device based on neural network model compression | |
| CN115525872B (en) | Two-step Bayesian estimation method for building scale population fused with position data | |
| CN117609756A (en) | A non-uniform hydrological sequence reconstruction method based on regional characteristics | |
| CN113886360B (en) | Data table partitioning method, device, computer readable medium and electronic equipment | |
| CN113868939A (en) | A method, device, equipment and medium for probability density evaluation of wind power | |
| WO2022217568A1 (en) | Daily precipitation forecast correction method coupled with bernoulli-gamma-gaussian distributions | |
| CN117271915B (en) | Space sampling point planning method considering space-time variability |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||