CN120013004A

CN120013004A - A quantitative precipitation estimation method, system, storage medium and electronic device

Info

Publication number: CN120013004A
Application number: CN202510094673.6A
Authority: CN
Inventors: 陈元昭; 陈训来; 刘东华; 王明洁; 张文海; 饶华炎; 张立杰
Original assignee: Shenzhen Meteorological Bureau Shenzhen Meteorological Station
Current assignee: Shenzhen Meteorological Bureau Shenzhen Meteorological Station
Priority date: 2025-01-21
Filing date: 2025-01-21
Publication date: 2025-05-16

Abstract

The invention provides a quantitative precipitation estimation method, system, storage medium and electronic device. The method comprises: acquiring a data set; constructing a prior distribution model based on the data set; defining, based on variational inference, an approximate distribution of the posterior distribution of the precipitation estimate according to the prior distribution model; determining, based on Jensen's inequality, a lower bound of the precipitation estimate according to the approximate distribution; and maximizing the lower bound to obtain a quantitative precipitation estimation result. A radar data set and an automatic station real-time observation data set are first constructed for the quantitative precipitation estimation area, and the above steps are then applied to them, so that quantitative precipitation estimation is carried out and its performance in operational service is improved.

Description

Quantitative precipitation estimation method, system, storage medium and electronic equipment

Technical Field

The invention relates to the technical field of weather forecasting, and in particular to a quantitative precipitation estimation method, system, storage medium and electronic device.

Background

At present, quantitative precipitation estimation (QPE) is the basis of services such as quantitative precipitation forecasting (QPF) and heavy-precipitation nowcast warning. It is an important component of short-term nowcasting and has long been both a focus and a difficulty of forecasting services. Automatic weather stations are currently the most direct means of observing precipitation. However, because automatic weather stations are unevenly distributed in space, their observations cannot fully reflect the spatial distribution of precipitation.

The radar reflectivity factor is the primary factor in precipitation estimation, but precipitation is affected by many other factors. How to screen the factors that affect the accuracy of quantitative precipitation estimation, so as to obtain the best possible precipitation estimate, is a current difficulty of quantitative precipitation estimation.

Thus, a solution is needed.

Disclosure of Invention

The invention aims to provide a quantitative precipitation estimation method that is based on long-term, large-volume observation data and adopts a quantitative precipitation estimation algorithm based on non-parametric Bayesian machine learning. The algorithm combines nine elements, including radar reflectivity factors, automatic station information, radar-retrieved wind fields and weather types, to constrain the radar quantitative precipitation estimation algorithm. First, a radar data set and an automatic station real-time observation data set are constructed for the quantitative precipitation estimation area, and a prior distribution model is constructed based on the data set. Then, based on variational inference, an approximate distribution of the posterior distribution of the precipitation estimate is defined according to the prior distribution model; based on Jensen's inequality, a lower bound of the precipitation estimate is determined according to the approximate distribution; and the lower bound is maximized to obtain a quantitative precipitation estimation result. Quantitative precipitation estimation is thereby carried out and its performance in operational service is improved.

The quantitative precipitation estimation method provided by the embodiment of the invention comprises the following steps:

acquiring a data set M;

constructing a prior distribution model p(M|D) based on the data set M, wherein D is a family of models;

based on variational inference, defining an approximate distribution q(Θ) of the posterior distribution of the precipitation estimate according to the prior distribution model p(M|D), wherein Θ is a probability model parameter;

determining, based on Jensen's inequality, a lower bound E_q[log(p(Θ,M))]-E_q[log(q(Θ))] of the precipitation estimate according to the approximate distribution q(Θ), wherein E_q denotes the expectation under the approximate distribution q(Θ), taken over the distribution interval of the precipitation estimation data;

maximizing the lower bound E_q[log(p(Θ,M))]-E_q[log(q(Θ))] of the precipitation estimate to obtain a quantitative precipitation estimation result.

Optionally, the data set M comprises at least a radar data set of the quantitative precipitation estimation area and an automatic station information data set, wherein the radar data set is 1 km × 1 km gridded radar reflectivity factor data in the quantitative precipitation estimation area, and the automatic station information data set comprises the longitude and latitude coordinates and the real-time precipitation observation data of each automatic station in the quantitative precipitation estimation area.
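As an illustration only, the data set M described above could be held in a structure such as the following Python sketch. The class names StationRecord and DataSetM, the field names, and the regular latitude/longitude grid with a fixed origin and cell size are assumptions made for this example and do not appear in the application. The sketch merely pairs each station observation with the radar grid cell that contains it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

import numpy as np


@dataclass
class StationRecord:
    lon: float       # station longitude, degrees east
    lat: float       # station latitude, degrees north
    rain_mm: float   # real-time precipitation observation


@dataclass
class DataSetM:
    reflectivity: np.ndarray   # 2-D radar reflectivity grid (dBZ), one value per cell
    lon0: float                # longitude of the grid's south-west corner (assumed layout)
    lat0: float                # latitude of the grid's south-west corner (assumed layout)
    dlon: float                # cell width in degrees, roughly 1 km
    dlat: float                # cell height in degrees, roughly 1 km
    stations: List[StationRecord] = field(default_factory=list)

    def cell_of(self, st: StationRecord) -> Optional[Tuple[int, int]]:
        """Return (row, col) of the grid cell containing the station, or None if outside."""
        row = int(np.floor((st.lat - self.lat0) / self.dlat))
        col = int(np.floor((st.lon - self.lon0) / self.dlon))
        n_rows, n_cols = self.reflectivity.shape
        if 0 <= row < n_rows and 0 <= col < n_cols:
            return row, col
        return None

    def paired_samples(self) -> List[Tuple[float, float]]:
        """Pair each in-grid station observation with the reflectivity of its cell."""
        pairs = []
        for st in self.stations:
            cell = self.cell_of(st)
            if cell is not None:
                pairs.append((float(self.reflectivity[cell]), st.rain_mm))
        return pairs


# Hypothetical 3 x 3 grid and two stations, the second lying outside the grid.
grid = np.arange(9, dtype=float).reshape(3, 3)
data = DataSetM(grid, lon0=114.0, lat0=22.5, dlon=0.01, dlat=0.01,
                stations=[StationRecord(114.015, 22.505, 3.2),
                          StationRecord(115.000, 22.505, 0.0)])
pairs = data.paired_samples()
```

Stations that fall outside the radar grid are skipped rather than clamped to the nearest cell, which keeps the paired samples free of mismatched locations.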

Optionally, the constructing a prior distribution model p(M|D) based on the data set M includes:

in the Dirichlet non-parametric Bayesian model, G₀ is a base probability distribution on the probability measure space Ω and the concentration parameter α₀ > 0; the probability distribution G on the probability measure space Ω then follows a Dirichlet process with base probability distribution G₀:

G~DP(α₀,G₀)

wherein the base probability distribution G₀ determines the distribution of the basic constituent elements in the prior distribution model p(M|D), and DP is a Dirichlet process;

based on the Dirichlet process mixture model (DPM), a generation probability is added for each data point in the quantitative precipitation estimation area and serves as the prior distribution of the data:

m_i~p(m|θ_i)

wherein the parameter θ_i follows the probability distribution G; i ∈ {1, ..., N} indexes the data points, each of which generates one probability parameter; N is the total number of data points; p is a conditional probability density function; and m is the precipitation estimate of the data point;

the optimal model is chosen as the prior distribution model p(M|D) by comparing the likelihood functions of the different families of prior distribution probability models m_i~p(m|θ_i):

p(M|D)=∫_Θ p(M|Θ)p(Θ|D)dΘ

wherein D represents a family of models.
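A draw G~DP(α₀,G₀) can be illustrated with the truncated stick-breaking construction, a standard way of sampling from a Dirichlet process. The following Python sketch is not taken from the application: the function name, the truncation level n_atoms, and the gamma distribution standing in for G₀ are assumptions chosen only for illustration.

```python
import numpy as np


def stick_breaking_dp(alpha0, base_sampler, n_atoms, rng):
    """Truncated stick-breaking draw G ~ DP(alpha0, G0); returns (weights, atoms)."""
    betas = rng.beta(1.0, alpha0, size=n_atoms)           # stick proportions
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    weights = betas * remaining                           # mass of each atom
    atoms = base_sampler(rng, n_atoms)                    # atom locations drawn from G0
    return weights, atoms


rng = np.random.default_rng(0)
weights, atoms = stick_breaking_dp(
    alpha0=2.0,
    base_sampler=lambda r, n: r.gamma(2.0, 5.0, size=n),  # hypothetical G0
    n_atoms=200,
    rng=rng,
)

# theta_i ~ G: each data point receives one parameter; ties between points
# are what produce clusters in the mixture model.
theta = rng.choice(atoms, size=10, p=weights / weights.sum())
```

Because G is discrete, several θ_i usually coincide, which is the property the mixture model relies on to group data points.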

Optionally, the selecting the optimal model as the prior distribution model p(M|D) by comparing the likelihood functions of different families of models includes:

assuming that the data of the data set M = {m₁, m₂, m₃, ..., m_n} are independent, reading n items of automatic station real-time precipitation observation data in the quantitative precipitation estimation area, and randomly permuting them to obtain {F(i)}, wherein i = 1, 2, ..., n;

setting Ω = Ω^(t-1), and sampling the indicator factor β_i of each item of automatic station real-time precipitation observation data i ∈ {F(1), F(2), ..., F(n)}, that is, of the n randomly permuted observations given by the function {F(i)}, wherein Ω^(t-1) is the probability measure space at time t-1;

computing the likelihood estimate f_k(m_i) of the observed data based on the current K clusters:

f_k(m_i)=p(m_i|β_i=k,M_\i,ζ)

wherein β_i is a new category, M_\i is the data set M = {m₁, m₂, m₃, ..., m_n} with the data of the corresponding index removed, ζ is a distribution parameter, k is an observation value, and m_i is a random variable; a further symbol, whose rendering is missing from the source text, is defined analogously to k and denotes an observation value other than k;

sampling β_i [sampling formula not reproduced in the source text];

wherein, in the sampling formula, a count term denotes the amount of data already in class k, E_i is a preset observation data sample, K is the number of observation data samples, f_k represents the probability density function for class k, and δ is the Kronecker delta function, with δ(β_i, k) = 1 when β_i = k and 0 otherwise; a corresponding probability density function is defined for classes other than the k-th;

if [condition not reproduced in the source text], the number of clusters K is increased by 1, that is, K = K + 1;

checking the amount of observed data in each cluster for which the likelihood function is computed and, if the total number of observations in a class is 0, deleting that class and reducing the number of clusters K by 1, that is, K = K - 1.
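The clustering loop above resembles collapsed Gibbs sampling for a Dirichlet process mixture. Because the sampling formula itself is not reproduced in the text, the following Python sketch substitutes the standard Chinese-restaurant-process weights (cluster size times predictive density for an existing cluster, α₀ times prior predictive density for a new one). It further assumes a one-dimensional Gaussian likelihood with known variance and a Gaussian prior on each cluster mean. All function names and hyperparameters are hypothetical.

```python
import numpy as np


def predictive_logpdf(x, members, sigma2, mu0, tau2):
    """Log predictive density of x given a cluster's current members.

    Assumed model: x ~ Normal(mean, sigma2), mean ~ Normal(mu0, tau2).
    """
    n = len(members)
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    post_mean = post_var * (mu0 / tau2 + np.sum(members) / sigma2)
    var = post_var + sigma2
    return -0.5 * (np.log(2 * np.pi * var) + (x - post_mean) ** 2 / var)


def gibbs_sweep(m, beta, alpha0, rng, sigma2=1.0, mu0=0.0, tau2=100.0):
    """One sweep over a random permutation of the observations."""
    beta = beta.copy()
    for i in rng.permutation(len(m)):
        beta[i] = -1                                   # remove observation i (M_\i)
        labels = [k for k in np.unique(beta) if k >= 0]
        logw = [np.log(np.sum(beta == k))
                + predictive_logpdf(m[i], m[beta == k], sigma2, mu0, tau2)
                for k in labels]
        # Weight for opening a new cluster (prior predictive, no members).
        logw.append(np.log(alpha0)
                    + predictive_logpdf(m[i], m[:0], sigma2, mu0, tau2))
        logw = np.array(logw)
        w = np.exp(logw - logw.max())
        choice = rng.choice(len(w), p=w / w.sum())
        if choice < len(labels):
            beta[i] = labels[choice]                   # join an existing cluster
        else:
            beta[i] = (max(labels) + 1) if labels else 0   # K = K + 1
    # Relabel to 0..K-1; clusters left empty disappear here (K = K - 1).
    _, beta = np.unique(beta, return_inverse=True)
    return beta


rng = np.random.default_rng(1)
m = np.concatenate([rng.normal(0.0, 1.0, 30), rng.normal(10.0, 1.0, 30)])
beta = np.zeros(len(m), dtype=int)                     # start with a single cluster
for _ in range(20):
    beta = gibbs_sweep(m, beta, alpha0=1.0, rng=rng)
K = len(np.unique(beta))
```

In this sketch an empty cluster is never tracked explicitly: a class with no remaining members simply drops out of the label set, which has the same effect as the deletion step described above.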

The quantitative precipitation estimation system provided by the embodiment of the invention comprises:

an acquisition module for acquiring a data set M;

a construction module for constructing a prior distribution model p(M|D) based on the data set M, wherein D is a family of models;

a definition module for defining, based on variational inference, an approximate distribution q(Θ) of the posterior distribution of the precipitation estimate according to the prior distribution model p(M|D), wherein Θ is a probability model parameter;

a determining module for determining, based on Jensen's inequality, a lower bound E_q[log(p(Θ,M))]-E_q[log(q(Θ))] of the precipitation estimate according to the approximate distribution q(Θ), wherein E_q denotes the expectation under the approximate distribution q(Θ), taken over the distribution interval of the precipitation estimation data;

a maximizing module for maximizing the lower bound E_q[log(p(Θ,M))]-E_q[log(q(Θ))] of the precipitation estimate to obtain a quantitative precipitation estimation result.

The embodiment of the invention provides a computer readable storage medium, on which a computer program is stored, and a processor executes the computer program to implement the method of any one of the above embodiments.

The electronic device provided by the embodiment of the invention comprises a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to realize the method of any one of the above.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and drawings.

The technical scheme of the invention is further described in detail through the drawings and the embodiments.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:

FIG. 1 is a schematic diagram of a quantitative precipitation estimation method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a quantitative precipitation estimation system according to an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.

The embodiment of the invention provides a quantitative precipitation estimation method, as shown in fig. 1, comprising the following steps:

S1, acquiring a data set M;

S2, constructing a prior distribution model p(M|D) based on the data set M, wherein D is a family of models;

S3, based on variational inference, defining an approximate distribution q(Θ) of the posterior distribution of the precipitation estimate according to the prior distribution model p(M|D), wherein Θ is a probability model parameter;

S4, determining, based on Jensen's inequality, a lower bound E_q[log(p(Θ,M))]-E_q[log(q(Θ))] of the precipitation estimate according to the approximate distribution q(Θ), wherein E_q denotes the expectation under the approximate distribution q(Θ), taken over the distribution interval of the precipitation estimation data;

S5, maximizing the lower bound E_q[log(p(Θ,M))]-E_q[log(q(Θ))] of the precipitation estimate to obtain a quantitative precipitation estimation result.

The data set M comprises at least a radar data set of the quantitative precipitation estimation area and an automatic station information data set, wherein the radar data set is 1 km × 1 km gridded radar reflectivity factor data in the quantitative precipitation estimation area, and the automatic station information data set comprises the longitude and latitude coordinates and the real-time precipitation observation data of each automatic station in the quantitative precipitation estimation area.

The constructing a prior distribution model p(M|D) based on the data set M includes:

letting the probability distribution G on the probability measure space Ω follow a Dirichlet process with concentration parameter α₀ > 0 and base probability distribution G₀:

G~DP(α₀,G₀)

wherein the base probability distribution G₀ determines the distribution of the basic constituent elements in the prior distribution model p(M|D), and DP is a Dirichlet process. The Dirichlet process is a stochastic process widely used in Bayesian non-parametric statistics; here it is used to model the probability distribution of precipitation when the specific form of that distribution is unknown but some prior knowledge is available;

based on the Dirichlet process mixture model (DPM), adding a generation probability for each data point as the prior distribution of the data:

m_i~p(m|θ_i)

wherein the parameter θ_i follows the probability distribution G; i ∈ {1, ..., N} indexes the data points, each of which generates one probability parameter; N is the total number of data points; p is a conditional probability density function (PDF); and m is the precipitation estimate of the data point. This expression gives the probability distribution of the precipitation estimation data conditional on the given parameter;

choosing the optimal model as the prior distribution model p(M|D) by comparing the likelihood functions of the different families of models:

p(M|D)=∫_Θ p(M|Θ)p(Θ|D)dΘ

wherein D represents a family of models.

The selecting the optimal model as the prior distribution model p(M|D) by comparing the likelihood functions of different families of models includes:

assuming that the data of the data set M = {m₁, m₂, m₃, ..., m_n} are independent, reading n items of automatic station real-time precipitation observation data in the quantitative precipitation estimation area, and randomly permuting them to obtain {F(i)}, wherein i = 1, 2, ..., n;

setting Ω = Ω^(t-1), and sampling the indicator factor β_i of each item of automatic station real-time precipitation observation data i ∈ {F(1), F(2), ..., F(n)}, that is, of the n randomly permuted observations given by the function {F(i)}, wherein Ω^(t-1) is the probability measure space at time t-1, that is, the parameters collected at time t-1. For each item of automatic station real-time precipitation observation data, a random permutation is generated based on the function {F(i)} and sampled to form the corresponding indicator factor. The indicator factor represents the parameters collected at the current time (such as weather parameters and precipitation values), including the probability measure space at that time.

The likelihood estimate f_k(m_i) of the observed data is computed based on the current K clusters:

f_k(m_i)=p(m_i|β_i=k,M_\i,ζ)

wherein β_i is a new category; M_\i is the data set M = {m₁, m₂, m₃, ..., m_n} with the data corresponding to the index (the unique index of each data point in the data set; that is, k in the formula) removed; ζ is a distribution parameter; k is an observation value; and m_i is a random variable; a further symbol, whose rendering is missing from the source text, is defined analogously to k and denotes an observation value other than k. For each current cluster, the likelihood estimate of its observations is computed; that is, the goodness of fit to the data is evaluated based on the current data set and clustering model. Different models are fitted step by step by setting the number of clusters and the sample data, and the likelihood function of each cluster is computed. The relevant parameters in the formula, such as the clustering parameters, observation values and random variables of the data, are computed, and the likelihood function value of each cluster is finally obtained. By comparing the likelihood functions of different families of models, the model with the highest likelihood value is selected as the optimal model and used as the prior distribution model.

β_i is sampled [sampling formula not reproduced in the source text]:

wherein, in the sampling formula, a count term denotes the amount of data already in class k, E_i is a preset observation data sample, K is the number of observation data samples, f_k represents the probability density function for class k, and δ is the Kronecker delta function, with δ(β_i, k) = 1 when β_i = k and 0 otherwise; a corresponding probability density function is defined for classes other than the k-th;

if [condition not reproduced in the source text], the number of clusters K is increased by 1, that is, K = K + 1.

Sampling is carried out during clustering, and it is judged whether the number of clusters needs to be adjusted. If the amount of data in a cluster is 0, the cluster is deleted and the number of clusters is reduced. Adjusting the number of clusters improves the estimation accuracy and lets each cluster represent the distribution of the data effectively. When the number of clusters meets a preset condition, the number of clusters is increased by 1 to refine the classification further. All clusters are computed and checked to ensure that the total number of observations used in each cluster's likelihood function is appropriate. During data sampling, the Kronecker delta function is used to represent the relationship between different data categories. If the number of clusters is still unsatisfactory (for example, a class is empty), that class is deleted and the number of clusters is reduced. Dynamically adjusting the number of clusters brings the clustering result of the precipitation data closer to the distribution of the actual data. After multiple iterations, when the number of clusters and the likelihood function reach the preset precision, the optimal precipitation estimation result is output.

The working principle and the beneficial effects of the technical scheme are as follows:

Bayesian machine learning is an important branch of machine learning. The Bayesian method was first proposed by the British mathematician Thomas Bayes. After more than two hundred years of development, it has become an important component of machine learning and is widely applied in statistical machine learning, for example in multivariate structured output prediction. The basic idea of Bayes' theorem is that, given the prior distribution and likelihood function of a model, the posterior probability distribution p(Θ|M) of the model is obtained from Bayes' formula:

p(Θ|M)=p(M|Θ)p₀(Θ)/p(M) (1)

In formula (1), Θ is the probability model parameter; M is the data set, which in this application comprises the radar data set and the automatic station information data set; p₀(Θ) is the prior distribution function of the model; p(M|Θ) is the likelihood function; and p(M) is a constant.
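Bayes' formula can be made concrete with a small discrete example. The Python sketch below is illustrative only: the three candidate values of Θ, the prior weights, the observations, and the exponential likelihood are all invented for the example and are not taken from the application.

```python
import numpy as np

# Hypothetical candidate values of the model parameter (mean rain rate, mm/h).
theta = np.array([0.5, 2.0, 8.0])
prior = np.array([0.6, 0.3, 0.1])          # p0(Theta), sums to 1
obs = np.array([1.0, 3.0, 2.5])            # hypothetical gauge observations M

# Likelihood p(M | Theta), assuming independent exponential observations
# with mean Theta (an assumption made only for this example).
loglik = np.array([np.sum(-np.log(t) - obs / t) for t in theta])

joint = prior * np.exp(loglik)             # p(M | Theta) * p0(Theta)
evidence = joint.sum()                     # p(M), the normalising constant
posterior = joint / evidence               # p(Theta | M)
best = theta[np.argmax(posterior)]
```

The evidence p(M) only rescales the product of likelihood and prior, which is why it can be treated as a constant when comparing values of Θ.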

The choice of model is a fundamental problem of Bayesian methods. The Dirichlet non-parametric Bayesian model can learn the number of cluster centers automatically from the characteristics of the data.

The Dirichlet non-parametric Bayesian model assumes that G₀ is a base probability distribution on the probability measure space Ω and that the concentration parameter α₀ > 0; the probability distribution G on the space Ω then follows a Dirichlet process with base probability distribution G₀:

G~DP(α₀,G₀) (2)

wherein the base distribution G₀ determines the distribution of the basic elements in the model.

However, the probability distribution obtained from the Dirichlet process is discrete. To cluster different groups of data that share a certain similarity, the Dirichlet process mixture model (DPM) (Antoniak, 1974) is introduced: a generation probability is added for each data point and serves as the prior distribution of the data:

m_i~p(m|θ_i) (3)

In formula (3), the parameter θ_i follows the distribution G; i ∈ {1, ..., N} indexes the data points, each of which is given a generation probability; and N is the number of data points. When G follows a Dirichlet process, the model is called a Dirichlet process mixture model. The Bayesian method selects the optimal model by comparing the likelihood functions of different families of models:

p(M|D)=∫_Θ p(M|Θ)p(Θ|D)dΘ (4)

In formula (4), D represents a family of models. The integration in (4) avoids over-fitting of the model; when no informative prior is available, p(Θ|D) is assumed to be uniformly distributed.
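The model comparison in formula (4) can be sketched numerically by Monte Carlo integration under a uniform p(Θ|D). In the Python example below, the two model families (exponential and unit-variance Gaussian), the parameter range, and the synthetic observations are assumptions made purely for illustration; the application does not state which families are compared.

```python
import numpy as np


def log_marginal_likelihood(obs, loglik_fn, lo, hi, n_draws, rng):
    """Monte Carlo estimate of log p(M|D) with p(Theta|D) uniform on [lo, hi]."""
    thetas = rng.uniform(lo, hi, size=n_draws)
    ll = np.array([loglik_fn(obs, t) for t in thetas])
    peak = ll.max()                              # log-sum-exp for numerical stability
    return peak + np.log(np.mean(np.exp(ll - peak)))


def exponential_loglik(obs, mean):
    return np.sum(-np.log(mean) - obs / mean)


def gaussian_loglik(obs, mean, sigma=1.0):
    return np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                  - 0.5 * (obs - mean) ** 2 / sigma ** 2)


rng = np.random.default_rng(0)
obs = rng.exponential(scale=3.0, size=50)        # synthetic observations
families = {"exponential": exponential_loglik, "gaussian": gaussian_loglik}
scores = {name: log_marginal_likelihood(obs, fn, 0.1, 10.0, 5000, rng)
          for name, fn in families.items()}
best_family = max(scores, key=scores.get)        # family with the largest p(M|D)
```

Averaging the likelihood over the whole parameter range, rather than taking its maximum, is what penalises families that fit well only for a narrow set of parameters.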

In this study, the data of the observation set M = {m₁, m₂, m₃, ..., m_n} are assumed to be independent. To obtain the indicator factor of each observation in a non-parametric Bayesian model that uses a Dirichlet process as the prior distribution, Gibbs sampling is used to obtain the likelihood functions of different families of models and to select the optimal model. The steps are as follows:

(1) Data initialization: the system reads the data and randomly permutes the n observations to obtain {F(i)}, where i = 1, 2, ..., n.

(2) Category clustering: set Ω = Ω^(t-1) and, for each observation i ∈ {F(1), F(2), ..., F(n)}, sample the indicator factor β_i of that observation.

The likelihood estimates of the observed data are computed based on the existing K clusters:

f_k(m_i)=p(m_i|β_i=k,M_\i,ζ) (5)

In formulas (5) and (6), β_i is a new category, M_\i denotes the observation data set with the data of the corresponding index removed, and ζ is the distribution parameter. [Formula (6) is not reproduced in the source text.]

β_i is sampled [formulas (7) and (8) are not reproduced in the source text]:

In formulas (7) and (8), the count term is the amount of data already in class k. If [condition not reproduced in the source text], the number of clusters is increased by one, K = K + 1.

(3) Cluster updating: the number of observations in each class is checked. If the total number of observations in a class is 0, the class is deleted and the number of clusters is reduced by one, K = K - 1.
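The cluster-updating step (3) amounts to simple bookkeeping on the indicator factors. A minimal Python sketch follows; the function name and the convention that classes are numbered 0 to K-1 are assumptions of the example.

```python
import numpy as np


def prune_empty_clusters(beta, K):
    """Delete every class with zero observations and renumber the rest.

    beta holds one class index (0..K-1) per observation; returns (beta, K).
    """
    counts = np.bincount(beta, minlength=K)      # observations per class
    keep = np.flatnonzero(counts > 0)            # classes that still hold data
    remap = np.full(K, -1, dtype=int)
    remap[keep] = np.arange(len(keep))           # old index -> new index
    return remap[beta], len(keep)


beta = np.array([0, 2, 2, 4])                    # classes 1 and 3 are empty
beta, K = prune_empty_clusters(beta, K=5)
```

Each deleted class reduces K by one, so two empty classes out of five leave K = 3 here.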

The inference method of a Bayesian model is an important part of Bayesian learning. Given the prior distribution, the posterior distribution of a Bayesian model is often intractable, so an efficient inference method is required. In this application, variational inference is used.

In quantitative precipitation estimation, given a data set M and a prior distribution model p(M|D), a variational method is used to define an approximate distribution q(Θ) of the posterior distribution of the precipitation estimate. Using Jensen's inequality, a lower bound of the precipitation estimate is obtained:

log p(M)≥E_q[log(p(Θ,M))]-E_q[log(q(Θ))] (9)

The solution of the quantitative precipitation estimation is completed by maximizing this lower bound of the estimate [maximization formula not reproduced in the source text].
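For reference, the lower bound in formula (9) follows from Jensen's inequality in a few lines. The derivation below is the standard one and uses only the symbols already defined (Θ, M, q and the joint density p(Θ,M)); it assumes that q(Θ) is positive wherever p(Θ,M) is.

```latex
\begin{align}
\log p(M)
  &= \log \int p(\Theta, M)\, d\Theta
   = \log \int q(\Theta)\, \frac{p(\Theta, M)}{q(\Theta)}\, d\Theta
   = \log \mathbb{E}_q\!\left[\frac{p(\Theta, M)}{q(\Theta)}\right] \\
  &\ge \mathbb{E}_q\!\left[\log \frac{p(\Theta, M)}{q(\Theta)}\right]
   = \mathbb{E}_q\big[\log p(\Theta, M)\big] - \mathbb{E}_q\big[\log q(\Theta)\big]
\end{align}
```

The inequality holds because the logarithm is concave, and it becomes an equality when q(Θ) equals the posterior p(Θ|M).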

To carry out quantitative precipitation estimation, actual data related to precipitation must first be collected. The data set consists of a radar data set and an automatic station information data set. The radar data set contains radar reflectivity factor data within the quantitative precipitation estimation area at a spatial resolution of 1 km × 1 km. Because radar echoes are related to precipitation, these reflectivity data can be used to estimate the precipitation distribution. The automatic station information data set contains the data of the automatic weather stations in the quantitative precipitation estimation area and records the longitude and latitude of each station together with its precipitation observations at the corresponding times. These real-time observations serve as the reference for actual precipitation and are used for verification and auxiliary estimation.
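The relationship between radar reflectivity and precipitation is conventionally written as a power law Z = a·R^b. The Python sketch below uses the classical Marshall-Palmer coefficients (a = 200, b = 1.6) only as a familiar baseline. It is not the estimation method of this application, which instead constrains the estimate with the Bayesian model described here. The function name is hypothetical.

```python
import numpy as np


def rain_rate_from_dbz(dbz, a=200.0, b=1.6):
    """Invert Z = a * R**b for the rain rate R (mm/h), with reflectivity given in dBZ."""
    z_linear = 10.0 ** (np.asarray(dbz, dtype=float) / 10.0)   # dBZ -> mm^6 m^-3
    return (z_linear / a) ** (1.0 / b)


# Hypothetical reflectivity values for three grid cells.
rates = rain_rate_from_dbz([20.0, 35.0, 50.0])
```

A fixed power law of this kind ignores the other factors that influence precipitation, which is the limitation the background section points out.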

To infer an accurate precipitation estimate from the acquired data, a prior distribution model must be constructed so that prior information about precipitation is set from the existing data and background knowledge. The Dirichlet process (DP) is a non-parametric Bayesian model that can describe distributions with an unknown number of components. In quantitative precipitation estimation, the Dirichlet process is used to model different types of precipitation processes: through it, a mixture model is constructed, the data are divided into classes, and a probability is assigned to each class. In this application, the DPM is used to model the prior distribution of precipitation. Specifically, the data points are regarded as generated from a mixture model in which the parameters of each mixture component are unknown. The distribution is thus defined by the Dirichlet process, which provides the prior distribution information for precipitation estimation. In the DPM, the prior distribution describes the distribution shape of precipitation. With the prior distribution constructed, the model can be updated whenever new observation data arrive, so that it gradually approaches the actual precipitation distribution.

After the prior distribution model is established, posterior inference is carried out. Because the posterior distribution cannot be computed directly in practice, it is approximated by variational inference, an optimization method for obtaining approximate solutions of complex probability distributions. In the precipitation estimation problem, the aim of variational inference is to approximate the complex posterior distribution with a simple approximate distribution. Specifically, this application uses variational inference to infer the posterior distribution of precipitation from the prior distribution. During inference, a set of parameters is optimized to minimize the KL divergence between the approximate distribution and the true posterior distribution, so that the true posterior is approximated as closely as possible.
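The link between maximizing the lower bound and minimizing the KL divergence can be checked numerically on a toy model in which everything is available in closed form. The Python sketch below assumes a prior Θ ~ Normal(0, 1), observations m_i ~ Normal(Θ, 1), and a Gaussian approximate distribution q. This model, the function names, and the step size are assumptions for illustration and are far simpler than the mixture model of the application.

```python
import numpy as np


def elbo(mu, rho, m):
    """E_q[log p(Theta, M)] - E_q[log q(Theta)] for q = Normal(mu, exp(2 * rho))."""
    n, s2 = len(m), np.exp(2.0 * rho)
    e_log_joint = (-0.5 * (n + 1) * np.log(2 * np.pi)
                   - 0.5 * (mu ** 2 + s2)                  # prior term
                   - 0.5 * np.sum((m - mu) ** 2 + s2))     # likelihood term
    entropy = 0.5 * np.log(2 * np.pi * np.e * s2)          # equals -E_q[log q]
    return e_log_joint + entropy


def log_evidence(m):
    """Exact log p(M) for the toy model."""
    n, s = len(m), np.sum(m)
    return -0.5 * (n * np.log(2 * np.pi) + np.log(n + 1)
                   + np.sum(m ** 2) - s ** 2 / (n + 1))


rng = np.random.default_rng(0)
m = rng.normal(1.5, 1.0, size=20)            # synthetic observations

mu, rho, lr = 0.0, 0.0, 0.02                 # initial q and gradient step size
start = elbo(mu, rho, m)
for _ in range(2000):                        # gradient ascent on the lower bound
    grad_mu = np.sum(m) - (len(m) + 1) * mu
    grad_rho = 1.0 - (len(m) + 1) * np.exp(2.0 * rho)
    mu, rho = mu + lr * grad_mu, rho + lr * grad_rho

# log p(M) - ELBO equals KL(q || posterior); it shrinks to zero at the optimum.
gap = log_evidence(m) - elbo(mu, rho, m)
```

In this toy case the Gaussian family contains the exact posterior, so the gap closes completely; with a restricted family for a mixture model, a positive gap generally remains.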

To further optimize the estimation result and ensure that the model is valid for the given data, this application uses Jensen's inequality to compute a lower bound. Jensen's inequality is a mathematical tool used in Bayesian inference that makes it possible to approximate an objective function by optimizing a lower bound on it. In this application, Jensen's inequality is used to establish the lower bound of the precipitation estimate. Computing the lower bound ensures that the estimate of the posterior distribution does not deviate too far from the true value, and maximizing the lower bound makes it possible to find the optimal precipitation estimate in a solution space with multiple solutions.

After the lower bound is computed, it is maximized. Maximizing the lower bound yields a quantitative precipitation estimate that is as accurate as possible. Specifically, for a given data set, an objective function representing the magnitude of the lower bound is derived from variational inference and Jensen's inequality. The objective function is then optimized to maximize the lower bound, which finally gives the optimal precipitation estimation result. This usually requires estimating the likelihood function of the data points and making the precipitation estimate with the optimal model selected. The likelihood function describes the probability of the observed data given the model parameters. In precipitation estimation, it is used to measure the discrepancy between the precipitation data and the model predictions, and maximizing it makes the precipitation estimate more accurate. Because actual data often contain several different precipitation types, the data are grouped by clustering so that the distribution characteristics of the different precipitation types are better captured and a more accurate estimate is obtained through sampling.

Through the above steps, the quantitative precipitation estimation result is finally obtained. The whole process combines Bayesian inference, variational inference and Jensen's inequality to support the accuracy and reliability of the estimation result. By means of the prior distribution model, variational inference and lower-bound maximization, the model is continuously optimized and approaches the real precipitation amount, finally achieving high-precision precipitation estimation. The application has broad prospects in meteorology, in particular in extreme weather warning and climate research.

The quantitative precipitation estimation method and system of the invention are based on long-term, large-volume observation data and adopt a quantitative precipitation estimation algorithm based on non-parametric Bayesian machine learning. The algorithm combines nine elements, including radar reflectivity factors, automatic station information, radar-retrieved wind fields and weather types, to constrain the radar quantitative precipitation estimation algorithm. First, a radar data set and an automatic station real-time observation data set are constructed for the quantitative precipitation estimation area, and a prior distribution model is constructed based on the data set. Then, based on variational inference, an approximate distribution of the posterior distribution of the precipitation estimate is defined according to the prior distribution model; based on Jensen's inequality, a lower bound of the precipitation estimate is determined according to the approximate distribution; and the lower bound is maximized to obtain a quantitative precipitation estimation result. Quantitative precipitation estimation is thereby carried out and its performance in operational service is improved.

The quantitative precipitation estimation system provided by the embodiment of the invention, as shown in Fig. 2, comprises:

an acquisition module 1, used for acquiring a data set M;

a construction module 2, used for constructing a prior distribution model p(M|D) based on the data set M, where D is a family of models;

a definition module 3, used for defining, based on variational inference and according to the prior distribution model p(M|D), an approximate distribution q(Θ) of the posterior distribution of the precipitation estimate, where Θ is the probability model parameter;

a determining module 4, used for determining, based on Jensen's inequality and according to the approximate distribution q(Θ), a lower bound E_q[log p(Θ, M)] - E_q[log q(Θ)] of the precipitation estimate, where E_q is the expectation over the approximate distribution q(Θ) of the precipitation estimate;

a maximizing module 5, used for maximizing the lower bound E_q[log p(Θ, M)] - E_q[log q(Θ)] of the precipitation estimate to obtain the quantitative precipitation estimation result.
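The roles of the determining module and the maximizing module can be illustrated on a toy model in which everything is available in closed form. The sketch below is illustrative only and is not the model of the invention: it assumes a single scalar parameter θ with prior N(0, 1), a single observation m with likelihood N(θ, 1), and a Gaussian approximate distribution q = N(mu, var). The function names and the grid search are hypothetical choices.

```python
import math


def log_evidence(m):
    """Exact log p(m) for theta ~ N(0, 1), m | theta ~ N(theta, 1), i.e. m ~ N(0, 2)."""
    return -0.5 * math.log(2.0 * math.pi * 2.0) - m * m / 4.0


def elbo(m, mu, var):
    """Closed-form E_q[log p(theta, m)] - E_q[log q(theta)] for q = N(mu, var)."""
    exp_log_lik = -0.5 * math.log(2.0 * math.pi) - 0.5 * ((m - mu) ** 2 + var)
    exp_log_prior = -0.5 * math.log(2.0 * math.pi) - 0.5 * (mu ** 2 + var)
    exp_log_q = -0.5 * math.log(2.0 * math.pi * math.e * var)  # negative entropy of q
    return exp_log_lik + exp_log_prior - exp_log_q


def maximize_elbo(m, steps=200):
    """Coarse grid search over (mu, var); a stand-in for a real optimizer."""
    best = (-math.inf, None, None)
    for i in range(steps + 1):
        mu = -3.0 + 6.0 * i / steps          # mu in [-3, 3]
        for j in range(1, steps + 1):
            var = 2.0 * j / steps            # var in (0, 2]
            value = elbo(m, mu, var)
            if value > best[0]:
                best = (value, mu, var)
    return best


if __name__ == "__main__":
    bound, mu, var = maximize_elbo(1.2)
    print(bound, mu, var, log_evidence(1.2))
```

Under these assumptions the exact posterior is N(m/2, 1/2), so the grid search should land on that mean and variance, and the maximized bound should coincide with the exact log evidence. Any other choice of q gives a strictly smaller value.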

G ~ DP(α₀, G₀)

m_i ~ p(m | θ_i)

p(M | D) = ∫_Θ p(M | Θ) p(Θ | D) dΘ

wherein D represents a family of models.
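A draw G ~ DP(α₀, G₀) and the resulting mixture data can be simulated with the stick-breaking construction. The sketch below is a minimal illustration under stated assumptions, not the configuration of the invention: the base distribution G₀ is taken to be a standard normal, the component density p(m | θ) is taken to be normal with a fixed standard deviation, and the infinite sum is truncated at a fixed number of atoms. All function and parameter names are hypothetical.

```python
import random


def stick_breaking(alpha0, base_sampler, truncation, rng):
    """Truncated stick-breaking draw of G ~ DP(alpha0, G0) as (weights, atoms)."""
    weights, atoms = [], []
    remaining = 1.0  # length of the stick not yet assigned
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha0)     # fraction broken off the remaining stick
        weights.append(remaining * v)
        atoms.append(base_sampler(rng))      # atom location drawn from G0
        remaining *= 1.0 - v
    weights.append(remaining)                # truncation: leftover mass goes to the last atom
    atoms.append(base_sampler(rng))
    return weights, atoms


def sample_dpm(n, weights, atoms, noise_sd, rng):
    """Draw theta_i ~ G, then m_i ~ N(theta_i, noise_sd) (assumed component density)."""
    thetas = rng.choices(atoms, weights=weights, k=n)
    return [(theta, rng.gauss(theta, noise_sd)) for theta in thetas]


if __name__ == "__main__":
    rng = random.Random(0)
    w, a = stick_breaking(2.0, lambda r: r.gauss(0.0, 1.0), 50, rng)
    for theta, m in sample_dpm(5, w, a, 0.1, rng):
        print(round(theta, 3), round(m, 3))
```

Because G is discrete, several data points typically share the same θ_i. This sharing is what produces the clustering behavior of the mixture model, and a smaller α₀ concentrates the weight on fewer atoms.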

f_k(m_i) = p(m_i | β_i = k, M_\i, ζ)

β_i is sampled:

If […], then the number of clusters K is increased by 1: K = K + 1;
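One common way to sample the indicator β_i in a Dirichlet process mixture is the Chinese restaurant process weighting. An existing class k receives weight proportional to its current size times f_k(m_i), and a new class receives weight proportional to the concentration parameter α₀ times the predictive density of m_i under a fresh class. The sketch below follows that standard scheme, which may differ in detail from the sampling expression of the invention. The function names and the `new_likelihood` argument are assumptions introduced for illustration.

```python
import random


def assignment_probabilities(counts, likelihoods, alpha0, new_likelihood):
    """Normalized P(beta_i = k) for the K existing classes plus one new class.

    counts[k]      : number of other observations already in class k
    likelihoods[k] : f_k(m_i), density of m_i under class k
    new_likelihood : density of m_i under a fresh class (assumed to be supplied)
    """
    weights = [n * f for n, f in zip(counts, likelihoods)]
    weights.append(alpha0 * new_likelihood)   # extra slot for a new class
    total = sum(weights)
    return [w / total for w in weights]


def sample_indicator(counts, likelihoods, alpha0, new_likelihood, rng):
    """Sample beta_i and update counts in place; returns the sampled class index."""
    probs = assignment_probabilities(counts, likelihoods, alpha0, new_likelihood)
    k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    if k == len(counts):
        counts.append(1)      # new class opened: K = K + 1
    else:
        counts[k] += 1
    return k


if __name__ == "__main__":
    rng = random.Random(1)
    counts = [3, 1]
    k = sample_indicator(counts, [0.2, 0.4], 1.0, 0.1, rng)
    print(k, counts)
```

Appending to `counts` when the extra slot is drawn corresponds to the step in which the number of clusters K is increased by 1.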

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. A quantitative precipitation estimation method, characterized by comprising:

acquiring a data set M;

constructing a prior distribution model p(M|D) based on the data set M, where D is a family of models;

defining, based on variational inference and according to the prior distribution model p(M|D), an approximate distribution q(Θ) of the posterior distribution of the precipitation estimate, where Θ is the probability model parameter;

determining, based on Jensen's inequality and the approximate distribution q(Θ), the lower bound of the precipitation estimate as E_q[log p(Θ, M)] - E_q[log q(Θ)], where E_q is the expectation over the approximate distribution q(Θ) of the precipitation estimation data;

maximizing the lower bound E_q[log p(Θ, M)] - E_q[log q(Θ)] of the precipitation estimate to obtain the quantitative precipitation estimation result.

2. The quantitative precipitation estimation method according to claim 1, characterized in that the data set M at least includes a radar data set of the quantitative precipitation estimation area and an automatic station information data set, wherein the radar data set is 1 km × 1 km gridded radar reflectivity factor data for the quantitative precipitation estimation area, and the automatic station information data set contains the longitude and latitude coordinates and the actual precipitation observation data of each automatic station in the quantitative precipitation estimation area.
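As an illustration of how the two data sets named in this claim could be paired, the sketch below locates each automatic station in the 1 km grid and reads the reflectivity of the containing cell. It rests on several assumptions not stated in the source: the grid origin is its south-west corner, a local flat-earth conversion of about 111.32 km per degree of latitude is accurate enough over the area, reflectivity is stored as a list of rows with row 0 southernmost, and station records are (latitude, longitude, rainfall) tuples. All names are hypothetical.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude (assumption)


def grid_index(lat, lon, origin_lat, origin_lon, cell_km=1.0):
    """Row and column of the grid cell containing (lat, lon); row 0 is southernmost."""
    north_km = (lat - origin_lat) * KM_PER_DEG_LAT
    east_km = (lon - origin_lon) * KM_PER_DEG_LAT * math.cos(math.radians(origin_lat))
    return int(north_km // cell_km), int(east_km // cell_km)


def pair_stations_with_radar(stations, reflectivity, origin_lat, origin_lon):
    """Return (reflectivity, observed rainfall) pairs for stations inside the grid."""
    pairs = []
    for lat, lon, rain in stations:
        row, col = grid_index(lat, lon, origin_lat, origin_lon)
        if 0 <= row < len(reflectivity) and 0 <= col < len(reflectivity[0]):
            pairs.append((reflectivity[row][col], rain))
    return pairs


if __name__ == "__main__":
    grid = [[10.0, 20.0], [35.0, 40.0]]            # 2 x 2 cells, hypothetical values
    stations = [(22.0135, 114.005, 4.2), (21.9, 114.0, 1.0)]
    print(pair_stations_with_radar(stations, grid, 22.0, 114.0))
```

Stations that fall outside the grid are skipped rather than clamped to the nearest cell, so that every returned pair refers to a radar value actually measured over the station.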

3. The quantitative precipitation estimation method according to claim 1, wherein the step of constructing a prior distribution model p(M|D) based on the data set M comprises:

In the Dirichlet nonparametric Bayesian model, G₀ is the base probability distribution on the probability measure space Ω, and the concentration parameter α₀ > 0. If a probability distribution G on the probability measure space Ω follows a Dirichlet process with base probability distribution G₀ and concentration parameter α₀, then:

G ~ DP(α₀, G₀)

Where: the base probability distribution G₀ determines the distribution of the basic components in the prior distribution model p(M|D); DP is the Dirichlet process;

Based on the Dirichlet process mixture (DPM) model, a generative probability is assigned to each data point in the given precipitation estimation area as the prior distribution of the data:

m_i ~ p(m | θ_i)

Where: the parameter θ_i follows the probability distribution G; i ∈ [N], where [N] is the set of integers from 1 to N and N is the total number of data points; each data point has its own probability parameter θ_i; p is the conditional probability density function; m is the precipitation estimate of the data point;

By comparing the likelihood functions of different families of prior distribution probability models m_i ~ p(m | θ_i), the optimal model is selected as the prior distribution model p(M|D):

p(M | D) = ∫_Θ p(M | Θ) p(Θ | D) dΘ

Where: D represents a family of models.

4. The quantitative precipitation estimation method according to claim 1, wherein selecting the optimal model as the prior distribution model p(M|D) by comparing the likelihood functions of different families of models comprises:

Assuming that the elements of the data set M = {m₁, m₂, m₃, …, m_n} are independent, read the actual precipitation observation data of the n automatic stations in the quantitative precipitation estimation area and randomly permute them to obtain {F(i)}, where i = 1, 2, …, n;

Set Ω = Ω^(t-1), and sample the indicator factor β_i of the actual precipitation observation data of each automatic station i ∈ {F(1), F(2), …, F(n)}, visiting the stations in the order given by the random permutation {F(i)}; wherein […] is the concentration parameter at time t-1, and Ω^(t-1) is the probability measure space at time t-1;

Calculate the likelihood estimate f_k(m_i) of the observed data based on the current K clusters:

f_k(m_i) = p(m_i | β_i = k, M_\i, ζ)

In the formula, β_i is a new category; M_\i is the data set M = {m₁, m₂, m₃, …, m_n} with the element of the corresponding subscript i removed; ζ is the distribution parameter; k is the observation value; m_i is a random variable; […], similar to k, represents another observation value other than k;

Sampling β_i:

In the formula, […] is the amount of data already in the kth class; E_i is the preset observed data sample; K is the number of observed data samples; f_k represents the probability density function for the kth class; δ is the Kronecker delta function: when β_i = k, δ(β_i, k) = 1, otherwise it is 0; […] represents the probability density function for classes other than the kth class;

If […], then increase the number of clusters K by 1: K = K + 1;

Check the number of observations in each cluster used to calculate the likelihood function. If the total number of observations in a cluster is 0, delete that cluster and reduce the number of clusters K by 1: K = K - 1.
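The final check of this claim, deleting classes that contain no observations and reducing K accordingly, can be sketched as follows. The data representation is an assumption made for illustration: `assignments` holds β_i for each observation, `counts` holds the size of each class, and surviving classes are relabeled consecutively. The function name is hypothetical.

```python
def prune_empty_clusters(assignments, counts):
    """Drop classes with zero observations and relabel the rest as 0..K-1."""
    keep = [k for k, n in enumerate(counts) if n > 0]      # indices of non-empty classes
    relabel = {old: new for new, old in enumerate(keep)}   # old label -> new label
    new_assignments = [relabel[b] for b in assignments]
    new_counts = [counts[k] for k in keep]
    return new_assignments, new_counts


if __name__ == "__main__":
    # Class 1 is empty, so K drops from 3 to 2 and class 2 becomes class 1.
    print(prune_empty_clusters([0, 2, 2, 0], [2, 0, 2]))
```

Relabeling keeps the class indices contiguous, so that the length of `counts` always equals the current number of clusters K.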

5. A quantitative precipitation estimation system, characterized by comprising:

An acquisition module, used to acquire a data set M;

A construction module, used to construct a prior distribution model p(M|D) based on the data set M, where D is a family of models;

A definition module, used to define, based on variational inference and according to the prior distribution model p(M|D), the approximate distribution q(Θ) of the posterior distribution of the precipitation estimate, wherein Θ is the probability model parameter;

A determination module, used to determine the lower bound of the precipitation estimate E_q[log p(Θ, M)] - E_q[log q(Θ)] based on Jensen's inequality and the approximate distribution q(Θ), wherein E_q is the expectation over the approximate distribution q(Θ) of the precipitation estimate;

A maximization module, used to maximize the lower bound of the precipitation estimate E_q[log p(Θ, M)] - E_q[log q(Θ)] to obtain the quantitative precipitation estimation result.

6. The quantitative precipitation estimation system according to claim 5, characterized in that the data set M at least includes a radar data set of the quantitative precipitation estimation area and an automatic station information data set, wherein the radar data set is 1 km × 1 km gridded radar reflectivity factor data for the quantitative precipitation estimation area, and the automatic station information data set contains the latitude and longitude coordinates and the actual precipitation observation data of each automatic station in the quantitative precipitation estimation area.

7. The quantitative precipitation estimation system according to claim 5, wherein the constructing of the prior distribution model p(M|D) based on the data set M comprises:

G ~ DP(α₀, G₀)

m_i ~ p(m | θ_i)

p(M | D) = ∫_Θ p(M | Θ) p(Θ | D) dΘ

Where: D represents a family of models.

8. The quantitative precipitation estimation system according to claim 5, wherein selecting the optimal model as the prior distribution model p(M|D) by comparing the likelihood functions of different families of models comprises:

f_k(m_i) = p(m_i | β_i = k, M_\i, ζ)

Sampling β_i:

In the formula, […] is the amount of data already in the kth class; E_i is the preset observation data sample; K is the number of observation data samples; f_k represents the probability density function for the kth class; δ is the Kronecker delta function: when β_i = k, δ(β_i, k) = 1, otherwise it is 0; […] represents the probability density function for classes other than the kth class;

If […], then increase the number of clusters K by 1: K = K + 1;

9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and a processor executes the computer program to implement the method according to any one of claims 1 to 4.

10. An electronic device, characterized in that the electronic device comprises a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the method according to any one of claims 1 to 4.