CN115374570A - Multi-source weighted training set construction method for deformation prediction of engineering tunnel crossing - Google Patents
- Publication number
- CN115374570A (application CN202211229156.8A)
- Authority
- CN
- China
- Prior art keywords
- data
- training set
- weight
- source
- weights
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/10—Geometric CAD
- G06F30/17—Mechanical parametric or variational design
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/10—Geometric CAD
- G06F30/13—Architectural design, e.g. computer-aided architectural design [CAAD] related to design of buildings, bridges, landscapes, production plants or roads
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Geometry (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Medical Informatics (AREA)
- Mathematical Analysis (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Optimization (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Computational Mathematics (AREA)
- Civil Engineering (AREA)
- Structural Engineering (AREA)
- Architecture (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method for constructing a multi-source weighted training set for predicting tunnel deformation in crossing projects, comprising: step S1, acquiring the design data of crossing projects and the corresponding tunnel deformation; step S2, processing the design data to obtain an initial input set DatasetV1; step S3, determining an information source weight according to the source of the design data; step S4, determining an adaptability weight for each data item according to the values of its key features; step S5, combining the weights from steps S3 and S4 and adding the result to the input set as a feature; step S6, taking the input set from step S5 as the input training set and the tunnel deformation values as the output training set to form the training set. By structuring and weighting multi-source engineering information, the invention obtains a training set that reflects the characteristics of crossing projects and is well suited to machine learning algorithms, which supports intelligent prediction of tunnel deformation and provides a quantitative basis for related engineering assessments.
Description
Technical Field
The invention belongs to the technical field of tunnel crossing engineering, and specifically relates to a method for constructing a multi-source weighted training set for predicting tunnel deformation in crossing projects.
Background Art
With the continuing development of computer processing power and intelligent algorithms, machine learning has gradually become an effective means of solving practical problems in industry. Supervised learning can establish the latent relationship between inputs and outputs directly from existing cases, which avoids complex intermediate steps and allows the required quantity to be predicted simply, quickly, and accurately; it is a general-purpose tool for bringing intelligence and automation to many fields. This mode of learning, however, depends heavily on the training set used by the learning algorithm: a structured training set with clear features helps the algorithm reach its learning objective more efficiently.
For tunnel deformation in crossing projects, extensive engineering practice and academic research have accumulated a great deal of valuable data, which provides favorable conditions for applying machine learning to deformation prediction. These data come from diverse sources, however. Field measurement, numerical simulation, theoretical analysis, model tests, and other research methods yield data with differing features, and much of the material, such as design drawings and contour plots, is unstructured. None of it can easily be used directly by learning algorithms. In addition, the credibility of the various information sources differs, and deformation prediction is closely tied to the attributes of the project itself, so how closely a case matches the target project strongly affects the prediction. When machine learning is applied in this field, it is therefore often inappropriate to treat every training case as equally valuable; the training set should instead be adjusted according to the prediction needs and prior knowledge.
A training set generation method is therefore needed that starts from existing data and meets the requirements of the relevant learning algorithms, so as to overcome the shortcomings of the prior art.
Summary of the Invention
To solve the above technical problem, the invention provides a method for constructing a multi-source weighted training set for predicting tunnel deformation in crossing projects. The method refines the feature content of the design data, reshapes the multi-source data from a feature-engineering perspective, and weights the reshaped data according to the characteristics of crossing projects. The result is a training set that can be used directly by machine learning algorithms and that supports efficient prediction of tunnel deformation. The invention also provides a machine-learning-based method and system for predicting tunnel deformation in crossing projects.
To achieve the above object, the invention adopts the following technical solution:
A method for constructing a multi-source weighted training set for predicting tunnel deformation in crossing projects, characterized in that it comprises the following steps:
Step S1: acquire the design data of crossing projects and the corresponding tunnel deformation, where the design data form the input part of the training set and include engineering geological information, tunnel structure information, and construction information, and the tunnel deformation forms the output part of the training set;
Step S2: divide the design data from step S1 into numeric and non-numeric data according to their features, assemble the numeric data into data set ASet, process the non-numeric data to obtain data set Bset, and combine ASet and Bset to obtain the initial input set DatasetV1;
Step S3: determine the information source weight according to the source of the design data in the initial input set DatasetV1, using a chosen measurement strategy;
Step S4: determine the adaptability weight of each data item according to the values of the key features of the design data in the initial input set DatasetV1;
Step S5: combine the weights from steps S3 and S4 and add the result to the input set as a feature, obtaining the weighted input set DatasetV2;
Step S6: take the weighted input set DatasetV2 from step S5 as the input training set and the tunnel deformation values as the output training set to form the standard training set TrainDataset.
According to an embodiment of the invention, the processing of the non-numeric data in step S2 comprises encoding the non-numeric data with the One-Hot method, and the encoding process comprises the following steps:
Step S2.1: select the data whose features are non-numeric and assemble them into data set FeatureSet. The total number of features in FeatureSet is denoted m, and the data set corresponding to each feature is denoted FeatureSet_i (i = 1, 2, ..., m);
Step S2.2: for each feature data set FeatureSet_i, define n_i state features, where n_i is the number of distinct discrete values taken by that feature, and, according to the value taken by each case, obtain the data set ConditionSet_i expressed in state features; and
Step S2.3: combine the m data sets ConditionSet_i expressed in state features to obtain Bset, the state-feature representation of the non-numeric data.
According to an embodiment of the invention, the sources of design data in step S3 include theoretical analysis, numerical simulation, model tests, and field monitoring.
According to an embodiment of the invention, the information source weight in step S3 is an index of the importance of the different data sources, and the measurement strategy may be one of the following:
(1) Expert scoring: the score of each data source is determined directly from expert evaluation, so that data from the same source have the same weight;
(2) Interval random generation: a weight interval is specified for each data source, and each case is given a weight generated at random within the interval of its source;
(3) Ranking comparison: the data sources are ranked by importance; several cases are drawn from the data set without replacement, arranged by importance, and given weights from high to low according to their rank, and this is repeated until all cases have been drawn; and
(4) Synthesis: the weights obtained separately by two or more of the above methods are synthesized into the final weight.
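Strategies (2) and (4) can be sketched in a few lines. This is a minimal illustration only: the interval bounds in `SOURCE_INTERVALS`, the function names, and the use of a plain mean for the synthesis step are all assumptions, since the text does not specify how intervals are chosen or how weights are synthesized.

```python
import random

# Hypothetical weight intervals per data source (illustrative values only,
# not taken from the patent).
SOURCE_INTERVALS = {
    "theoretical analysis": (0.05, 0.10),
    "numerical simulation": (0.20, 0.30),
    "model test": (0.10, 0.15),
    "field monitoring": (0.50, 0.60),
}


def interval_random_weight(source, rng):
    """Strategy (2): draw a weight uniformly from the interval of the case's source."""
    low, high = SOURCE_INTERVALS[source]
    return rng.uniform(low, high)


def synthesized_weight(weights):
    """Strategy (4): merge weights from several strategies.

    A plain arithmetic mean is assumed here; the source text does not
    prescribe a particular synthesis rule.
    """
    return sum(weights) / len(weights)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
case_sources = ["field monitoring", "numerical simulation", "model test"]
interval_weights = [interval_random_weight(s, rng) for s in case_sources]
```

Each entry of `interval_weights` falls inside the interval of its source, so cases from the same source receive similar but not identical weights.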
According to an embodiment of the invention, the adaptability weight in step S4 measures how close a data item is to a typical working condition, and is determined by the following steps:
Step S4.1: determine the important judgment feature values y_i (i = 1, 2, ..., n) of the typical working condition, where n is the number of important judgment features (that is, key features);
Step S4.2: compute the Jaccard distance D_1 of the discrete features among the important judgment features:
$$D_1 = 1 - \frac{|x \cap y|}{|x \cup y|}$$
where x and y are the sets of discrete feature values of the case and of the typical working condition, respectively;
Step S4.3: compute the relative Euclidean distance D_2 of the continuous features among the important judgment features, where x_i and y_i are the values of the case and of the typical working condition for the i-th continuous feature, and l is the number of continuous features among the important judgment features; and
Step S4.4: combine the distances D_1 and D_2 to compute the adaptability weight of each case:
$$\omega_{fit} = f(D_1) + g(D_2)$$
where f(x) and g(x) are distance transformation functions. Each may be an inverse-proportional function with a positive coefficient, a linear function with negative slope and positive intercept, or any other monotonically decreasing function whose range is positive.
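The following sketch shows one way steps S4.2 to S4.4 could be computed. The Jaccard distance follows its standard definition. The form used for the relative Euclidean distance, each difference divided by the typical value before squaring, is an assumption, because the formula is not reproduced in this text. The `eps` guard in the inverse-proportional transform is likewise a hypothetical addition to avoid division by zero when a case coincides with the typical working condition.

```python
import math


def jaccard_distance(x, y):
    """Standard Jaccard distance between two sets of discrete feature values."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(x & y) / len(union)


def relative_euclidean_distance(xs, ys):
    """ASSUMED form: sqrt(sum(((x_i - y_i) / y_i) ** 2)); requires every y_i != 0."""
    return math.sqrt(sum(((x - y) / y) ** 2 for x, y in zip(xs, ys)))


def inverse_transform(d, k=1.0, eps=1e-6):
    """Inverse-proportional transform k / d.

    eps is a hypothetical lower bound on d, added so that d == 0 does not
    raise ZeroDivisionError; the source text does not discuss this case.
    """
    return k / max(d, eps)


def adaptability_weight(case_discrete, typical_discrete, case_cont, typical_cont,
                        f=inverse_transform, g=inverse_transform):
    """Step S4.4: omega_fit = f(D_1) + g(D_2)."""
    d1 = jaccard_distance(case_discrete, typical_discrete)
    d2 = relative_euclidean_distance(case_cont, typical_cont)
    return f(d1) + g(d2)


# A mining-method case in 6000 kPa soil against a shield-method,
# 3000 kPa typical working condition (case values are hypothetical).
w = adaptability_weight({"mining"}, {"shield"}, [6000.0], [3000.0])
```

Any other positive, monotonically decreasing function can be passed as `f` or `g`, for example a linear function with negative slope and positive intercept.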
According to an embodiment of the invention, the weight combination in step S5 derives a combined weight for each case from its information source weight and adaptability weight through combination strategies such as scaling and allocation. The combination strategies include:
(1) Accumulation: the share of each component weight in the combined weight is specified, and the proportionally weighted sum is taken as the combined weight;
(2) Multiplication: all component weights are multiplied together, and the product is taken as the combined weight;
(3) Minimum: the smallest of the component weights is taken as the combined weight;
(4) Maximum: the largest of the component weights is taken as the combined weight; and
(5) Random: the smallest component weight is taken as the lower bound of an interval and the largest as the upper bound, and a weight generated at random within the interval is taken as the combined weight.
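The five combination strategies map naturally onto a single dispatch function. The sketch below is illustrative: the function name, the strategy labels, and the `shares` parameter are hypothetical names, not taken from the text.

```python
import math
import random


def combine_weights(weights, strategy, shares=None, rng=None):
    """Combine the component weights of one case into a single weight.

    weights  -- component weights, e.g. [source_weight, adaptability_weight]
    strategy -- one of "accumulate", "multiply", "min", "max", "random"
    shares   -- per-component proportions, required for "accumulate"
    rng      -- optional random.Random instance used by "random"
    """
    if strategy == "accumulate":
        if shares is None or len(shares) != len(weights):
            raise ValueError("accumulate needs one share per component weight")
        return sum(s * w for s, w in zip(shares, weights))
    if strategy == "multiply":
        return math.prod(weights)
    if strategy == "min":
        return min(weights)
    if strategy == "max":
        return max(weights)
    if strategy == "random":
        rng = rng or random.Random()
        return rng.uniform(min(weights), max(weights))
    raise ValueError(f"unknown strategy: {strategy}")


components = [0.57, 2.0]  # hypothetical source and adaptability weights
product = combine_weights(components, "multiply")
blended = combine_weights(components, "accumulate", shares=[0.5, 0.5])
```

The minimum and maximum strategies bracket the random strategy, so a randomly combined weight can never fall outside the range spanned by the component weights.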
In addition, the invention provides a machine-learning-based method and system for predicting tunnel deformation in crossing projects.
Compared with the prior art, the invention has the following advantages:
(1) Starting from existing, readily available engineering data, the invention uses digital encoding to give multi-source data a unified structured format and applies One-Hot processing to non-numeric features, which avoids the additivity problem of non-numeric data. This greatly improves the usability of the training set, supports the use of the relevant machine learning algorithms, and provides a basis for rapid prediction.
(2) In view of the characteristics of crossing projects, the invention weights the samples in the training set while still meeting the requirements of the learning algorithm. The key properties of crossing projects are thereby built into the data in a targeted way, which provides the learning algorithm with a large number of key learning features.
Brief Description of the Drawings
Fig. 1 is a flow chart of the method for constructing a multi-source weighted training set for predicting tunnel deformation in crossing projects according to an embodiment of the invention; and
Fig. 2 is a schematic diagram of how the initial input set DatasetV1 is generated in that method according to an embodiment of the invention.
Detailed Description
The invention is described in further detail below through specific embodiments and with reference to the drawings. The content shown is intended to explain the invention fully, not to limit it.
The method for constructing a multi-source weighted training set is described in detail using the prediction of the vertical deformation of an existing tunnel during crossing construction as an example. As shown in Fig. 1, the scheme of the invention may comprise the following steps:
Step S1: acquire the design data of crossing projects and the corresponding tunnel deformation. The design data form the input part of the training set and may include engineering geological information, tunnel structure information, construction information, and so on; the tunnel deformation forms the output part of the training set:
For this embodiment, data were collected from multiple channels, including engineering design documents, published academic literature, and related monitoring reports. After sorting and screening, a total of 187 design data cases and their corresponding vertical tunnel deformation values were obtained.
Step S2: divide the design data from step S1 into numeric and non-numeric data according to their features, assemble the numeric data into data set ASet, process the non-numeric data to obtain data set Bset, and combine ASet and Bset to obtain the initial input set DatasetV1 (see Fig. 2):
The case information was summarized into design data features that are common to the cases or can be converted into a common form: 30 engineering geological features, 10 tunnel structure features, and 5 construction information features, 45 features in total. These data are divided into numeric and non-numeric data:
The numeric data are assembled into ASet by selecting the data whose features are numeric. In the data of this embodiment, 42 features are numeric, and together they form data set ASet.
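The split into numeric and non-numeric features can be sketched as follows. The feature names and values are hypothetical stand-ins for the 45 features of the embodiment, and the function name `split_features` is not from the source.

```python
def split_features(cases):
    """Return (numeric_names, non_numeric_names) for a list of case dicts.

    A feature counts as numeric only if every case holds an int or float
    for it; bool is excluded because it is a subclass of int in Python.
    """
    numeric, non_numeric = [], []
    for name in cases[0]:
        values = [case[name] for case in cases]
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            numeric.append(name)
        else:
            non_numeric.append(name)
    return numeric, non_numeric


# Hypothetical cases with two numeric features and one non-numeric feature.
cases = [
    {"soil_modulus_kPa": 3000.0, "cover_depth_m": 12.5, "construction_method": "shield"},
    {"soil_modulus_kPa": 8000.0, "cover_depth_m": 20.0, "construction_method": "mining"},
]
numeric_names, non_numeric_names = split_features(cases)

# ASet keeps only the numeric features of every case.
a_set = [{name: case[name] for name in numeric_names} for case in cases]
```

The non-numeric names are the ones that go on to One-Hot encoding before being merged back with `a_set`.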
The non-numeric data are encoded with the One-Hot method. The encoding process comprises:
Step S2.1: select the data whose features are non-numeric and assemble them into data set FeatureSet. The total number of features in FeatureSet is denoted m, and the data set corresponding to each feature is denoted FeatureSet_i (i = 1, 2, ..., m). In the data of this embodiment, three features are non-numeric: "lining type", "construction method", and "control measures". They form a FeatureSet with m = 3 features, and the data sets of the three features are FeatureSet_1, FeatureSet_2, and FeatureSet_3, respectively, as shown in Table 1.
Table 1: FeatureSet data set
Step S2.2: for each feature data set FeatureSet_i, define n_i state features, where n_i is the number of distinct discrete values taken by that feature, and, according to the value taken by each case, obtain the data set ConditionSet_i expressed in state features. Taking FeatureSet_2, the "construction method" feature, as an example, the feature takes four discrete values across the 187 cases: "shield method", "TBM method", "mining method", and "open-cut method". Accordingly, n_2 = 4 state features are defined, one per discrete value. A state feature is 1 when the case takes the corresponding value and 0 otherwise, which yields the data set ConditionSet_2 expressed in state features, as shown in Table 2.
Table 2: ConditionSet_2 data set
Step S2.3: combine the m data sets ConditionSet_i expressed in state features to obtain Bset, the state-feature representation of the non-numeric data.
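As a concrete illustration of steps S2.2 and S2.3, the sketch below One-Hot encodes the "construction method" feature and merges several encoded features row by row. The five sample values and the "lining type" values are hypothetical; only the four method names come from the embodiment, and the function names are illustrative.

```python
def one_hot(values, prefix):
    """Encode one non-numeric feature as n_i binary state features (step S2.2).

    States are listed in order of first appearance. Returns the state
    feature names and one row of 0/1 values per case.
    """
    states = list(dict.fromkeys(values))  # distinct values, order preserved
    columns = [f"{prefix}={state}" for state in states]
    rows = [
        {col: int(value == state) for col, state in zip(columns, states)}
        for value in values
    ]
    return columns, rows


def combine_condition_sets(condition_sets):
    """Merge the m encoded features case by case to form Bset (step S2.3)."""
    b_set = []
    for per_case in zip(*condition_sets):
        merged = {}
        for row in per_case:
            merged.update(row)
        b_set.append(merged)
    return b_set


# Hypothetical values of two non-numeric features for five cases.
methods = ["shield", "TBM", "mining", "open-cut", "shield"]
linings = ["segment", "segment", "cast-in-place", "cast-in-place", "segment"]

method_columns, condition_set_2 = one_hot(methods, "construction_method")
_, condition_set_1 = one_hot(linings, "lining_type")
b_set = combine_condition_sets([condition_set_1, condition_set_2])
```

Each case has exactly one state feature equal to 1 within each encoded feature, so no artificial ordering is imposed on the construction methods.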
The initial input set DatasetV1 is then processed further.
Step S3: determine the information source weight according to the source of the design data, using a chosen measurement strategy. The information source weight is an index of the importance of the different data sources and can be determined by expert scoring, interval random generation, ranking comparison, synthesis, or similar methods.
The sources of design data may include theoretical analysis, numerical simulation, model tests, and field monitoring. In this embodiment, the cases comprise 15 based on theoretical analysis, 136 based on numerical simulation, 7 based on model tests, and 29 based on field monitoring. Using the expert scoring method, the importance weights obtained for theoretical analysis, numerical simulation, model tests, and field monitoring are 0.07, 0.24, 0.12, and 0.57, respectively.
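With expert scoring, assigning the information source weight is a simple lookup. The sketch below uses the case counts and weights reported for this embodiment; the dictionary and function names are illustrative, not from the source.

```python
# Expert-scored importance weights reported for this embodiment.
EXPERT_WEIGHTS = {
    "theoretical analysis": 0.07,
    "numerical simulation": 0.24,
    "model test": 0.12,
    "field monitoring": 0.57,
}

# Number of cases per source reported for this embodiment.
CASE_COUNTS = {
    "theoretical analysis": 15,
    "numerical simulation": 136,
    "model test": 7,
    "field monitoring": 29,
}


def source_weight(source):
    """Every case from the same source receives the same weight."""
    return EXPERT_WEIGHTS[source]


total_cases = sum(CASE_COUNTS.values())
weight_sum = sum(EXPERT_WEIGHTS.values())
```

The counts add up to the 187 cases collected in step S1, and the four expert weights add up to 1, although the text does not state that normalization is required.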
Step S4: determine the adaptability weight of each data item according to the values of the key features of the design data. The adaptability weight measures how close a data item is to the typical working condition:
This embodiment takes shield tunnels in soft soil areas as the typical case. The training set is constructed mainly to predict settlement in soft soil areas, while also drawing on the potential influence patterns of other working conditions. The adaptability weight is therefore computed by the following steps:
Step S4.1: determine the important judgment feature values y_i (i = 1, 2, ..., n) of the typical working condition, where n is the number of important judgment features (key features). This embodiment uses two key features, "soil layer modulus" and "construction method", with key feature values of 3000 kPa and "shield method". Note that the "construction method" feature was already converted in step S2, so the converted state features are used as the key features here. A typical working condition is one or more representative kinds of case in crossing projects; on certain features it has one or more important judgment feature values that reflect the characteristics of that kind of working condition. Key features are one or more features that distinguish different kinds of cases. They can be selected according to the kind of prediction target, and their values can be taken from engineering experience and classification measures on which the industry agrees, as is well known to those of ordinary skill in the art;
Step S4.2: compute the Jaccard distance D_1 of the discrete features among the important judgment features:
$$D_1 = 1 - \frac{|x \cap y|}{|x \cup y|}$$
where x and y are the sets of discrete feature values of the case and of the typical working condition, respectively.
Step S4.3: compute the relative Euclidean distance D_2 of the continuous features among the important judgment features, where x_i and y_i are the values of the case and of the typical working condition for the i-th continuous feature, and l is the number of continuous features among the important judgment features.
Step S4.4: combine the distances D_1 and D_2 to compute the adaptability weight of each case:
$$\omega_{fit} = f(D_1) + g(D_2)$$
where f(x) and g(x) are distance transformation functions. In this embodiment, both are inverse-proportional functions with a coefficient of 1.
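A worked example may help fix the arithmetic. Consider a hypothetical case built with the TBM method in soil with a modulus of 4500 kPa, compared against the typical working condition of this embodiment (shield method, 3000 kPa). The form assumed for D_2, each difference divided by the typical value, is an assumption, since the formula is not reproduced in this text. The Jaccard distance uses its standard definition, and both transformation functions are 1/D as stated for this embodiment.

```latex
% Assumed form of the relative Euclidean distance (not given in the text):
%   D_2 = \sqrt{\sum_{i=1}^{l} \left( (x_i - y_i) / y_i \right)^2}
\begin{aligned}
D_1 &= 1 - \frac{|\{\text{TBM}\} \cap \{\text{shield}\}|}{|\{\text{TBM}\} \cup \{\text{shield}\}|}
     = 1 - \frac{0}{2} = 1 \\
D_2 &= \sqrt{\left(\frac{4500 - 3000}{3000}\right)^{2}} = 0.5 \\
\omega_{fit} &= \frac{1}{D_1} + \frac{1}{D_2} = 1 + 2 = 3
\end{aligned}
```

Under these assumptions, a case whose key features coincide exactly with the typical working condition gives D_1 = 0 or D_2 = 0, for which 1/D is undefined. The text does not say how this is handled, so an implementation would need a convention such as a small lower bound on the distance.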
Step S5: combine the weights from steps S3 and S4 and add the result to the input set as a feature, obtaining the weighted input set DatasetV2:
More specifically, this embodiment determines the information source weight from the four kinds of source in S3 and the adaptability weight from the soft-soil shield working condition in S4. It then applies the multiplication strategy: all component weights are multiplied together, the product is taken as the combined weight, and the combined weight is added to the data set as a new feature.
Step S6: take the weighted input set DatasetV2 from step S5 as the input training set and the tunnel deformation values as the output training set to form the standard training set TrainDataset.
In this embodiment, the input training set obtained in the preceding steps is combined with the vertical tunnel deformation values collected from the source material to obtain a training set ready for use by machine learning algorithms.
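Steps S5 and S6 can be sketched together as follows, using the multiplication strategy of this embodiment. The column name `combined_weight`, the function name, and the sample values are hypothetical; the source only states that the combined weight becomes a new input feature and that deformation values form the output.

```python
def build_train_dataset(dataset_v1, source_weights, adaptability_weights, deformations):
    """Return TrainDataset as a list of (input_row, target) pairs.

    dataset_v1           -- list of feature dicts (ASet and Bset combined)
    source_weights       -- information source weight per case (step S3)
    adaptability_weights -- adaptability weight per case (step S4)
    deformations         -- tunnel deformation value per case (training output)
    """
    n = len(dataset_v1)
    if not (len(source_weights) == len(adaptability_weights) == len(deformations) == n):
        raise ValueError("all inputs must have one entry per case")

    train_dataset = []
    for row, w_src, w_fit, target in zip(
        dataset_v1, source_weights, adaptability_weights, deformations
    ):
        weighted_row = dict(row)  # copy, so DatasetV1 itself is left unchanged
        weighted_row["combined_weight"] = w_src * w_fit  # multiplication strategy
        train_dataset.append((weighted_row, target))  # DatasetV2 row plus output
    return train_dataset


# One hypothetical case: a field-monitored TBM tunnel in 4500 kPa soil.
dataset_v1 = [{"soil_modulus_kPa": 4500.0, "construction_method=TBM": 1}]
train = build_train_dataset(dataset_v1, [0.57], [3.0], [-3.2])
```

Because the weight is stored as an ordinary input column, the resulting pairs can be handed to any regression learner without a dedicated sample-weight interface.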
Further, the invention provides a machine-learning-based method for predicting tunnel deformation in crossing projects. The method may comprise constructing a multi-source weighted training set with the method described above and then using that training set to train a machine learning algorithm. The machine learning algorithm may be, for example, a neural network or a decision tree; more specifically, it may be an ANN, a CNN, XGBoost, LightGBM, and so on. After training, the machine learning algorithm is used to predict tunnel deformation in crossing projects.
In addition, the invention provides a machine-learning-based system for predicting tunnel deformation in crossing projects. The system may comprise one or more processors and a memory, the memory storing instructions executable by the one or more processors, the instructions enabling the system to perform the method steps according to the invention.
The above description of the embodiments is intended to help those of ordinary skill in the art understand and apply the invention. Those skilled in the art can readily make various modifications to these embodiments and apply the general principles described here to other embodiments without creative effort. The invention is therefore not limited to the embodiments described here; improvements and modifications made by those skilled in the art on the basis of this disclosure, without departing from the scope of the invention, shall fall within its scope of protection.
Claims (10)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2022104719653 | 2022-04-29 | ||
| CN202210471965 | 2022-04-29 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN115374570A | 2022-11-22 |
| CN115374570B | 2025-06-13 |
Family
ID=84073966
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211229156.8A (Active; granted as CN115374570B) | 2022-04-29 | 2022-10-09 | A method for constructing a multi-source weighted training set for deformation prediction of tunnel crossing engineering |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115374570B (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100306144A1 (en) * | 2009-06-02 | 2010-12-02 | Scholz Martin B | System and method for classifying information |
| CN108805193A (en) * | 2018-06-01 | 2018-11-13 | 广东电网有限责任公司 | A kind of power loss data filling method based on mixed strategy |
| CN111950585A (en) * | 2020-06-29 | 2020-11-17 | 广东技术师范大学 | An XGBoost-based safety assessment method for underground comprehensive utility tunnels |
| CN113283174A (en) * | 2021-06-09 | 2021-08-20 | 中国石油天然气股份有限公司 | Reservoir productivity prediction method, system and terminal based on algorithm integration and self-control |
Non-Patent Citations (1)
| Title |
|---|
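| LI Jian; LIN Shaosheng; CHEN Fang; DU Peiren: "Big-data-based industry aggregation classification method for transformer districts and analysis of classification features" (in Chinese), 电力大数据, no. 03, 21 March 2020 (2020-03-21), pages 7-15 * |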
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116881723A (en) * | 2023-09-06 | 2023-10-13 | 北京城建设计发展集团股份有限公司 | Data expansion method and system for existing structure response prediction |
| CN116881723B (en) * | 2023-09-06 | 2024-02-20 | 北京城建设计发展集团股份有限公司 | Data expansion method and system for existing structure response prediction |
| CN120316887A (en) * | 2025-06-16 | 2025-07-15 | 中国电建集团西北勘测设计研究院有限公司 | Tunnel deformation regression prediction model training method, result generation method and equipment |
Also Published As
| Publication number | Publication date |
|---|---|
| CN115374570B (en) | 2025-06-13 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN114971301B (en) | Ecological interference risk identification and evaluation method based on automatic parameter adjustment optimization model | |
| CN106022518B (en) | A Prediction Method of Pipeline Damage Probability Based on BP Neural Network | |
| CN107169628B (en) | A distribution network reliability assessment method based on big data mutual information attribute reduction | |
| CN104021267B (en) | A kind of susceptibility of geological hazards decision method and device | |
| CN109711636A (en) | River water level prediction method based on chaotic firefly and gradient lifting tree model | |
| CN107610021A (en) | The comprehensive analysis method of environmental variance spatial and temporal distributions | |
| CN113191642B (en) | Regional landslide sensitivity analysis method based on optimal combination strategy | |
| CN103898890B (en) | Soil layer quantization layering method based on double-bridge static sounding data of BP neural network | |
| CN110489844A (en) | One kind being suitable for the uneven large deformation grade prediction technique of soft rock tunnel | |
| CN105005822A (en) | Optimal step length and dynamic model selection based ultrahigh arch dam response prediction method | |
| CN111199298A (en) | Flood forecasting method and system based on neural network | |
| CN118550573B (en) | IT operation and maintenance management method and IT operation and maintenance management device | |
| CN104881715A (en) | Paper plant pulp property prediction method based on ratio of waste paper | |
| CN115728463A (en) | Interpretable water quality prediction method based on semi-embedded feature selection | |
| CN115063056A (en) | Improved construction behavior safety risk dynamic analysis method based on graph topology analysis | |
| CN116776717A (en) | Multi-objective dynamic optimization method of drilling parameters based on improved NSGA-III algorithm | |
| CN118133104A (en) | Rapid identification method for lithofacies of deep sea-phase shale gas well | |
| CN115374570A (en) | Multi-source weighted training set construction method for deformation prediction of engineering tunnel crossing | |
| CN104268662B (en) | A Subsidence Prediction Method Based on Step-by-step Optimal Quantile Regression | |
| CN116070385A (en) | Automatic risk identification method and system for overseas mineral resource supply chain | |
| CN114897378A (en) | Geological disaster refined meteorological risk early warning area evaluation method and device | |
| CN119090313A (en) | Intelligent decision-making method and system for preventing and controlling geological disasters induced by TBM construction disturbance | |
| CN119004584B (en) | Real-time control method and system for grouting disaster control in underground engineering based on digital twin | |
| CN118822284A (en) | Tunnel operation safety assessment method and system based on multi-source risk factors | |
| CN107239889A (en) | A kind of method of the lower mountain area structure vulnerability of quantitative assessment mud-rock flow stress |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |