CN115374570A

CN115374570A - Multi-source weighted training set construction method for deformation prediction of engineering tunnel crossing

Info

Publication number: CN115374570A
Application number: CN202211229156.8A
Authority: CN
Inventors: 韩玉珍; 聂小凡; 张雷; 张连卫; 潘毫; 何纪忠
Original assignee: Beijing Urban Construction Design and Development Group Co Ltd
Current assignee: Beijing Urban Construction Design and Development Group Co Ltd
Priority date: 2022-04-29
Filing date: 2022-10-09
Publication date: 2022-11-22
Anticipated expiration: 2042-10-09
Also published as: CN115374570B

Abstract

The invention discloses a method for constructing a multi-source weighted training set for tunnel deformation prediction in crossing projects, comprising: step S1, acquiring design data of the crossing project and the corresponding tunnel deformation; step S2, processing the design data to obtain an initial input set DatasetV1; step S3, determining an information source weight according to the source of the design data; step S4, determining the adaptive weights of different data according to the values of the key features of the data; step S5, combining the weights from steps S3 and S4 and adding the result as a feature of the input set; step S6, using the input set from step S5 as the input training set and the tunnel deformation values as the output training set to form a training set. Through structured and weighted processing of multi-source engineering information, the invention obtains a training set that reflects the characteristics of crossing projects and is suited to machine learning algorithms, effectively supporting intelligent prediction of tunnel deformation and providing a quantitative basis for related engineering evaluation.

Description

A Multi-source Weighted Training Set Construction Method for Tunnel Deformation Prediction in Crossing Projects

Technical Field

The invention belongs to the technical field of tunnel crossing engineering, and in particular relates to a method for constructing a multi-source weighted training set for tunnel deformation prediction in crossing projects.

Background

With the continuous development of computer processing power and intelligent algorithms, machine learning has gradually become an effective means of solving practical problems in industry. Supervised learning can establish the latent relationship between input and output directly from existing cases, avoiding complex intermediate links, and can predict the required quantities simply, quickly and accurately; it is a general-purpose tool for achieving intelligence and automation in many fields. This mode of learning, however, depends heavily on the training set used by the learning algorithm: a structured training set with clear features helps the algorithm reach its learning goal more efficiently.

For tunnel deformation in crossing projects, extensive engineering practice and academic research have accumulated a great deal of valuable data, which provides good conditions for solving the deformation prediction problem with machine learning. These data, however, come from diverse sources. They include data with differing characteristics obtained through field measurement, numerical simulation, theoretical analysis and model tests, as well as much unstructured data such as design drawings and contour plots, all of which are difficult for learning algorithms to use directly. In addition, the credibility of the various information sources differs, and deformation prediction is closely tied to the attributes of the project itself, so how well a case fits the target project has an important effect on the prediction. When machine learning algorithms are applied in this field, it therefore often cannot simply be assumed that every learning case has the same value; instead, the training set should be adjusted according to the requirements and prior knowledge.

Therefore, a training set generation method is needed that starts from existing data and adapts to the requirements of the relevant learning algorithms, so as to overcome the deficiencies of the prior art.

Summary of the Invention

To solve the above technical problems, the present invention provides a method for constructing a multi-source weighted training set for tunnel deformation prediction in crossing projects. The method aims to optimize and modify the feature content of design documents, reshape the multi-source data from a feature engineering perspective, and weight the reshaped data according to the characteristics of crossing projects, yielding a training set that can be applied directly to machine learning algorithms and providing strong support for efficient prediction of tunnel deformation. In addition, a method and a system for predicting tunnel deformation in crossing projects based on a machine learning algorithm are also provided.

To achieve the above object, the technical solution adopted by the present invention includes:

A method for constructing a multi-source weighted training set for tunnel deformation prediction in crossing projects, characterized in that it comprises the following steps:

Step S1: obtain the design data of the crossing project and the corresponding tunnel deformation, where the design data form the input part of the training set and include engineering geological information, tunnel structure information and construction information, and the tunnel deformation forms the output part of the training set;

Step S2: according to the features of the design data from step S1, divide the data into numeric and non-numeric data; form the numeric data into the data set ASet, process the non-numeric data to obtain the data set Bset, and combine ASet and Bset to obtain the initial input set DatasetV1;

Step S3: according to the source of the design data in the initial input set DatasetV1, determine the information source weight using a chosen measurement strategy;

Step S4: according to the values of the key features of the design data in the initial input set DatasetV1, determine the adaptive weight of each data case;

Step S5: combine the weights from steps S3 and S4 and add the result as a feature of the input set to obtain the weighted input set DatasetV2;

Step S6: use the weighted input set DatasetV2 from step S5 as the input training set and the tunnel deformation values as the output training set to form the standard training set TrainDataset.

According to an embodiment of the present invention, in step S2 the processing of the non-numeric data includes encoding it with the One-Hot method, and the encoding process includes the following steps:

Step S2.1: select the data whose features are non-numeric and form the data set FeatureSet; the total number of features in FeatureSet is denoted m, and the data set corresponding to each feature is denoted FeatureSet_i (i = 1, 2, ..., m);

Step S2.2: for each feature data set FeatureSet_i, define n_i state features and, according to the value taken by each case, obtain the data set ConditionSet_i in state-feature representation, where n_i is the number of distinct discrete values of that feature; and

Step S2.3: combine the m data sets ConditionSet_i in state-feature representation to obtain the state-feature data set Bset for the non-numeric data.

According to an embodiment of the present invention, the sources of the design data in step S3 include theoretical analysis, numerical simulation, model tests and field monitoring.

According to an embodiment of the present invention, the information source weight in step S3 is an indicator of the importance of the different data sources, and the specific measurement strategy may include:

(1) Expert scoring method: the score of each data source is determined directly from expert evaluation, so that data from the same source have the same weight;

(2) Interval random generation method: a weight interval is specified for each data source, and for each case a weight is generated at random within the interval of its source;

(3) Ranking comparison method: the data sources are ranked by importance; several cases are drawn from the data set without replacement and ordered by importance, and weights are assigned from high to low according to rank, until all cases have been drawn; and

(4) Comprehensive method: the weights obtained separately by two or more of the above methods are combined into the final weight.
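As a minimal sketch of strategies (2) and (4), the snippet below draws a weight for each case from the interval of its source and then averages several candidate weights. The interval bounds and the use of a plain average for the comprehensive method are assumptions for illustration; the text does not specify either.

```python
import random

# Hypothetical weight intervals per source; the text gives no interval values.
INTERVALS = {
    "theoretical analysis": (0.05, 0.10),
    "numerical simulation": (0.20, 0.30),
    "model test": (0.10, 0.15),
    "field monitoring": (0.50, 0.60),
}


def interval_random_weight(source, rng):
    """Strategy (2): draw a weight uniformly from the interval of the source."""
    low, high = INTERVALS[source]
    return rng.uniform(low, high)


def comprehensive_weight(weights):
    """Strategy (4), assuming a plain average is used to combine the weights."""
    return sum(weights) / len(weights)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
sources = ["field monitoring", "numerical simulation", "model test"]
random_weights = [interval_random_weight(s, rng) for s in sources]
```

Passing an explicit `random.Random` instance keeps the generated weights reproducible, which matters if the training set has to be rebuilt later.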

According to an embodiment of the present invention, the adaptive weight in step S4 measures how close a data case is to a typical working condition, and its determination includes the following steps:

Step S4.1: determine the important judgment feature values y_i (i = 1, 2, ..., n) of the typical working condition, where n is the number of important judgment features (that is, key features);

Step S4.2: calculate the Jaccard distance D₁ of the discrete features among the important judgment features, using the formula:

$$D_1 = 1 - \frac{|x \cap y|}{|x \cup y|}$$

where x and y are the sets of discrete feature values of the case and of the typical working condition, respectively;

Step S4.3: calculate the relative Euclidean distance D₂ of the continuous features among the important judgment features, where x_i and y_i are the values of the i-th continuous feature for the case and for the typical working condition, respectively, and l is the number of continuous features among the important judgment features; and

Step S4.4: combine the distances D₁ and D₂ to calculate the adaptive weight of each case, using the formula:

$$\omega_{fit} = f(D_1) + g(D_2)$$

where f(x) and g(x) are both distance conversion functions, each of which may be an inverse proportional function with a positive coefficient, a linear function with a negative slope and a positive intercept, or any other monotonically decreasing function with a positive range.
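Steps S4.1 to S4.4 can be sketched as follows. The Jaccard distance is the standard set-based definition. The exact form of the relative Euclidean distance is not reproduced in this text, so the version below (differences scaled by the typical values) is an assumption, as are the slope and intercept of the linear conversion function.

```python
import math


def jaccard_distance(x, y):
    """Standard Jaccard distance between two sets: 1 - |x & y| / |x | y|."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(x & y) / len(union)


def relative_euclidean_distance(xs, ys):
    """ASSUMED form: Euclidean norm of differences scaled by the typical values."""
    return math.sqrt(sum(((x - y) / y) ** 2 for x, y in zip(xs, ys)))


def linear_conversion(distance, slope=-0.5, intercept=1.0):
    """Linear distance conversion with negative slope and positive intercept.

    The default slope and intercept are hypothetical; the result stays
    positive only while distance < intercept / -slope.
    """
    return slope * distance + intercept


def adaptive_weight(case_discrete, typical_discrete, case_cont, typical_cont,
                    f=linear_conversion, g=linear_conversion):
    """Step S4.4: omega_fit = f(D1) + g(D2)."""
    d1 = jaccard_distance(case_discrete, typical_discrete)
    d2 = relative_euclidean_distance(case_cont, typical_cont)
    return f(d1) + g(d2)
```

A case identical to the typical working condition gets the largest weight under any decreasing conversion function, and the weight falls as either distance grows.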

According to an embodiment of the present invention, the weight combination in step S5 derives a comprehensive weight for each case from its information source weight and adaptive weight through combination strategies such as scaling and allocation. The specific combination strategies include:

(1) Summation method: the share of each component weight in the combined weight is specified, and the proportionally weighted sum is taken as the combined weight;

(2) Multiplication method: all component weights are multiplied together and the product is taken as the combined weight;

(3) Minimum method: the smallest of the component weights is taken as the combined weight;

(4) Maximum method: the largest of the component weights is taken as the combined weight; and

(5) Random method: the smallest component weight is taken as the lower bound of an interval and the largest as the upper bound, and a weight generated at random within the interval is taken as the combined weight.
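The five combination strategies can be sketched in one helper. The function name, the strategy labels and the default of equal proportions for the summation method are assumptions for illustration only.

```python
import math
import random


def combine_weights(weights, strategy, proportions=None, rng=None):
    """Combine the component weights of one case into a single weight."""
    if strategy == "sum":
        # Strategy (1): proportionally weighted sum; equal shares by default.
        if proportions is None:
            proportions = [1.0 / len(weights)] * len(weights)
        return sum(p * w for p, w in zip(proportions, weights))
    if strategy == "product":
        # Strategy (2): product of all component weights.
        return math.prod(weights)
    if strategy == "min":
        return min(weights)  # strategy (3)
    if strategy == "max":
        return max(weights)  # strategy (4)
    if strategy == "random":
        # Strategy (5): uniform draw between the smallest and largest weight.
        rng = rng or random.Random()
        return rng.uniform(min(weights), max(weights))
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, `combine_weights([0.57, 2.0], "product")` multiplies an information source weight by an adaptive weight.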

In addition, the present invention also provides a method and a system for predicting tunnel deformation in crossing projects based on a machine learning algorithm.

Compared with the prior art, the present invention has the following advantages:

(1) Starting from existing, readily available engineering documents, the invention uses digital encoding to give multi-source data a unified structured format and applies One-Hot processing to non-numeric features. This avoids the loss of additivity in non-numeric data and greatly improves the usability of the training set, which supports the use of the relevant machine learning algorithms and provides a basis for rapid prediction.

(2) In view of the characteristics of crossing projects, the invention weights the samples in the training set while still meeting the requirements of the learning algorithms. The key characteristics of crossing projects are thus incorporated at the data level in a targeted manner, providing the relevant learning algorithms with a large number of key learning features.

Brief Description of the Drawings

Fig. 1 is a schematic flow chart of the method for constructing a multi-source weighted training set for tunnel deformation prediction in crossing projects according to an embodiment of the present invention; and

Fig. 2 is a schematic diagram of the process of generating the initial input set DatasetV1 in the method according to an embodiment of the present invention.

Detailed Description

The present invention is described in further detail below through specific embodiments and with reference to the accompanying drawings. The content shown is intended to fully illustrate the invention and not to limit it.

Taking the prediction of the vertical deformation of an existing tunnel during crossing construction as an example, the method for constructing a multi-source weighted training set for tunnel deformation prediction in crossing projects is described in detail. As shown in Fig. 1, the solution of the invention may include the following steps:

Step S1: obtain the design data of the crossing project and the corresponding tunnel deformation, where the design data form the input part of the training set and may include engineering geological information, tunnel structure information, construction information and so on, and the tunnel deformation forms the output part of the training set:

For this embodiment, data were collected from multiple channels, including engineering design documents, published academic literature and related monitoring reports. After sorting and screening, a total of 187 design data cases and the corresponding vertical tunnel deformation values were obtained.

Step S2: according to the features of the design data from step S1, divide the data into numeric and non-numeric data; form the numeric data into the data set ASet, process the non-numeric data to obtain the data set Bset, and combine ASet and Bset to obtain the initial input set DatasetV1 (see Fig. 2):

The case information was summarized and the design data features that are common to the cases or can be converted were compiled: 30 engineering geological features, 10 tunnel structure features and 5 construction information features, 45 features in total. These data are divided into numeric and non-numeric data:

Forming the data set ASet consists of selecting the data whose features are numeric. In this embodiment, 42 features are numeric and together form ASet.
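The split into numeric and non-numeric features can be sketched as below. The feature names and the two cases are hypothetical placeholders, not data from the embodiment.

```python
from numbers import Number

# Hypothetical cases; feature names and values are illustrative only.
cases = [
    {"soil_modulus_kpa": 3000.0, "cover_depth_m": 12.5,
     "construction_method": "shield method"},
    {"soil_modulus_kpa": 8000.0, "cover_depth_m": 20.0,
     "construction_method": "mining method"},
]


def is_numeric(value):
    # bool counts as a Number in Python, so it is excluded explicitly.
    return isinstance(value, Number) and not isinstance(value, bool)


def split_features(cases):
    """Partition feature names into numeric and non-numeric groups."""
    names = list(cases[0])
    numeric = [n for n in names if all(is_numeric(c[n]) for c in cases)]
    non_numeric = [n for n in names if n not in numeric]
    return numeric, non_numeric


numeric_names, non_numeric_names = split_features(cases)
# ASet keeps the numeric columns; the rest form FeatureSet for encoding.
a_set = [[c[n] for n in numeric_names] for c in cases]
feature_set = [[c[n] for n in non_numeric_names] for c in cases]
```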

The non-numeric data are encoded with the One-Hot method. The encoding process includes:

Step S2.1: select the data whose features are non-numeric and form the data set FeatureSet; the total number of features in FeatureSet is denoted m, and the data set corresponding to each feature is denoted FeatureSet_i (i = 1, 2, ..., m). In the data of this embodiment, three features are non-numeric: "lining type", "construction method" and "control measures". They form a FeatureSet with m = 3 features, and the data sets of the three features are FeatureSet₁, FeatureSet₂ and FeatureSet₃ respectively, as shown in Table 1.

Table 1: FeatureSet dataset

| Case number | FeatureSet₁ | FeatureSet₂ | FeatureSet₃ |
| --- | --- | --- | --- |
| 1 | Shield segments | Shield method | Micro-disturbance grouting |
| 2 | Shotcrete | Mining method | None |
| … | … | … | … |
| 187 | Shield segments | TBM method | End reinforcement |

Step S2.2: for each feature data set FeatureSet_i, define n_i state features and, according to the value taken by each case, obtain the data set ConditionSet_i in state-feature representation, where n_i is the number of distinct discrete values of that feature. In this embodiment, taking FeatureSet₂ ("construction method") as an example, the feature takes four discrete values across the 187 cases: "shield method", "TBM method", "mining method" and "open-cut method". Accordingly, n₂ = 4 state features are defined, one for each original discrete value. A state feature is 1 when the case takes that value and 0 otherwise, which yields the data set ConditionSet₂ in state-feature representation, as shown in Table 2.

Table 2: ConditionSet₂ dataset

Step S2.3: combine the m data sets ConditionSet_i in state-feature representation to obtain the state-feature data set Bset for the non-numeric data.
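Steps S2.2 and S2.3 can be sketched in plain Python as below. The five sample rows are illustrative, not the 187 cases of the embodiment, and the helper names are assumptions.

```python
def one_hot(values, states=None):
    """Encode one non-numeric feature column as 0/1 state features."""
    if states is None:
        # Distinct values in first-seen order define the n_i state features.
        states = list(dict.fromkeys(values))
    rows = [[1 if v == s else 0 for s in states] for v in values]
    return states, rows


def concat_columns(*blocks):
    """Join row-aligned blocks side by side (ConditionSet_i -> Bset)."""
    return [sum(parts, []) for parts in zip(*blocks)]


# Illustrative values for two of the three non-numeric features.
linings = ["shield segments", "shotcrete", "shield segments",
           "shotcrete", "shield segments"]
methods = ["shield method", "mining method", "TBM method",
           "open-cut method", "shield method"]

_, condition_set_1 = one_hot(linings)
states_2, condition_set_2 = one_hot(methods)
b_set = concat_columns(condition_set_1, condition_set_2)
```

Each row of `condition_set_2` contains exactly one 1, in the column of the construction method that the case uses.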

The initial input set DatasetV1 is then processed further.

Step S3: according to the source of the design data, determine the information source weight using a chosen measurement strategy. The information source weight is an indicator of the importance of the different data sources and can be determined by the expert scoring method, the interval random generation method, the ranking comparison method, the comprehensive method and so on.

The sources of the design data may include theoretical analysis, numerical simulation, model tests and field monitoring. In this embodiment there are 15 cases based on theoretical analysis, 136 based on numerical simulation, 7 based on model tests and 29 based on field monitoring. Using the expert scoring method, the importance weights obtained for theoretical analysis, numerical simulation, model tests and field monitoring are 0.07, 0.24, 0.12 and 0.57 respectively.
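With the expert scoring method, assigning the weights reduces to a lookup by source. The sketch below uses the case counts and weights stated in this embodiment; the variable and function names are illustrative.

```python
# Expert-scored importance weights from this embodiment.
SOURCE_WEIGHTS = {
    "theoretical analysis": 0.07,
    "numerical simulation": 0.24,
    "model test": 0.12,
    "field monitoring": 0.57,
}

# Number of cases per source in this embodiment.
CASE_COUNTS = {
    "theoretical analysis": 15,
    "numerical simulation": 136,
    "model test": 7,
    "field monitoring": 29,
}


def source_weight(source):
    """Every case from the same source receives the same weight."""
    return SOURCE_WEIGHTS[source]


total_cases = sum(CASE_COUNTS.values())
weights_per_case = [
    source_weight(source)
    for source, count in CASE_COUNTS.items()
    for _ in range(count)
]
```

The counts add up to the 187 cases collected in step S1, and the four weights sum to 1.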

Step S4: according to the values of the key features of the design data, determine the adaptive weight of each data case. The adaptive weight measures how close a data case is to the typical working condition:

In this embodiment, shield tunnels in soft soil areas are taken as the typical case. The training set is constructed primarily to predict settlement in soft soil areas while also drawing on the potential influence patterns of other working conditions, and the adaptive weight is calculated in the following steps:

Step S4.1: determine the important judgment feature values y_i (i = 1, 2, ..., n) of the typical working condition, where n is the number of important judgment features (key features). This embodiment uses two key features, "soil layer modulus" and "construction method", with 3000 kPa and "shield method" as the key feature values. Note that the "construction method" feature was already converted in step S2, so the converted state features are used as the key features here. A typical working condition is one or more representative kinds of case in crossing projects that have one or more important judgment feature values on certain features, and these values reflect the characteristics of that kind of working condition. Key features are one or more features that distinguish different kinds of case. They can be selected according to the kind of prediction target, and their values can be taken from engineering experience and classification metrics on which the industry agrees, as is well known to those of ordinary skill in the art;

For the Jaccard distance D₁ of the discrete key features, x and y are the sets of discrete feature values of the case and of the typical working condition, respectively.

For the relative Euclidean distance D₂ of the continuous key features, x_i and y_i are the values of the i-th continuous feature for the case and for the typical working condition, respectively, and l is the number of continuous features among the important judgment features.

The adaptive weight of each case is then:

$$\omega_{fit} = f(D_1) + g(D_2)$$

where f(x) and g(x) are both distance conversion functions. In this embodiment, both are taken to be the inverse proportional function with a coefficient of 1.
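As a worked illustration, consider a hypothetical case (not one of the 187) with a soil layer modulus of 9000 kPa built by the TBM method, compared with the typical working condition of 3000 kPa and the shield method. The Jaccard distance follows the standard set definition. The form used for the relative Euclidean distance is an assumption, since this text does not reproduce the formula.

$$D_1 = 1 - \frac{|\{\text{TBM}\} \cap \{\text{shield}\}|}{|\{\text{TBM}\} \cup \{\text{shield}\}|} = 1 - \frac{0}{2} = 1$$

$$D_2 = \sqrt{\left(\frac{9000 - 3000}{3000}\right)^2} = 2 \quad \text{(assumed form)}$$

$$\omega_{fit} = f(D_1) + g(D_2) = \frac{1}{1} + \frac{1}{2} = 1.5$$

With f(x) = g(x) = 1/x, the weight grows as a case approaches the typical working condition. The function is undefined at a distance of exactly zero, so a case that coincides with the typical working condition on a feature group would need separate handling, which the text does not describe.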

Step S5: combine the weights from steps S3 and S4 and add the result as a feature of the input set to obtain the weighted input set DatasetV2:

More specifically, this embodiment determines the information source weight from the four types of source in S3 and the adaptive weight from the soft-soil shield working condition in S4. The multiplication method is then used: all component weights are multiplied together, the product is taken as the combined weight, and the combined weight is added to the data set as a new feature.

Step S6: use the weighted input set DatasetV2 from step S5 as the input training set and the tunnel deformation values as the output training set to form the standard training set TrainDataset.

In this embodiment, the input training set obtained in the preceding steps is combined with the vertical tunnel deformation values collected from the source documents to give a training set ready for use by machine learning algorithms.
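Steps S5 and S6 with the multiplication method can be sketched as below. The rows, weights and deformation values are hypothetical placeholders, and representing TrainDataset as a list of (input row, target) pairs is an assumption about its layout.

```python
def build_train_dataset(dataset_v1, source_weights, adaptive_weights, deformations):
    """Append the product of the two weights to each row, then pair with targets."""
    sizes = {len(dataset_v1), len(source_weights),
             len(adaptive_weights), len(deformations)}
    if len(sizes) != 1:
        raise ValueError("all inputs must have one entry per case")
    # Step S5: the combined weight becomes a new feature column (DatasetV2).
    dataset_v2 = [
        row + [w_src * w_fit]
        for row, w_src, w_fit in zip(dataset_v1, source_weights, adaptive_weights)
    ]
    # Step S6: pair each weighted input row with its deformation value.
    return list(zip(dataset_v2, deformations))


# Hypothetical rows: [soil modulus in kPa, is_shield_method, is_tbm_method].
dataset_v1 = [[3000.0, 1, 0], [9000.0, 0, 1]]
source_weights = [0.57, 0.24]      # field monitoring, numerical simulation
adaptive_weights = [2.0, 1.5]      # hypothetical adaptive weights
deformations_mm = [-3.2, -1.1]     # hypothetical vertical deformations

train_dataset = build_train_dataset(
    dataset_v1, source_weights, adaptive_weights, deformations_mm
)
```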

Further, the present invention also provides a method for predicting tunnel deformation in crossing projects based on a machine learning algorithm, which may include constructing a multi-source weighted training set using the method of the invention and then training a machine learning algorithm on that training set. The machine learning algorithm may be, for example, a neural network or a decision tree algorithm, and more specifically may include ANN, CNN, XGBoost, LightGBM and so on. After training, the machine learning algorithm is used to predict tunnel deformation in crossing projects.

In addition, the present invention also provides a system for predicting tunnel deformation in crossing projects based on a machine learning algorithm. The system may include one or more processors and a memory, the memory storing instructions executable by the one or more processors, and the instructions enabling the system to perform the method steps according to the invention.

The above description of the embodiments is intended to help those of ordinary skill in the art understand and apply the invention. Those skilled in the art can readily make various modifications to these embodiments and apply the general principles described here to other embodiments without creative effort. The invention is therefore not limited to the embodiments described here, and improvements and modifications made by those skilled in the art on the basis of this disclosure, without departing from the scope of the invention, fall within its scope of protection.

Claims

1. A method for constructing a multi-source weighted training set for tunnel deformation prediction in crossing projects, comprising the following steps:

Step S1, obtaining the design data of the crossing project and the corresponding tunnel deformation data, wherein the design data is used as the input part of the training set, including engineering geological information, tunnel structure information and construction information, and the tunnel deformation is used as the output part of the training set;

Step S2, according to the features of the design data in step S1, dividing the design data into numeric data and non-numeric data, forming the numeric data into a data set ASet, processing the non-numeric data to obtain a data set Bset, and combining ASet and Bset to obtain the initial input set DatasetV1;

Step S3, according to the source of the design data in the initial input set DatasetV1, adopt a certain measurement strategy to determine the weight of the information source;

Step S4, according to the values of the key features of the design data in the initial input set DatasetV1, determine the adaptability weights of different data;

Step S5, combining the weights in steps S3 and S4 and using it as the feature of the input set to obtain the weighted input set DatasetV2;

Step S6, using the weighted input set DatasetV2 in step S5 as an input training set, and using the tunnel deformation data as an output training set to form a standard training set TrainDataset.

2. The method for constructing a multi-source weighted training set for tunnel deformation prediction in crossing projects as claimed in claim 1, characterized in that: in step S2, the processing of the non-numeric data includes encoding the non-numeric data with the One-Hot method, and the encoding process includes the following steps:

Step S2.1, selecting the design data whose features are non-numeric and forming a data set FeatureSet, wherein the total number of features in FeatureSet is denoted m and the data set corresponding to each feature is denoted FeatureSet_i (i = 1, 2, ..., m);

Step S2.2, for each feature data set FeatureSet_i, defining n_i state features and, according to the value taken by each case, obtaining the data set ConditionSet_i in state-feature representation, where n_i is the number of distinct discrete values of that feature; and

Step S2.3, combining the m data sets ConditionSet_i in state-feature representation to obtain the state-feature data set Bset for the non-numeric data.

3. The method for constructing a multi-source weighted training set for tunnel deformation prediction in crossing projects as claimed in claim 1, characterized in that: the sources of the design data in step S3 comprise theoretical analysis, numerical simulation, model tests and field monitoring.

4. The method for constructing a multi-source weighted training set for tunnel deformation prediction in crossing projects as claimed in claim 1, characterized in that: the information source weight in step S3 is an indicator of the importance of the different data sources and is determined by a method selected from:

(1) Expert scoring method, directly determine the score of each data source according to expert evaluation, so that data from the same source have the same weight;

(2) Interval random generation method, which stipulates the weight interval of each data source, and randomly generates weight within the weight interval according to its source for each case;

(3) Sorting and comparison method, which ranks the data sources by importance, draws multiple cases from the dataset without replacement, sorts them according to the importance of their sources, and assigns weights from high to low according to the ranking, until all cases have been drawn; and

(4) Comprehensive method, using the weights obtained by two or more methods respectively, and combining the weights to obtain the final weight.
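
Methods (1), (2) and (4) can be sketched as follows; method (3) is not covered. All scores and intervals are hypothetical illustrative values, and averaging the two weights in `comprehensive_weight` is an assumed combination rule, since the claim does not fix one.

```python
import random

# Hypothetical expert scores per data source (illustrative values only).
EXPERT_SCORES = {
    "theoretical analysis": 0.6,
    "numerical simulation": 0.7,
    "model test": 0.8,
    "on-site monitoring": 1.0,
}

# Hypothetical weight interval per data source (illustrative values only).
WEIGHT_INTERVALS = {
    "theoretical analysis": (0.5, 0.7),
    "numerical simulation": (0.6, 0.8),
    "model test": (0.7, 0.9),
    "on-site monitoring": (0.9, 1.0),
}


def expert_weight(source):
    """Method (1): every case from the same source gets the same score."""
    return EXPERT_SCORES[source]


def interval_random_weight(source, rng):
    """Method (2): draw a weight inside the interval stipulated for the source."""
    low, high = WEIGHT_INTERVALS[source]
    return rng.uniform(low, high)


def comprehensive_weight(source, rng):
    """Method (4): combine two methods; the plain mean is an assumption."""
    return 0.5 * (expert_weight(source) + interval_random_weight(source, rng))


rng = random.Random(0)  # seeded only so that the sketch is repeatable
sources = ["numerical simulation", "on-site monitoring", "model test"]
weights = [comprehensive_weight(source, rng) for source in sources]
```

Under method (2), two cases from the same source generally receive different weights, whereas method (1) gives them identical weights.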

5. The method for constructing a multi-source weighted training set for deformation prediction of crossing engineering tunnels as claimed in claim 1, characterized in that: in step S4, the adaptive weight measures how close the data are to the typical working conditions, and its determination includes the following steps:

Step S4.1, determine the key feature values y_i (i = 1, 2, ..., n) of the typical working conditions, where n is the number of key features;

Step S4.2, calculate the Jaccard distance D₁ of the discrete features among the key features, the formula being:

D₁ = 1 - |x ∩ y| / |x ∪ y|

In the above formula, x and y are the sets of discrete feature values of the case and of the typical working conditions, respectively;

Step S4.3, calculate the relative Euclidean distance D₂ of the continuous features among the key features, the formula being:

In the above formula, x_i and y_i are the values of the i-th continuous feature of the case and of the typical working conditions, respectively, and l is the number of continuous features among the key features; and

Step S4.4, combine the distances D₁ and D₂ to calculate the adaptive weight of each case, the formula being:

ω_fit = f(D₁) + g(D₂)

In the above formula, f(x) and g(x) are distance conversion functions.

6. The method for constructing a multi-source weighted training set for deformation prediction of crossing engineering tunnels as claimed in claim 5, wherein each distance conversion function is selected from an inverse proportional function with a positive coefficient, a linear function with a negative slope and a positive intercept, or a monotonically decreasing function with positive values.
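
Steps S4.1 to S4.4, together with the distance conversion functions of claim 6, can be sketched as follows. Several points are assumptions: the relative Euclidean distance is taken here as the square root of the sum of ((x_i - y_i) / y_i)², which is one plausible reading because the formula is not reproduced in the text; f is chosen as the linear function 1 - 0.5·D and g as exp(-D); all function names and sample values are hypothetical.

```python
import math


def jaccard_distance(x, y):
    """D1 = 1 - |x ∩ y| / |x ∪ y| for two sets of discrete feature values."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(x & y) / len(union)


def relative_euclidean_distance(x, y):
    """ASSUMED form of D2: each difference is normalised by the typical
    value y_i (which must be non-zero) before the Euclidean norm is taken."""
    return math.sqrt(sum(((xi - yi) / yi) ** 2 for xi, yi in zip(x, y)))


def adaptive_weight(case_discrete, typical_discrete, case_continuous,
                    typical_continuous,
                    f=lambda d: 1.0 - 0.5 * d,   # linear, negative slope, positive intercept
                    g=lambda d: math.exp(-d)):   # monotonically decreasing, positive values
    """omega_fit = f(D1) + g(D2)."""
    d1 = jaccard_distance(case_discrete, typical_discrete)
    d2 = relative_euclidean_distance(case_continuous, typical_continuous)
    return f(d1) + g(d2)


# Hypothetical typical working condition and one case.
typical_discrete = {("method", "shield"), ("position", "above")}
typical_continuous = [6.0, 10.0]
case_discrete = {("method", "shield"), ("position", "below")}
case_continuous = [6.6, 9.0]
omega_fit = adaptive_weight(case_discrete, typical_discrete,
                            case_continuous, typical_continuous)
```

With these choices a case identical to the typical working conditions receives the largest weight, f(0) + g(0) = 2, and the weight decreases as either distance grows.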

7. The method for constructing a multi-source weighted training set for deformation prediction of crossing engineering tunnels as claimed in claim 1, characterized in that: the weight combination in step S5 combines the information source weight and the adaptive weight to obtain a comprehensive weight, the combination strategy being selected from:

(1) Accumulation method, which stipulates the proportion that each sub-item weight contributes to the combined weight and takes the proportional sum as the combined weight;

(2) Cumulative multiplication method, which multiplies all sub-item weights and takes the product as the combined weight;

(3) Minimum method, which selects the minimum of all sub-item weights as the combined weight;

(4) Maximum method, which selects the maximum of all sub-item weights as the combined weight; and

(5) Random method, which uses the minimum of all sub-item weights as the lower bound of an interval and the maximum as the upper bound, and randomly generates a weight within the interval as the combined weight.
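
The five combination strategies can be sketched as follows. The function name `combine_weights`, the strategy labels, the default of equal proportions in the accumulation method and the sample weights are assumptions made for the example.

```python
import math
import random


def combine_weights(weights, strategy, proportions=None, rng=None):
    """Combine sub-item weights (for example the information source weight
    and the adaptive weight) into one combined weight."""
    if strategy == "accumulate":
        # Proportional sum; equal proportions are assumed when none are given.
        if proportions is None:
            proportions = [1.0 / len(weights)] * len(weights)
        return sum(p * w for p, w in zip(proportions, weights))
    if strategy == "multiply":
        return math.prod(weights)
    if strategy == "min":
        return min(weights)
    if strategy == "max":
        return max(weights)
    if strategy == "random":
        # Uniform draw between the smallest and largest sub-item weight.
        rng = rng or random.Random()
        return rng.uniform(min(weights), max(weights))
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical sub-item weights: source weight 0.8, adaptive weight 1.5.
sub_weights = [0.8, 1.5]
accumulated = combine_weights(sub_weights, "accumulate", proportions=[0.4, 0.6])
multiplied = combine_weights(sub_weights, "multiply")
smallest = combine_weights(sub_weights, "min")
largest = combine_weights(sub_weights, "max")
drawn = combine_weights(sub_weights, "random", rng=random.Random(0))
```

The minimum and multiplication strategies are the more conservative choices, since a low weight from either source pulls the combined weight down.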

8. A method for predicting deformation of crossing engineering tunnels based on a machine learning algorithm, comprising the steps of:

constructing a training set using the method of any one of claims 1-7;

training a machine learning algorithm using the training set; and

using the trained machine learning algorithm to predict the deformation of crossing engineering tunnels.

9. The method of claim 8, wherein the machine learning algorithm is selected from ANN, CNN, XGBoost and LightGBM.

10. A deformation prediction system for crossing engineering tunnels based on a machine learning algorithm, comprising: one or more processors; and a memory storing instructions executable by the one or more processors, the instructions causing the system to perform the method according to any one of claims 8-9.