CN114461630B - Intelligent attribution analysis method, device, equipment and storage medium

Info

Publication number: CN114461630B
Application number: CN202210134402.5A
Authority: CN
Inventors: 贺民
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2022-02-14
Filing date: 2022-02-14
Publication date: 2024-10-29
Anticipated expiration: 2042-02-14
Also published as: CN114461630A

Abstract

The invention relates to artificial intelligence technology and discloses an intelligent attribution analysis method comprising the following steps: acquiring an initial data set and performing data replacement processing on its outliers to obtain a standard data set; dividing the standard data set into an attributed phenomenon data set and a corresponding attribution factor data set; importing the attributed phenomenon data set and the attribution factor data set into a preset model library, and calculating the prediction success rate of each model in the model library; determining an optimal attribution phenomenon prediction model according to the prediction success rate; and selecting a model interpretation algorithm according to the type of the optimal attribution phenomenon prediction model, and using the model interpretation algorithm to calculate the contribution of each attribution factor. In addition, the invention also relates to blockchain technology: the initial data set and the contribution of each attribution factor can be stored in nodes of a blockchain. The invention further provides an intelligent attribution analysis device, an electronic device and a storage medium. The invention can improve the accuracy of attribution analysis.

Description

Intelligent attribution analysis method, device, equipment and storage medium

Technical Field

The present invention relates to the field of artificial intelligence technology, and in particular to an intelligent attribution analysis method, device, electronic device and computer-readable storage medium.

Background Art

Attribution analysis is an analytical method that explains which factors give rise to a phenomenon or effect. It is widely used in industries such as Internet advertising and insurance to analyze which user behaviors industry data originates from, and to improve user stickiness.

The main attribution algorithms today fall into two categories: rule-based attribution and data-driven attribution. Both require mathematical relationships to be set manually in advance. However, real business scenarios are complex and changeable, and there is no attribution analysis algorithm that applies across the business scenarios of different industries, so existing attribution analysis methods have low accuracy.

Summary of the Invention

The present invention provides an intelligent attribution analysis method, device and computer-readable storage medium, the main purpose of which is to solve the problem of low accuracy in attribution analysis.

To achieve the above object, the present invention provides an intelligent attribution analysis method, comprising:

acquiring an initial data set, and performing data replacement processing on outliers in the initial data set to obtain a standard data set;

dividing, by using a pre-built variable library, the standard data set into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set;

importing the attributed phenomenon data set and the attribution factor data set into a preset attribution phenomenon prediction model library, and calculating the prediction success rate of each attribution phenomenon prediction model in the attribution phenomenon prediction model library;

determining the optimal attribution phenomenon prediction model from the attribution phenomenon prediction model library according to the prediction success rates;

selecting a corresponding model interpretation algorithm according to the type of the optimal attribution phenomenon prediction model, and using the model interpretation algorithm to calculate the contribution of each attribution factor data in the attribution factor data set to the attributed phenomenon data.

Optionally, performing data replacement processing on the outliers in the initial data set to obtain a standard data set includes:

calculating a local reachable density ratio between each initial data in the initial data set and the neighborhood data of that initial data;

when the local reachable density ratio is less than or equal to a preset density ratio threshold, determining that the initial data is an outlier;

replacing the outliers using a preset correct data set to obtain the standard data set.

Optionally, dividing the standard data set, by using a pre-built variable library, into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set includes:

comparing the variable data in the standard data set with the pre-built variable library, determining the variable data in the standard data set that match the variable library as attributed phenomenon data, and determining the variable data in the standard data set that do not match the variable library as attribution factor data;

calculating the correlation between the attributed phenomenon data and the attribution factor data, and determining the attribution factor data whose correlation is greater than a preset threshold as the target attribution factor data corresponding to the attributed phenomenon data;

collecting the attributed phenomenon data and the target attribution factor data to obtain the attributed phenomenon data set and the attribution factor data set corresponding to the attributed phenomenon data set.

Optionally, calculating the correlation between the attributed phenomenon data and the attribution factor data includes:

$$r(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_x \sigma_y}$$

where r(X,Y) is the correlation, X is the attributed phenomenon data, Y is the Y-th attribution factor data, Cov(X,Y) is the covariance between the attributed phenomenon data and the attribution factor data, σ_x is the standard deviation of the attributed phenomenon data, and σ_y is the standard deviation of the attribution factor data.

Optionally, calculating the prediction success rate of each attribution phenomenon prediction model in the attribution phenomenon prediction model library includes:

dividing the attributed phenomenon data set and the attribution factor data set into training samples and test samples according to a preset ratio;

training each attribution phenomenon prediction model in the preset attribution phenomenon prediction model library on the training samples to obtain multiple initial prediction models;

using each of the initial prediction models to predict on the test samples to obtain test data for each initial prediction model;

calculating the difference between the test data of each initial prediction model and the attributed phenomenon data of the test samples;

determining the test data whose difference is less than a preset threshold as correct prediction data, and calculating the proportion of correct prediction data in the test data of each initial prediction model to obtain the prediction success rate of each attribution phenomenon prediction model.

Optionally, using the model interpretation algorithm to calculate the contribution of each attribution factor data in the attribution factor data set to the attributed phenomenon data includes:

calculating the standard deviation of each attribution factor data in the attribution factor data set, and determining the perturbation range of each attribution factor data according to the standard deviation;

perturbing each attribution factor data within its perturbation range to obtain new data for each attribution factor;

training, based on the model interpretation algorithm, a target linear regression model on the new data of each attribution factor to obtain the weight corresponding to each attribution factor data;

multiplying each attribution factor data by its corresponding weight to obtain the contribution of each attribution factor data.

Optionally, training the target linear regression model on the new data of each attribution factor based on the model interpretation algorithm includes:

calculating the distance between each attribution factor data and the new data of that attribution factor, and using the distance value as the weight of the new data of that attribution factor;

using the optimal attribution phenomenon prediction model to predict the attributed phenomenon for the new data of each attribution factor to obtain attributed phenomenon prediction data, and using the attributed phenomenon prediction data as the label data corresponding to the new data of each attribution factor;

training, based on the preset model interpretation algorithm, the target linear regression model on the label data and the weighted new data of each attribution factor.

To solve the above problem, the present invention further provides an intelligent attribution analysis device, the device comprising:

a standard data set acquisition module, used to acquire an initial data set and perform data replacement processing on outliers in the initial data set to obtain a standard data set;

a standard data set division module, used to divide the standard data set, by using a pre-built variable library, into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set;

a model prediction success rate calculation module, used to import the attributed phenomenon data set and the attribution factor data set into a preset attribution phenomenon prediction model library, and calculate the prediction success rate of each attribution phenomenon prediction model in the attribution phenomenon prediction model library;

an optimal attribution phenomenon prediction model determination module, used to determine the optimal attribution phenomenon prediction model from the attribution phenomenon prediction model library according to the prediction success rates;

an attribution factor data contribution calculation module, used to select the corresponding model interpretation algorithm according to the type of the optimal attribution phenomenon prediction model, and use the model interpretation algorithm to calculate the contribution of each attribution factor data in the attribution factor data set to the attributed phenomenon data.

To solve the above problem, the present invention further provides an electronic device, the electronic device comprising:

at least one processor; and,

a memory communicatively connected to the at least one processor; wherein,

the memory stores a computer program executable by the at least one processor, and the computer program is executed by the at least one processor so that the at least one processor can perform the intelligent attribution analysis method described above.

To solve the above problem, the present invention further provides a computer-readable storage medium storing at least one computer program, the at least one computer program being executed by a processor in an electronic device to implement the intelligent attribution analysis method described above.

In the embodiments of the present invention, the standard data set is divided into an attributed phenomenon data set and a corresponding attribution factor data set by means of a pre-built variable library; dividing the data according to industry-specific variables helps improve the accuracy of the data. The attributed phenomenon data set and the corresponding attribution factor data set are imported into a preset attribution phenomenon prediction model library, and the prediction success rate of each attribution phenomenon prediction model is calculated. The optimal attribution phenomenon prediction model is then determined from the library according to the prediction success rates, so that the model that best fits the data at hand is selected, which further improves the accuracy of the attribution analysis. Finally, the corresponding model interpretation algorithm is selected according to the type of the optimal attribution phenomenon prediction model and is used to calculate the contribution of each attribution factor data in the attribution factor data set to the attributed phenomenon data, yielding a more accurate attribution analysis result. Therefore, the intelligent attribution analysis method, device, electronic device and computer-readable storage medium proposed by the present invention can solve the problem of low accuracy in attribution analysis.

Brief Description of the Drawings

FIG. 1 is a schematic flow chart of an intelligent attribution analysis method provided by an embodiment of the present invention;

FIG. 2 is a schematic flow chart of dividing a standard data set provided by an embodiment of the present invention;

FIG. 3 is a schematic flow chart of calculating model prediction success rates provided by an embodiment of the present invention;

FIG. 4 is a schematic flow chart of calculating the contribution of attribution factor data provided by an embodiment of the present invention;

FIG. 5 is a functional module diagram of an intelligent attribution analysis device provided by an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of an electronic device for implementing the intelligent attribution analysis method provided by an embodiment of the present invention.

The realization of the purpose, functional features and advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.

An embodiment of the present application provides an intelligent attribution analysis method. The execution subject of the method includes, but is not limited to, at least one of the electronic devices, such as a server or a terminal, that can be configured to execute the method provided by the embodiment of the present application. In other words, the method can be executed by software or hardware installed on a terminal device or a server device, and the software can be a blockchain platform. The server includes, but is not limited to, a single server, a server cluster, a cloud server or a cloud server cluster. The server can be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.

Referring to FIG. 1, which is a schematic flow chart of an intelligent attribution analysis method provided by an embodiment of the present invention. In this embodiment, the intelligent attribution analysis method includes:

S1. Acquire an initial data set, and perform data replacement processing on outliers in the initial data set to obtain a standard data set;

In an embodiment of the present invention, the initial data set consists of business data sets from different industries, for example, data from the Internet advertising industry, data from the insurance industry, or data from an intelligent customer service scenario.

In detail, performing data replacement processing on the outliers in the initial data set to obtain a standard data set includes: calculating the local reachable density ratio between each initial data in the initial data set and its neighborhood data; when the local reachable density ratio is less than or equal to a preset density ratio threshold, determining that the initial data is an outlier; and replacing the outliers using a preset correct data set to obtain the standard data set.

In an embodiment of the present invention, the local reachable density is the reciprocal of the average distance from the neighborhood data of each initial data to that initial data. If the local reachable density ratio is greater than the preset density ratio threshold, the neighborhood data and the initial data can be considered to belong to the same cluster, and the initial data is not an outlier.

In detail, in an embodiment of the present invention, the local reachable density ratio between each initial data in the initial data set and its neighborhood data is calculated using the following formula:

where LOF_k(P) is the local reachable density ratio, N_k(P) is the P-th initial data of the initial data set, ρ_k(P) is the local reachable density of the P-th data, ρ_k(O) is the average local reachable density of the neighborhood data O, and d_k(P,O) is the distance between the P-th initial data and the neighborhood data O.

In an embodiment of the present invention, performing data replacement processing on the outliers in the initial data set removes abnormal data from the initial data set, ensures that the data in the initial data set are reasonable, and improves the accuracy of subsequent model selection.
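The density-based outlier replacement of step S1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the formula itself is not reproduced in this text, so the sketch assumes the ratio is the local density of a point divided by the mean local density of its neighbors, which makes a low ratio indicate an outlier as the threshold rule above requires (the classical LOF score uses the inverse ratio). The function names, the neighborhood size `k`, the threshold `0.5` and the replacement value are all hypothetical.

```python
import math

def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    ranked = sorted((math.dist(points[i], q), j)
                    for j, q in enumerate(points) if j != i)
    return [j for _, j in ranked[:k]]

def local_density(points, i, k):
    """Reciprocal of the mean distance from points[i] to its k nearest neighbors."""
    nbrs = k_nearest(points, i, k)
    mean_dist = sum(math.dist(points[i], points[j]) for j in nbrs) / len(nbrs)
    return 1.0 / max(mean_dist, 1e-12)  # guard against duplicate points

def density_ratios(points, k):
    """ASSUMPTION: ratio = density of the point / mean density of its neighbors."""
    dens = [local_density(points, i, k) for i in range(len(points))]
    ratios = []
    for i in range(len(points)):
        nbrs = k_nearest(points, i, k)
        ratios.append(dens[i] / (sum(dens[j] for j in nbrs) / len(nbrs)))
    return ratios

def replace_outliers(points, k, threshold, replacement):
    """Replace every point whose ratio is <= threshold with a preset value."""
    ratios = density_ratios(points, k)
    return [replacement if r <= threshold else p
            for p, r in zip(points, ratios)]

# Four tightly clustered records and one isolated record (hypothetical data).
points = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1), (8.0, 8.0)]
cleaned = replace_outliers(points, k=2, threshold=0.5, replacement=(1.05, 1.05))
```

In this toy data the isolated record is far less dense than its neighbors, so its ratio falls below the threshold and it is replaced, while the clustered records have ratios close to 1 and are kept.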

S2. Using a pre-built variable library, divide the standard data set into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set;

In an embodiment of the present invention, the attributed phenomenon data set consists of non-quantifiable elements, the attribution factor data set consists of multiple possible factors affecting the attributed phenomenon data, and the pre-built variable library contains multiple attributed phenomenon data determined in advance.

In detail, referring to FIG. 2, dividing the standard data set, by using a pre-built variable library, into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set includes:

S21. Compare the variable data in the standard data set with the pre-built variable library, determine the variable data in the standard data set that match the variable library as attributed phenomenon data, and determine the variable data in the standard data set that do not match the variable library as attribution factor data;

S22. Calculate the correlation between the attributed phenomenon data and the attribution factor data, and determine the attribution factor data whose correlation is greater than a preset threshold as the target attribution factor data corresponding to the attributed phenomenon data;

S23. Collect the attributed phenomenon data and the target attribution factor data to obtain the attributed phenomenon data set and the attribution factor data set corresponding to the attributed phenomenon data set.

Further, in an embodiment of the present invention, the correlation between the attributed phenomenon data X and an attribution factor data Y is calculated as:

$$r(X,Y) = \frac{\mathrm{Cov}(X,Y)}{\sigma_x \sigma_y}$$

where Cov(X,Y) is the covariance between the attributed phenomenon data and the attribution factor data, and σ_x and σ_y are their respective standard deviations.

Specifically, the attributed variables differ between industries, so the pre-built variable library corresponding to the industry data of the specific embodiment should be selected, and the standard data set is divided into an attributed phenomenon data set and an attribution factor data set according to that library.

For example, in an embodiment of the present invention, in the advertising industry the attributed variable may be advertising revenue, and the corresponding target attribution factors may be advertising click-through rate, user browsing time and so on; in the insurance industry the attributed variable may be the renewal rate, and the corresponding target attribution factors may be the average number of years customers have been insured, the age composition of customers, the renewal rate of each insurance channel and so on.
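Steps S21 to S23 can be sketched in a few lines. The sketch below assumes the standard data set is a mapping from variable name to a column of numbers and that the variable library is a set of names; these representations, the function names, the sample values and the threshold of 0.5 are hypothetical. It applies the rule literally, keeping factors whose correlation exceeds the threshold, so strongly negative correlations are dropped.

```python
import math

def pearson(x, y):
    """r(X, Y) = Cov(X, Y) / (sigma_x * sigma_y); returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy) if sx and sy else 0.0

def split_by_variable_library(table, variable_library, threshold):
    """S21: split columns by library membership; S22: keep correlated factors."""
    phenomena = {k: v for k, v in table.items() if k in variable_library}
    candidates = {k: v for k, v in table.items() if k not in variable_library}
    factors = {
        p_name: {f_name: f_col for f_name, f_col in candidates.items()
                 if pearson(p_col, f_col) > threshold}
        for p_name, p_col in phenomena.items()
    }
    return phenomena, factors  # S23: the two collected data sets

# Hypothetical insurance-style data: one attributed variable, two candidates.
table = {
    "renewal_rate": [0.80, 0.82, 0.85, 0.90],
    "channel_renewal": [0.78, 0.80, 0.86, 0.91],
    "noise": [3.0, 1.0, 4.0, 1.0],
}
phenomena, factors = split_by_variable_library(table, {"renewal_rate"}, 0.5)
```

Here `channel_renewal` tracks `renewal_rate` closely and is kept as a target attribution factor, while `noise` is negatively correlated with it and is discarded.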

S3. Import the attributed phenomenon data set and the attribution factor data set into a preset attribution phenomenon prediction model library, and calculate the prediction success rate of each attribution phenomenon prediction model in the attribution phenomenon prediction model library;

In an embodiment of the present invention, the preset attribution phenomenon prediction model library may contain hundreds of prediction models, including but not limited to a time decay attribution model, a Bayesian model, XGBoost, a linear regression model and a fully connected neural network.

In detail, referring to FIG. 3, calculating the prediction success rate of each attribution phenomenon prediction model in the attribution phenomenon prediction model library includes:

S31. Divide the attributed phenomenon data set and the attribution factor data set into training samples and test samples according to a preset ratio;

S32. Train each attribution phenomenon prediction model in the preset attribution phenomenon prediction model library on the training samples to obtain multiple initial prediction models;

S33. Use each of the initial prediction models to predict on the test samples to obtain test data for each initial prediction model;

S34. Calculate the difference between the test data of each initial prediction model and the attributed phenomenon data of the test samples;

S35. Determine the test data whose difference is less than a preset threshold as correct prediction data, and calculate the proportion of correct prediction data in the test data of each initial prediction model to obtain the prediction success rate of each attribution phenomenon prediction model.

Specifically, in an embodiment of the present invention, the preset threshold may be set differently for different attributed phenomenon data sets. For example, if the attributed phenomenon data is the renewal rate, the threshold may be 0.01.
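Steps S31 to S35 can be illustrated with a small sketch. It assumes a single attribution factor and a toy model library of two hand-written models (a mean baseline and a least-squares line) standing in for the much larger library described above; the function names, the in-order 70/30 split and the sample data are hypothetical, while the tolerance of 0.01 follows the renewal rate example in the text.

```python
def fit_mean(xs, ys):
    """Baseline model: always predicts the mean of the training labels."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """One-variable least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def success_rate(predict, test_x, test_y, tolerance):
    """S34-S35: share of test predictions whose absolute error is below tolerance."""
    hits = sum(1 for x, y in zip(test_x, test_y)
               if abs(predict(x) - y) < tolerance)
    return hits / len(test_y)

def evaluate_library(library, xs, ys, train_fraction, tolerance):
    """S31-S33: split in order, train every model, score it on the test part."""
    cut = round(len(xs) * train_fraction)
    train_x, train_y = xs[:cut], ys[:cut]
    test_x, test_y = xs[cut:], ys[cut:]
    return {name: success_rate(fit(train_x, train_y), test_x, test_y, tolerance)
            for name, fit in library.items()}

# Hypothetical data: a renewal rate that rises linearly with one factor.
xs = [float(i) for i in range(10)]
ys = [0.50 + 0.04 * x for x in xs]
rates = evaluate_library({"mean": fit_mean, "line": fit_line}, xs, ys, 0.7, 0.01)
```

On this data the line model predicts every test sample within the tolerance and the mean baseline predicts none, so the success rates separate the two models cleanly. A real pipeline would typically shuffle before splitting.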

S4. Determine the optimal attribution phenomenon prediction model from the attribution phenomenon prediction model library according to the model prediction success rates;

In an embodiment of the present invention, the attribution phenomenon prediction model that best matches the standard data set, that is, the optimal attribution phenomenon prediction model, is selected from the preset attribution phenomenon prediction model library according to the model prediction success rates.

For example, in a practical application scenario of the present invention, the preset attribution phenomenon prediction model library includes a time decay attribution model, a Bayesian model, XGBoost, a linear model, a fully connected neural network and so on. If the prediction success rate is 89% for the time decay attribution model, 85% for the Bayesian model, 89% for XGBoost, 78% for the linear model and 94% for the fully connected neural network, the fully connected neural network is determined to be the optimal attribution phenomenon prediction model.

In an embodiment of the present invention, determining the optimal attribution phenomenon prediction model by prediction success rate allows different prediction models to be selected for different industry data, which broadens the range of industry data the method applies to and ensures the accuracy of the subsequent calculation of attribution factor contributions.
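The selection in step S4 amounts to taking the model with the highest success rate. A minimal sketch using the success rates quoted in the example above (the dictionary keys are hypothetical labels):

```python
# Prediction success rates from the example scenario in the description.
success_rates = {
    "time_decay": 0.89,
    "bayesian": 0.85,
    "xgboost": 0.89,
    "linear": 0.78,
    "fully_connected_nn": 0.94,
}

# The optimal model is the one with the highest prediction success rate.
best_model = max(success_rates, key=success_rates.get)
```

The description does not say how ties are broken; `max` simply returns the first of several equally rated models in dictionary order.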

S5、根据所述最优归因现象预测模型的类型选取相对应的模型解释算法，利用所述模型解释算法计算所述归因因子数据集中各个归因因子数据对所述被归因现象数据的贡献度。S5. Select a corresponding model interpretation algorithm according to the type of the optimal attribution phenomenon prediction model, and use the model interpretation algorithm to calculate the contribution of each attribution factor data in the attribution factor data set to the attributed phenomenon data.

本发明实施例中，所述相对应的模型解释算法是对所述最优归因现象预测模型中的变量即归因因子进行贡献度预估，即各个所述归因因子数据对所述最优归因现象预测模型的预测结果所起的作用大小。In an embodiment of the present invention, the corresponding model interpretation algorithm estimates the contribution of the variables, namely the attribution factors, in the optimal attribution phenomenon prediction model, that is, the role of each attribution factor data in the prediction result of the optimal attribution phenomenon prediction model.

具体地，本发明实施例中可以根据所述最优归因现象预测模型的类型在预先存储的模型解释算法库中调用与所述最优归因现象预测模型相对应的模型解释算法。Specifically, in the embodiments of the present invention, a model interpretation algorithm corresponding to the optimal attribution phenomenon prediction model can be called from a pre-stored model interpretation algorithm library according to the type of the optimal attribution phenomenon prediction model.

In detail, referring to FIG. 4, calculating the contribution of each attribution factor in the attribution factor data set to the attributed phenomenon data by using the model interpretation algorithm includes:

S51. Calculate the standard deviation of each attribution factor in the attribution factor data set, and determine the perturbation range of each attribution factor according to its standard deviation;

S52. Perturb the data of each attribution factor within the perturbation range to obtain new data for each attribution factor;

S53. Based on the model interpretation algorithm, train a target linear regression model on the new data of each attribution factor to obtain the weight corresponding to each attribution factor;

S54. Multiply the data of each attribution factor by its corresponding weight to obtain the contribution of each attribution factor.
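Steps S51 to S54 can be sketched as a perturbation-based local linear surrogate. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `local_contributions`, the Gaussian perturbation, the sample count, and the Gaussian kernel that turns distances into sample weights are all illustrative choices (the claims only state that the weight is derived from the distance between the original and the perturbed data). The labels for the perturbed samples come from the optimal prediction model, passed here as the callable `predict`.

```python
import numpy as np


def local_contributions(predict, X, x0, n_samples=500, scale=1.0, seed=0):
    """Estimate the contribution of each attribution factor for one sample x0.

    predict: black-box model mapping an (n, d) array to (n,) predictions.
    X:       (n, d) attribution factor data, used only for standard deviations.
    x0:      (d,) sample to explain. Assumes no factor in X is constant.
    """
    rng = np.random.default_rng(seed)
    # S51: the per-factor standard deviation sets the perturbation range.
    sigma = X.std(axis=0)
    # S52: perturb x0 within that range (Gaussian noise is an assumption).
    Z = x0 + rng.normal(0.0, scale * sigma, size=(n_samples, x0.size))
    # Labels of the perturbed samples are the black-box predictions.
    y = predict(Z)
    # Distance-based sample weights; the kernel form is an assumption.
    dist = np.linalg.norm((Z - x0) / sigma, axis=1)
    w = np.exp(-(dist ** 2) / (2.0 * x0.size))
    # S53: weighted least squares with an intercept column.
    A = np.hstack([np.ones((n_samples, 1)), Z])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    weights = coef[1:]
    # S54: contribution = factor value * fitted weight.
    return weights * x0
```

For a model that is already linear, the surrogate recovers the model's own coefficients, which makes a convenient sanity check before applying the procedure to a non-linear model.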

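Furthermore, in an embodiment of the present invention, training the target linear regression model on the new data of each attribution factor based on the model interpretation algorithm includes:

calculating the distance between the data of each attribution factor and the new data of that attribution factor, and using the distance as the weight of the new data;

using the optimal attribution phenomenon prediction model to predict the attributed phenomenon for the new data of each attribution factor, and using the resulting attributed phenomenon prediction data as the label data corresponding to the new data;

training, based on the preset model interpretation algorithm, on the label data and the weighted new data of each attribution factor to obtain the target linear regression model.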

For example, in one practical application scenario of the present invention, when the optimal attribution phenomenon prediction model is a fully connected neural network, the DeepLIFT algorithm can be called to interpret it; when the optimal attribution phenomenon prediction model is an XGBoost model, Shapley values can be used to interpret it. In either case, the contribution of each attribution factor in the optimal attribution phenomenon prediction model is obtained.
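To make the Shapley value concrete, the following sketch computes exact Shapley values for a small toy model by averaging each factor's marginal contribution over all orderings. It is an illustration under assumptions: `shapley_values`, `predict`, and `baseline` are hypothetical names, the toy function stands in for a trained model, and exhaustive enumeration is only feasible for a handful of factors. Tree models such as XGBoost are normally explained with specialised, much faster algorithms.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a baseline input.

    predict:  callable taking a list of factor values and returning a number.
    x:        factor values of the sample to explain.
    baseline: reference values that represent an "absent" factor (assumption).
    """
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]        # switch factor i to its actual value
            value = predict(current)
            phi[i] += value - prev   # marginal contribution in this ordering
            prev = value
    return [p / len(orders) for p in phi]
```

The values always sum to `predict(x) - predict(baseline)`, so an interaction between two factors is split between them rather than counted twice.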

In an embodiment of the present invention, the model interpretation algorithm yields the contribution of each attribution factor to the attributed phenomenon data, which reveals the user behaviors that mainly drive the industry data, so that a response strategy can be formulated according to the contribution of each attribution factor. For example, if insurance industry data shows a declining renewal rate, the intelligent attribution analysis method may determine that the decline is mainly caused by a rising complaint rate among existing customers of a particular insurance channel.

The embodiment of the present invention uses a pre-built variable library to divide the standard data set into an attributed phenomenon data set and a corresponding attribution factor data set; dividing the data according to industry-specific variables helps improve data accuracy. The attributed phenomenon data set and the corresponding attribution factor data set are imported into a preset attribution phenomenon prediction model library, and the prediction success rate of each attribution phenomenon prediction model is calculated. The optimal attribution phenomenon prediction model is then determined from the library according to the prediction success rate, so that the model that best fits the data at hand is selected, which further improves the accuracy of the attribution analysis. Finally, the model interpretation algorithm corresponding to the type of the optimal attribution phenomenon prediction model is selected and used to calculate the contribution of each attribution factor in the attribution factor data set to the attributed phenomenon data, yielding a more accurate attribution analysis result. The intelligent attribution analysis method proposed by the present invention can therefore solve the problem of low accuracy in attribution analysis.

FIG. 5 is a functional module diagram of an intelligent attribution analysis device provided by an embodiment of the present invention.

The intelligent attribution analysis device 100 of the present invention can be installed in an electronic device. According to the functions implemented, the intelligent attribution analysis device 100 may include a standard data set acquisition module 101, a standard data set division module 102, a model prediction success rate calculation module 103, an optimal attribution phenomenon prediction model determination module 104, and an attribution factor data contribution calculation module 105. A module in the present invention, which may also be called a unit, is a series of computer program segments that are stored in the memory of the electronic device, can be executed by the processor of the electronic device, and perform a fixed function.

In this embodiment, the functions of the modules/units are as follows:

The standard data set acquisition module 101 is used to acquire an initial data set and perform data replacement processing on the outliers in the initial data set to obtain a standard data set;

The standard data set division module 102 is used to divide the standard data set, by means of a pre-built variable library, into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set;

The model prediction success rate calculation module 103 is used to import the attributed phenomenon data set and the attribution factor data set into a preset attribution phenomenon prediction model library, and calculate the prediction success rate of each attribution phenomenon prediction model in the library;

The optimal attribution phenomenon prediction model determination module 104 is used to determine the optimal attribution phenomenon prediction model from the attribution phenomenon prediction model library according to the prediction success rates;

The attribution factor data contribution calculation module 105 is used to select a model interpretation algorithm that matches the type of the optimal attribution phenomenon prediction model, and use that algorithm to calculate the contribution of each attribution factor in the attribution factor data set to the attributed phenomenon data.

In detail, when in use, the modules of the intelligent attribution analysis device 100 in the embodiment of the present invention adopt the same technical means as the intelligent attribution analysis method described with reference to FIG. 1 to FIG. 4 above and produce the same technical effects, which are not repeated here.

FIG. 6 is a schematic structural diagram of an electronic device that implements the intelligent attribution analysis method, provided by an embodiment of the present invention.

The electronic device 1 may include a processor 10, a memory 11, a communication bus 12, and a communication interface 13, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as an intelligent attribution analysis program.

In some embodiments, the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control core (Control Unit) of the electronic device. It connects the components of the entire electronic device through various interfaces and lines, and performs the functions of the electronic device and processes data by running or executing the programs or modules stored in the memory 11 (for example, the intelligent attribution analysis program) and calling the data stored in the memory 11.

The memory 11 includes at least one type of readable storage medium, such as a flash memory, a mobile hard disk, a multimedia card, a card-type memory (for example, an SD or DX memory), a magnetic memory, a magnetic disk, or an optical disk. In some embodiments, the memory 11 may be an internal storage unit of the electronic device, such as a mobile hard disk of the electronic device. In other embodiments, the memory 11 may be an external storage device of the electronic device, such as a plug-in mobile hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) fitted to the electronic device. Furthermore, the memory 11 may include both an internal storage unit and an external storage device of the electronic device. The memory 11 can be used not only to store the application software installed on the electronic device and various types of data, such as the code of the intelligent attribution analysis program, but also to temporarily store data that has been output or is about to be output.

The communication bus 12 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. The bus is configured to provide connection and communication between the memory 11, the at least one processor 10, and other components.

The communication interface 13 is used for communication between the electronic device and other devices, and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), and is generally used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a display (Display) or an input unit (such as a keyboard (Keyboard)); optionally, the user interface may also be a standard wired or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or display unit, and is used to display the information processed in the electronic device and to present a visual user interface.

The figure shows only an electronic device with certain components. Those skilled in the art will understand that the structure shown in the figure does not limit the electronic device, which may include fewer or more components than shown, combine certain components, or arrange the components differently.

For example, although not shown, the electronic device may also include a power supply (such as a battery) that powers the components. Preferably, the power supply is logically connected to the at least one processor 10 through a power management device, so that charging management, discharging management, power consumption management, and similar functions are implemented through the power management device. The power supply may also include one or more DC or AC power sources, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and other such components. The electronic device may further include various sensors, a Bluetooth module, a Wi-Fi module, and so on, which are not described in detail here.

It should be understood that the embodiment is for illustration only, and the scope of the patent application is not limited by this structure.

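The intelligent attribution analysis program stored in the memory 11 of the electronic device 1 is a combination of multiple instructions. When run on the processor 10, it can implement:

acquiring an initial data set, and performing data replacement processing on the outliers in the initial data set to obtain a standard data set;

dividing the standard data set, by means of a pre-built variable library, into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set;

importing the attributed phenomenon data set and the attribution factor data set into a preset attribution phenomenon prediction model library, and calculating the prediction success rate of each attribution phenomenon prediction model in the library;

determining the optimal attribution phenomenon prediction model from the attribution phenomenon prediction model library according to the prediction success rates;

selecting a model interpretation algorithm that matches the type of the optimal attribution phenomenon prediction model, and using that algorithm to calculate the contribution of each attribution factor in the attribution factor data set to the attributed phenomenon data.

Specifically, for how the processor 10 implements the above instructions, refer to the description of the relevant steps in the embodiments corresponding to the drawings; the details are not repeated here.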

Furthermore, if the modules/units integrated in the electronic device 1 are implemented as software functional units and sold or used as an independent product, they can be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM, Read-Only Memory).

The present invention further provides a computer-readable storage medium that stores a computer program. When executed by a processor of an electronic device, the computer program can implement the steps of the intelligent attribution analysis method described above.

In the several embodiments provided by the present invention, it should be understood that the disclosed equipment, devices, and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.

The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, may each exist as a physically separate unit, or two or more of them may be integrated in one unit. The integrated unit may be implemented in hardware, or in hardware combined with software functional modules.

It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be implemented in other specific forms without departing from its spirit or essential characteristics.

The embodiments should therefore be regarded in every respect as illustrative and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and scope of equivalents of the claims are intended to be included in the present invention. No reference sign in the claims should be regarded as limiting the claim concerned.

The blockchain referred to in the present invention is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain (Blockchain) is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.

The embodiments of the present application can acquire and process the relevant data based on artificial intelligence technology. Artificial intelligence (Artificial Intelligence, AI) is the theory, methods, technology, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.

In addition, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in a system claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used as names and do not indicate any particular order.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention can be modified or replaced by equivalents without departing from the spirit and scope of that technical solution.

Claims

1. An intelligent attribution analysis method, characterized in that the method comprises:

Acquiring an initial data set, and performing data replacement processing on outliers in the initial data set to obtain a standard data set, wherein the initial data set includes data from the Internet advertising industry, data from the insurance industry, and/or data in an intelligent customer service scenario;

Dividing the standard data set, by using a pre-built variable library, into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set;

Importing the attributed phenomenon data set and the attribution factor data set into a preset attribution phenomenon prediction model library, and calculating the prediction success rate of each attribution phenomenon prediction model in the attribution phenomenon prediction model library;

Determining the optimal attribution phenomenon prediction model from the attribution phenomenon prediction model library according to the model prediction success rate;

Selecting a corresponding model interpretation algorithm according to the type of the optimal attribution phenomenon prediction model, and using the model interpretation algorithm to calculate the contribution of each attribution factor data in the attribution factor data set to the attributed phenomenon data;

The method of using a pre-built variable library to divide the standard data set into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set includes: comparing the variable data in the standard data set with the pre-built variable library, determining the variable data in the standard data set that are consistent with those in the variable library as the attributed phenomenon data, and determining the variable data in the standard data set that are inconsistent with those in the variable library as the attribution factor data; calculating the correlation between the attributed phenomenon data and the attribution factor data, and determining the attribution factor data with the correlation greater than a preset threshold as the target attribution factor data corresponding to the attributed phenomenon data; and collecting the attributed phenomenon data and the target attribution factor data to obtain the attributed phenomenon data set and the attribution factor data set corresponding to the attributed phenomenon data set;

The using of the model interpretation algorithm to calculate the contribution of each attribution factor data in the attribution factor data set to the attributed phenomenon data includes: calculating the standard deviation of each attribution factor data in the attribution factor data set, and determining the disturbance range of each attribution factor data according to the standard deviation; performing data perturbation on each attribution factor data according to the disturbance range to obtain new data of each attribution factor; training a target linear regression model based on the model interpretation algorithm using the new data of each attribution factor to obtain the weight corresponding to each attribution factor data; multiplying each attribution factor data by the corresponding weight to obtain the contribution of each attribution factor data;

The method of obtaining a target linear regression model by training the new data of each attribution factor based on the model interpretation algorithm includes: respectively calculating the distance between the data of each attribution factor and the new data of each attribution factor, and using the distance as the weight of the new data of each attribution factor; using the optimal attribution phenomenon prediction model to predict the attributed phenomenon for the new data of each attribution factor to obtain the attributed phenomenon prediction data, and using the attributed phenomenon prediction data as the label data corresponding to the new data of each attribution factor; and obtaining a target linear regression model by training the label data and the new data of each attribution factor with weights based on the preset model interpretation algorithm.

2. The intelligent attribution analysis method according to claim 1, characterized in that the step of performing data replacement processing on the outliers in the initial data set to obtain a standard data set comprises:

Calculating a local reachability density ratio between each piece of initial data in the initial data set and the neighborhood data of that initial data;

When the local reachability density ratio is less than or equal to a preset density ratio threshold, determining that the initial data is an outlier;

Replacing the outliers using a preset correct data set to obtain the standard data set.
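The outlier handling recited in claim 2 can be illustrated with a density-ratio computation in the style of the local outlier factor. This is a sketch under assumptions rather than the claimed implementation: the ratio is taken here as a point's own local reachability density divided by the mean density of its k nearest neighbours (so that a small ratio marks an outlier, consistent with the "less than or equal to" test), and the names `density_ratio`, `replace_outliers`, `k`, `threshold`, and `replacement` are hypothetical.

```python
import numpy as np


def density_ratio(X, k=3):
    """Own local reachability density over the mean density of the k neighbours.

    Assumes no duplicate points, since those give zero reachability distances.
    """
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)             # a point is not its own neighbour
    nbrs = np.argsort(D, axis=1)[:, :k]     # indices of the k nearest neighbours
    k_dist = D[np.arange(n), nbrs[:, -1]]   # distance to the k-th neighbour
    # Reachability distance of p from neighbour o: max(k_dist(o), d(p, o)).
    reach = np.maximum(k_dist[nbrs], D[np.arange(n)[:, None], nbrs])
    lrd = 1.0 / reach.mean(axis=1)          # local reachability density
    return lrd / lrd[nbrs].mean(axis=1)


def replace_outliers(X, replacement, threshold=0.5, k=3):
    """Replace rows whose density ratio is at or below the threshold."""
    X = np.asarray(X, dtype=float)
    mask = density_ratio(X, k) <= threshold
    out = X.copy()
    out[mask] = replacement                 # caller-supplied "correct" value
    return out, mask
```

How the replacement value is drawn from the preset correct data set is left to the caller here, because the claim does not specify it.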

3. The intelligent attribution analysis method according to claim 1, wherein the step of calculating the correlation between the attributed phenomenon data and the attribution factor data comprises:

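calculating the correlation using the following formula:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

where $\rho_{X,Y}$ is the correlation, $X$ is the attributed phenomenon data, $Y$ is the attribution factor data, $\operatorname{cov}(X, Y)$ is the covariance between the attributed phenomenon data and the attribution factor data, $\sigma_X$ is the standard deviation of the attributed phenomenon data, and $\sigma_Y$ is the standard deviation of the attribution factor data.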
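The correlation of claim 3, together with the threshold test of claim 1 that keeps only sufficiently correlated attribution factors, can be sketched as follows. The function names `pearson` and `select_factors`, the dictionary layout of the factor data, and the example threshold are assumptions made for illustration.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations.

    Assumes neither series is constant (a zero standard deviation would divide by zero).
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


def select_factors(phenomenon, factors, threshold):
    """Keep the factors whose correlation with the phenomenon exceeds the threshold.

    The comparison uses the signed correlation, as the claim wording does;
    comparing absolute values instead would also keep strongly negative factors.
    """
    return [name for name, values in factors.items()
            if pearson(phenomenon, values) > threshold]
```

For instance, with a phenomenon series `[1, 2, 3, 4]`, a factor `[2, 4, 6, 8]` has a correlation of 1 and is kept, while a factor `[4, 3, 2, 1]` has a correlation of -1 and is dropped under this signed test.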

4. The intelligent attribution analysis method according to claim 1, wherein the step of calculating the prediction success rate of each attribution phenomenon prediction model in the attribution phenomenon prediction model library comprises:

Dividing the attributed phenomenon data set and the attribution factor data set into training samples and test samples according to a preset ratio;

Performing model training on each attribution phenomenon prediction model in the preset attribution phenomenon prediction model library according to the training samples to obtain multiple initial prediction models;

Using each of the initial prediction models to perform model prediction on the test sample to obtain test data for each of the initial prediction models;

Performing difference calculation between the test data of each of the initial prediction models and the attributed phenomenon data of the test sample;

The test data whose difference is less than a preset threshold is determined as the correct prediction data, and the proportion of the correct prediction data in the test data of each of the initial prediction models is calculated to obtain the prediction success rate of each attribution phenomenon prediction model.
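The success-rate calculation of claim 4 and the subsequent choice of the best model can be sketched as below. The sketch assumes the candidate models have already been trained on the training samples and are available as plain callables; `prediction_success_rate`, `pick_best_model`, and `tolerance` (standing in for the preset difference threshold) are hypothetical names.

```python
def prediction_success_rate(predictions, actuals, tolerance):
    """Share of predictions whose absolute difference from the actual value is below tolerance."""
    correct = sum(1 for p, a in zip(predictions, actuals) if abs(p - a) < tolerance)
    return correct / len(predictions)


def pick_best_model(models, X_test, y_test, tolerance):
    """Score every trained model on the test samples and return the best one.

    models: dict mapping a model name to a callable that predicts one sample.
    Returns the name of the model with the highest success rate and all rates.
    """
    rates = {
        name: prediction_success_rate([model(x) for x in X_test], y_test, tolerance)
        for name, model in models.items()
    }
    best = max(rates, key=rates.get)
    return best, rates
```

Ties are resolved in favour of the first model in the dictionary, which is an arbitrary choice of this sketch.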

5. An intelligent attribution analysis device, used to implement the intelligent attribution analysis method according to any one of claims 1 to 4, characterized in that the device comprises:

A standard data set acquisition module is used to acquire an initial data set, perform data replacement processing on abnormal values in the initial data set, and obtain a standard data set;

A standard data set division module, used for dividing the standard data set into an attributed phenomenon data set and an attribution factor data set corresponding to the attributed phenomenon data set by using a pre-built variable library;

A model prediction success rate calculation module, used to import the attributed phenomenon data set and the attribution factor data set into a preset attribution phenomenon prediction model library, and calculate the prediction success rate of each attribution phenomenon prediction model in the attribution phenomenon prediction model library;

An optimal attribution phenomenon prediction model determination module, used to determine the optimal attribution phenomenon prediction model from the attribution phenomenon prediction model library according to the model prediction success rate;

The attribution factor data contribution calculation module is used to select the corresponding model interpretation algorithm according to the type of the optimal attribution phenomenon prediction model, and use the model interpretation algorithm to calculate the contribution of each attribution factor data in the attribution factor data set to the attributed phenomenon data.

6. An electronic device, characterized in that the electronic device comprises:

at least one processor; and,

a memory communicatively connected to the at least one processor; wherein,

The memory stores a computer program that can be executed by the at least one processor, and the computer program is executed by the at least one processor so that the at least one processor can execute the intelligent attribution analysis method as described in any one of claims 1 to 4.

7. A computer-readable storage medium storing a computer program, characterized in that when the computer program is executed by a processor, the intelligent attribution analysis method as described in any one of claims 1 to 4 is implemented.