CN110765330A

CN110765330A - Data set classification method, apparatus, equipment and computer storage medium

Info

Publication number: CN110765330A
Application number: CN201911036146.0A
Authority: CN
Inventors: 林冰垠; 王跃; 刘玉德; 卓本刚
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2019-10-28
Filing date: 2019-10-28
Publication date: 2020-02-07

Abstract

The invention relates to the field of financial technology (Fintech) and discloses a data set classification method. The method includes:
- pre-dividing a data set to be classified into a training set, a test set and a validation set, and creating a dynamic image based on the training set, the test set and the validation set;
- adjusting the training ratio value corresponding to the training set according to an adjustment instruction received through the dynamic image, to obtain a target training ratio value;
- updating the test ratio value of the test set and the validation ratio value of the validation set, based on the target training ratio value and a preset constraint condition; and
- classifying the data set based on the target training ratio value, the updated test ratio value and the updated validation ratio value.

The invention also discloses a data set classification apparatus, a device and a computer storage medium. The invention improves the efficiency of classifying a data set.

Description

Data set classification method, apparatus, equipment and computer storage medium

Technical Field

The present invention relates to the field of financial technology (Fintech), and in particular to a data set classification method, apparatus, device and computer storage medium.

Background

With the development of computer technology, more and more technologies (big data, distributed computing, blockchain, artificial intelligence, etc.) are applied in the financial field. The traditional financial industry is gradually transforming into financial technology (Fintech). At the same time, the security and real-time requirements of the financial industry place higher demands on these technologies.

For example, in machine learning a data set often has to be classified into a training set, a test set and a validation set, which are then used separately. The training set is mandatory; the test set and the validation set are divided as needed for the actual purpose. The conventional approach sets the proportion of each of the three sets in the total data set separately, and the three proportions must sum to 100%. With this approach, however, adjusting the proportion of one set means manually adjusting the proportions of the other two. This is inconvenient and error-prone, so data sets are processed inefficiently.

Summary of the Invention

The main purpose of the present invention is to provide a data set classification method, apparatus, device and computer storage medium that improve the efficiency of processing data sets.

To achieve the above purpose, the present invention provides a data set classification method, which includes the following steps:

pre-dividing a data set to be classified into a training set, a test set and a validation set, and creating a dynamic image based on the training set, the test set and the validation set;

adjusting the training ratio value corresponding to the training set according to an adjustment instruction received through the dynamic image, to obtain a target training ratio value;

updating the test ratio value of the test set and the validation ratio value of the validation set, based on the target training ratio value and a preset constraint condition; and classifying the data set based on the target training ratio value, the updated test ratio value and the updated validation ratio value.

Optionally, the step of creating a dynamic image based on the training set, the test set and the validation set includes:

obtaining the training ratio value of the training set in the data set, the test ratio value of the test set in the data set, and the validation ratio value of the validation set in the data set;

passing the training ratio value, the test ratio value and the validation ratio value into a preset image to obtain the dynamic image.

Optionally, the step of updating the test ratio value of the test set and the validation ratio value of the validation set, based on the target training ratio value and a preset constraint condition, includes:

calculating the remaining ratio value of the data set from the target training ratio value, and updating the test ratio value and the validation ratio value based on the preset constraint condition and the remaining ratio value.

Optionally, the constraint condition includes proportional adjustment, and the step of updating the test ratio value and the validation ratio value, based on the preset constraint condition and the remaining ratio value, includes:

obtaining a first sum of the test ratio value and the validation ratio value, and determining whether the first sum is greater than the remaining ratio value;

if the first sum is less than the remaining ratio value, increasing the test ratio value and the validation ratio value simultaneously and in equal proportion, according to the preset proportional adjustment, until the first sum equals the remaining ratio value.

Optionally, after the step of determining whether the first sum is greater than the remaining ratio value, the method includes:

if the first sum is greater than the remaining ratio value, decreasing the test ratio value and the validation ratio value simultaneously and in equal proportion, according to the preset proportional adjustment, until the first sum equals the remaining ratio value.
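For illustration only, the proportional update in the optional steps above can be written in closed form. The notation is as follows, with a, b and c following the input-box notation used later in the description:
- a' is the target training ratio value.
- b and c are the current test and validation ratio values.
- R is the remaining ratio value.
- S is the first sum.

The formula assumes S > 0:

```latex
\begin{aligned}
R &= 100 - a', \qquad S = b + c,\\
b' &= b\,\frac{R}{S}, \qquad c' = c\,\frac{R}{S},\\
b' + c' &= \frac{R}{S}\,(b + c) = R, \qquad \frac{b'}{c'} = \frac{b}{c} \quad (c > 0).
\end{aligned}
```

When S < R this is the equal-proportion increase, and when S > R it is the equal-proportion decrease. The closed form reaches in one step the end state that the steps above reach by adjusting "until the first sum equals the remaining ratio value". For example, a' = 50 with b = c = 10 gives R = 50 and S = 20, so b' = c' = 25.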

Optionally, after the step of creating a dynamic image based on the training set, the test set and the validation set, the method includes:

obtaining, from the dynamic image, the training ratio value of the training set, the test ratio value of the test set and the validation ratio value of the validation set;

receiving a test ratio update instruction, and updating the test ratio value based on that instruction to obtain an updated test ratio value;

updating the validation ratio value according to the training ratio value and the updated test ratio value to obtain a new validation ratio value, and classifying the data set based on the training ratio value, the updated test ratio value and the new validation ratio value.

receiving a validation ratio update instruction, and updating the validation ratio value based on that instruction to obtain a new validation ratio value;

updating the test ratio value based on the training ratio value and the new validation ratio value to obtain a new test ratio value, and classifying the data set based on the training ratio value, the new validation ratio value and the new test ratio value.
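The two alternative update paths above can be sketched as follows. This is a minimal illustration that assumes the ratio values are integer percentages summing to 100. The function names `update_test` and `update_validation` are hypothetical and do not come from the patent.

```python
# Minimal sketch, assuming integer percentages that must sum to 100.
# Function names are hypothetical; the patent does not specify an implementation.


def update_test(train: int, new_test: int) -> tuple[int, int, int]:
    """Apply a test ratio update; the validation ratio absorbs the difference."""
    if not 0 <= new_test <= 100 - train:
        raise ValueError(f"test ratio must lie in [0, {100 - train}]")
    new_val = 100 - train - new_test  # the training ratio stays fixed
    return train, new_test, new_val


def update_validation(train: int, new_val: int) -> tuple[int, int, int]:
    """Apply a validation ratio update; the test ratio absorbs the difference."""
    if not 0 <= new_val <= 100 - train:
        raise ValueError(f"validation ratio must lie in [0, {100 - train}]")
    new_test = 100 - train - new_val  # the training ratio stays fixed
    return train, new_test, new_val


print(update_test(80, 18))       # (80, 18, 2)
print(update_validation(80, 5))  # (80, 15, 5)
```

In both paths the training ratio is treated as fixed, so only one of the two remaining values can be chosen freely.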

In addition, to achieve the above purpose, the present invention further provides a data set classification apparatus, which includes:

an establishing module, configured to pre-divide a data set to be classified into a training set, a test set and a validation set, and to create a dynamic image based on the training set, the test set and the validation set;

an acquisition module, configured to adjust the training ratio value corresponding to the training set according to an adjustment instruction received through the dynamic image, to obtain a target training ratio value;

a classification module, configured to update the test ratio value of the test set and the validation ratio value of the validation set based on the target training ratio value, and to classify the data set based on the target training ratio value, the updated test ratio value and the updated validation ratio value.

In addition, to achieve the above purpose, the present invention further provides a data set classification device. The device includes a memory, a processor, and a data set classification program stored in the memory and executable on the processor. When executed by the processor, the program implements the steps of the data set classification method described above.

In addition, to achieve the above purpose, the present invention further provides a computer storage medium storing a data set classification program. When executed by a processor, the program implements the steps of the data set classification method described above.

In the present invention, the method proceeds as follows:
- A data set to be classified is pre-divided into a training set, a test set and a validation set, and a dynamic image is created based on them.
- The training ratio value of the training set is adjusted according to an adjustment instruction received through the dynamic image, giving a target training ratio value.
- The test ratio value of the test set and the validation ratio value of the validation set are updated based on the target training ratio value and a preset constraint condition.
- The data set is classified based on the target training ratio value, the updated test ratio value and the updated validation ratio value.

The user adjusts only the training ratio in the dynamic image, and the test and validation ratios follow automatically under the preset constraint. The user can therefore divide the data set sensibly with the help of the dynamic image. This reduces the user's input effort and helps the user understand how the data set is divided. It also makes data set classification more efficient and intelligent, which in turn improves the efficiency of machine learning and of data set processing.

Description of Drawings

FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in an embodiment of the present invention;

FIG. 2 is a schematic flowchart of a first embodiment of the data set classification method of the present invention;

FIG. 3 is a schematic diagram of the modules of the data set classification apparatus of the present invention;

FIG. 4 is a schematic diagram of an image scene in the data set classification method of the present invention;

FIG. 5 is a schematic diagram of adjusting the training set, the test set and the validation set in the data set classification method of the present invention.

The realization of the purpose, the functional features and the advantages of the present invention will be further described with reference to the embodiments and the accompanying drawings.

Detailed Description of the Embodiments

It should be understood that the specific embodiments described here only explain the present invention and do not limit it.

FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in an embodiment of the present invention.

The data set classification device in the embodiments of the present invention may be a PC or a server on which a Java virtual machine runs.

As shown in FIG. 1, the data set classification device may include a processor 1001 (for example, a CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005:
- The communication bus 1002 connects these components and carries communication among them.
- The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include standard wired and wireless interfaces.
- The network interface 1004 may optionally include standard wired and wireless interfaces (for example, a Wi-Fi interface).
- The memory 1005 may be high-speed RAM or non-volatile memory such as disk storage, and may optionally be a storage device separate from the processor 1001.

Those skilled in the art will understand that the device structure shown in FIG. 1 does not limit the device. The device may include more or fewer components than shown, combine certain components, or arrange the components differently.

As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module and a data set classification program.

In the device shown in FIG. 1:
- The network interface 1004 is mainly used to connect to a back-end server and exchange data with it.
- The user interface 1003 is mainly used to connect to a client (user terminal) and exchange data with it.
- The processor 1001 may be used to invoke the data set classification program stored in the memory 1005 and perform the operations of the data set classification method described below.

Based on the above hardware structure, embodiments of the data set classification method of the present invention are proposed.

Referring to FIG. 2, which is a schematic flowchart of the first embodiment of the data set classification method of the present invention, the method includes:

Step S10: pre-divide the data set to be classified into a training set, a test set and a validation set, and create a dynamic image based on the training set, the test set and the validation set.

Machine learning studies how machines can perform specific tasks better by learning from data, rather than by having rules explicitly coded as in traditional programming. To make a trained model perform well in a real environment, the data set used for training is usually classified into three parts:
- The training set is the set of samples used for training, mainly to fit the parameters of a neural network.
- The validation set is the set of samples used to assess model performance. After different neural networks have been trained on the training set, their performance is compared on the validation set.
- The test set is used to evaluate the performance of the neural network objectively.

A data set is a collection of data.

In this embodiment, a data set to be classified (for example, a data set about to be used for machine learning) is obtained and pre-divided into a training set, a test set and a validation set. The three parts are essentially the same kind of thing: each is a collection of data. In this embodiment the data set may be divided randomly into the three sets. The proportion of the data set occupied by each set is then obtained, and a dynamic image is created from these proportions so that the user can see the division of the data set at a glance. The dynamic image may be a bar chart, a tree diagram or the like, which is not limited here.

For example, as shown in FIG. 4, the whole line represents 100%. Two sliders divide the line into three segments whose lengths correspond to the proportions of the three sets: the training set occupies 80% of the data set, the test set 10% and the validation set 10%. The current proportion of each set is shown under the line (80 for the training set, 10 for the test set and 10 for the validation set). The user can therefore see the current proportions clearly while adjusting them, which makes dividing the data set easier. The line segments of the different sets in FIG. 4 may also be given different colors to reduce the user's cognitive load.
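The two-slider line in FIG. 4 can be modelled by treating each slider as a cut point on a 0 to 100 scale. The sketch below is hypothetical and assumes integer percentages. The patent does not state how slider positions are stored, and `ratios_from_sliders` and `sliders_from_ratios` are illustrative names.

```python
# Hypothetical mapping between the two slider positions on a 0-100 line and the
# three ratio values (training, test, validation), assuming integer percentages.


def ratios_from_sliders(s1: int, s2: int) -> tuple[int, int, int]:
    """Slider 1 marks the end of the training segment, slider 2 the end of the test segment."""
    if not 0 <= s1 <= s2 <= 100:
        raise ValueError("expected 0 <= s1 <= s2 <= 100")
    return s1, s2 - s1, 100 - s2


def sliders_from_ratios(train: int, test: int, val: int) -> tuple[int, int]:
    """Inverse mapping: where to draw the two sliders for the given ratios."""
    if train + test + val != 100 or min(train, test, val) < 0:
        raise ValueError("ratios must be non-negative and sum to 100")
    return train, train + test


print(ratios_from_sliders(80, 90))      # (80, 10, 10), the FIG. 4 example
print(sliders_from_ratios(80, 10, 10))  # (80, 90)
```

Storing cut points rather than three independent numbers means the 100% total holds by construction.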

Step S20: adjust the training ratio value corresponding to the training set according to the adjustment instruction received through the dynamic image, to obtain a target training ratio value.

The dynamic image shows the training ratio value of the training set, the test ratio value of the test set and the validation ratio value of the validation set. Once it has been created, the user may want to adjust the training ratio value, that is, an adjustment instruction entered by the user is received. The pattern corresponding to the training ratio value can then be adjusted directly in the dynamic image, so the user sees the effect of the adjustment immediately. After the adjustment is complete, the adjusted target training ratio value is obtained and displayed in the dynamic image.

The three ratio values are defined as follows:
- The training ratio value is the proportion of the data set occupied by the training set.
- The test ratio value is the proportion occupied by the test set.
- The validation ratio value is the proportion occupied by the validation set.

The adjustment instruction may be triggered, for example, by the user entering a number in a preset area of the dynamic image or by a touch operation on a preset button in the dynamic image.

Step S30: update the test ratio value of the test set and the validation ratio value of the validation set, based on the target training ratio value and a preset constraint condition. Then classify the data set based on the target training ratio value, the updated test ratio value and the updated validation ratio value.

The preset constraint condition is set in advance. Examples include:
- the test ratio value and the validation ratio value are increased or decreased in equal proportion;
- the validation ratio value increases by one unit only after the test ratio value has increased by a preset number of units;
- conversely, the test ratio value increases by one unit only after the validation ratio value has increased by a preset number of units.

In this embodiment, however the three sets are changed, together they must always make up the whole data set; that is, the three ratio values must sum to 100%. After the target training ratio value is obtained, the preset constraint condition is retrieved, and the test and validation ratio values are updated according to it and the target training ratio value. Once the update is complete, the data set is classified according to the target training ratio value, the updated test ratio value and the updated validation ratio value. This divides it into a target training set, a target test set and a target validation set.
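The final classification step, dividing the data set by the three ratio values into a target training set, a target test set and a target validation set, might look like the sketch below. It rests on the following assumptions, none of which are specified by the patent:
- The data is shuffled randomly, consistent with the random pre-division mentioned above.
- The shuffle uses a fixed seed so the result is reproducible.
- Any rounding remainder goes to the training set.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_by_ratios(
    data: Sequence[T], train: int, test: int, val: int, seed: int = 0
) -> tuple[list[T], list[T], list[T]]:
    """Shuffle a copy of `data` and cut it into training, test and validation subsets.

    Assumes integer percentages summing to 100; the rounding remainder is given
    to the training subset (an assumption, not something the patent specifies).
    """
    if train + test + val != 100 or min(train, test, val) < 0:
        raise ValueError("ratios must be non-negative and sum to 100")
    items = list(data)
    random.Random(seed).shuffle(items)  # reproducible shuffle
    n = len(items)
    n_test = n * test // 100
    n_val = n * val // 100
    n_train = n - n_test - n_val  # remainder goes to training
    return (
        items[:n_train],
        items[n_train:n_train + n_test],
        items[n_train + n_test:],
    )


train_set, test_set, val_set = split_by_ratios(range(1000), 80, 18, 2)
print(len(train_set), len(test_set), len(val_set))  # 800 180 20
```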

It should also be noted that the user can adjust the test ratio value and/or the validation ratio value directly; in that case the training ratio value stays the same.

For example, as shown in FIG. 5, in practical modeling the training set is essential and the test set is important. The validation set can serve as an auxiliary set when a test set exists. Their order of importance is therefore training set, then test set, then validation set: the training set is larger than the test set, and the test set is at least as large as the validation set. For this reason, the two sliders of the dynamic image in FIG. 5 can be given different interactions:
- Slider 1 adjusts the proportion of the data set occupied by the training set. When slider 1 is moved, the test set and the validation set keep their original ratio to each other.
- Slider 2 adjusts the ratio between the test set and the validation set and does not affect the training set. While slider 2 is moved, the training set's share of the data set stays the same, and only the shares of the test set and the validation set change.

For example, suppose the ratio of training set to test set to validation set is 80:10:10. Moving slider 1 also moves slider 2 so that the test-to-validation ratio stays constant, and the three-way ratio becomes, for example, 50:25:25. Moving slider 2, by contrast, leaves slider 1 in place and changes only the test and validation shares, for example from 80:10:10 to 80:18:2.
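The two slider behaviours of FIG. 5 can be sketched as a small state object. This illustration rests on stated assumptions:
- Ratios are stored as floats, because proportional rescaling need not produce whole numbers.
- Slider 1 applies the equal-proportion rule.
- If the test and validation shares are both zero, the remaining share is split evenly, a case the description does not cover.

`SplitState` and its methods are hypothetical names. The example reproduces the 80:10:10 to 50:25:25 and 80:10:10 to 80:18:2 transitions described above.

```python
from dataclasses import dataclass


@dataclass
class SplitState:
    """Hypothetical UI state: training/test/validation percentages summing to 100."""
    train: float
    test: float
    val: float

    def move_slider1(self, new_train: float) -> None:
        """Slider 1: set the training ratio; test and validation keep their mutual ratio."""
        if not 0 <= new_train <= 100:
            raise ValueError("training ratio must lie in [0, 100]")
        remaining = 100 - new_train
        old_sum = self.test + self.val
        if old_sum == 0:
            # Assumption: with no previous test/validation share, split the rest evenly.
            self.test = self.val = remaining / 2
        else:
            factor = remaining / old_sum  # > 1 grows both shares, < 1 shrinks both
            self.test, self.val = self.test * factor, self.val * factor
        self.train = new_train

    def move_slider2(self, new_test: float) -> None:
        """Slider 2: move share between test and validation; training is untouched."""
        remaining = 100 - self.train
        if not 0 <= new_test <= remaining:
            raise ValueError(f"test ratio must lie in [0, {remaining}]")
        self.test, self.val = new_test, remaining - new_test


state = SplitState(80, 10, 10)
state.move_slider1(50)
print(state)  # SplitState(train=50, test=25.0, val=25.0)

state = SplitState(80, 10, 10)
state.move_slider2(18)
print(state)  # SplitState(train=80, test=18, val=2)
```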

Because the proportions shown under the line are linked to the sliders, the two sliders also move to the matching positions once values have been entered. The three input boxes have different input limits, reflecting the importance of each set:
- When the training ratio (denoted a) is entered, the test and validation values are adjusted automatically while keeping their ratio to each other. The input range is 1 ≤ a ≤ 100. Entering 100 means the entire data set is the training set, in which case the test and validation values are both 0.
- When the test ratio (denoted b) is entered, the training ratio is unaffected and the validation value is adjusted automatically. The input range is 1 ≤ b ≤ 100 - a. Entering 100 - a means the entire data set is divided into a training set and a test set, in which case the validation value is 0.
- When the validation ratio (denoted c) is entered, the training ratio is unaffected and the test value is adjusted automatically. The input range is 1 ≤ c < 100 - a.
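The input limits above can be expressed as a small validation function. This is a sketch that assumes integer inputs; the name `is_valid_input` and the field names are hypothetical. As written in the description, the validation bound is strict (c < 100 - a), which leaves at least one unit for the test set, while the test bound is inclusive (b ≤ 100 - a).

```python
def is_valid_input(field: str, value: int, train: int = 0) -> bool:
    """Check a value typed into one of the three input boxes.

    `train` is the current training ratio a; it is ignored for the training box.
    """
    if field == "train":       # 1 <= a <= 100
        return 1 <= value <= 100
    if field == "test":        # 1 <= b <= 100 - a
        return 1 <= value <= 100 - train
    if field == "validation":  # 1 <= c < 100 - a
        return 1 <= value < 100 - train
    raise ValueError(f"unknown field: {field!r}")


print(is_valid_input("test", 20, train=80))        # True
print(is_valid_input("validation", 20, train=80))  # False
```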

In this embodiment, the method proceeds as follows:
- The data set to be used for machine learning is pre-divided into a training set, a test set and a validation set, and a dynamic image is created based on them.
- The training ratio value of the training set is adjusted according to an adjustment instruction received through the dynamic image, giving a target training ratio value.
- The test ratio value and the validation ratio value are updated based on the target training ratio value and a preset constraint condition.
- The data set is classified based on the target training ratio value, the updated test ratio value and the updated validation ratio value.

The training ratio is adjusted directly in the dynamic image, and the test and validation ratios are updated automatically under the preset constraint before the data set is classified. The user can therefore divide the data set sensibly with the help of the dynamic image. This reduces the user's input effort and helps the user understand how the data set is divided. It also makes data set classification more efficient and intelligent, which in turn improves the efficiency of machine learning and of data set processing.

Further, a second embodiment of the data set classification method of the present invention is proposed, based on the first embodiment. This embodiment refines the part of step S10 of the first embodiment that creates a dynamic image based on the training set, the test set and the validation set, and includes:

Step a: obtain the training ratio value of the training set in the data set, the test ratio value of the test set in the data set, and the validation ratio value of the validation set in the data set.

After the data set has been pre-divided into a training set, a test set and a validation set, the share of the data set occupied by each subset must be computed:

- the training ratio, for the training set;
- the test ratio, for the test set;
- the validation ratio, for the validation set.
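The text does not say how the ratios are computed. Below is a minimal sketch that assumes each ratio is the subset's record count expressed as a percentage of the whole data set. The function name `ratio_values` is hypothetical.

```python
def ratio_values(n_train: int, n_test: int, n_val: int) -> tuple[float, float, float]:
    """Return the training, test and validation ratios as percentages of the data set."""
    total = n_train + n_test + n_val
    if total == 0:
        raise ValueError("the data set is empty")
    return (100 * n_train / total, 100 * n_test / total, 100 * n_val / total)


# A data set of 1,000 records pre-divided into 700 / 200 / 100 records.
print(ratio_values(700, 200, 100))  # (70.0, 20.0, 10.0)
```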

Step b: pass the training ratio, the test ratio and the validation ratio into a preset image to obtain the dynamic image.

A preset image is obtained first. This is an original image without any parameters, and it must be able to distinguish at least three different elements. The previously computed training, test and validation ratios are then entered into the preset image to produce the dynamic image. The user can thus see at a glance how the data set is divided into the training, test and validation sets.
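The patent does not specify what the preset image looks like. As a stand-in, the sketch below assumes a one-line text bar whose three symbols are the "three different elements". Redrawing the bar after every adjustment is what makes it dynamic in this sketch. The function name `render_split`, the `width` parameter and the symbols are all hypothetical.

```python
def render_split(training: float, test: float, validation: float, width: int = 40) -> str:
    """Draw a one-line bar: '#' = training, '=' = test, '.' = validation."""
    total = training + test + validation
    if total <= 0:
        raise ValueError("at least one ratio must be positive")
    # Round the cumulative boundaries rather than each segment separately,
    # so the bar is always exactly `width` characters wide.
    cut1 = round(width * training / total)
    cut2 = round(width * (training + test) / total)
    return "#" * cut1 + "=" * (cut2 - cut1) + "." * (width - cut2)


print(render_split(70, 20, 10, width=20))
```

With ratios of 70 / 20 / 10 and a width of 20, this prints a bar of 14 `#`, 4 `=` and 2 `.` characters.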

In this embodiment, the dynamic image is obtained by passing the training, test and validation ratios into the preset image. This ensures that the user can see intuitively how the data set is divided into training, test and validation sets.

Further, the step of updating the test ratio of the test set and the validation ratio of the validation set based on the target training ratio and the preset constraint includes:

Step c: calculate the remaining ratio of the data set from the target training ratio, and update the test ratio of the test set and the validation ratio of the validation set based on the preset constraint and the remaining ratio.

Once the target training ratio has been obtained, the remaining ratio is found by subtracting the target training ratio from the total value carried by the data set. For example, if that total is 100 and the adjusted target training ratio is 80, the remaining ratio is 100 - 80 = 20.

The preset constraint is then retrieved, and the test and validation ratios are updated according to the remaining ratio. The updated test ratio and the updated validation ratio must sum to the remaining ratio. For example, suppose the test ratio is 20, the validation ratio is 20, and the remaining ratio is also 20. If the constraint is proportional increase or decrease, the updated test ratio is 10 and the updated validation ratio is 10.
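One way to formalize this update is shown below. It assumes that "proportional" means the test and validation ratios keep their ratio to each other. The text only gives the 20/20 to 10/10 example, but this interpretation is consistent with it. Here T* is the target training ratio, t and v are the current test and validation ratios, t' and v' are the updated ratios, and the total is 100.

```latex
R = 100 - T^{*}, \qquad
t' = t \cdot \frac{R}{t + v}, \qquad
v' = v \cdot \frac{R}{t + v}, \qquad (t + v > 0)

% Worked example from the text: T^{*} = 80,\; t = v = 20
R = 100 - 80 = 20, \qquad
t' = 20 \cdot \frac{20}{40} = 10, \qquad
v' = 20 \cdot \frac{20}{40} = 10
```

By construction t' + v' = R, which is the condition the text requires. When t = v, the remaining ratio is simply split in half, as in the example.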

In this embodiment, the test and validation ratios are updated according to the remaining ratio of the data set and the constraint. This ensures the accuracy of the updated test and validation ratios.

Specifically, the step of updating the test ratio of the test set and the validation ratio of the validation set based on the preset constraint and the remaining ratio includes:

Step c1: obtain a first sum of the test ratio of the test set and the validation ratio of the validation set, and determine whether the first sum is greater than the remaining ratio;

In this embodiment, the constraint includes proportional adjustment.

The test ratio of the test set and the validation ratio of the validation set are obtained. Their sum is calculated and taken as the first sum. It is then determined whether the first sum is greater than the remaining ratio, and different operations are performed depending on the result.

Step c2: if the first sum is less than the remaining ratio, increase the test ratio and the validation ratio simultaneously in equal proportion, based on the preset proportional adjustment, until the first sum equals the remaining ratio.

This case applies when the first sum is found to be less than the remaining ratio and the constraint is proportional adjustment. The test ratio and the validation ratio are increased together in equal proportion. The update stops once the sum of the increased test and validation ratios equals the remaining ratio.

In this embodiment, when the first sum of the test and validation ratios is less than the remaining ratio, both ratios are increased together in equal proportion. This ensures the accuracy of the updated test and validation ratios.

Specifically, after the step of determining whether the first sum is greater than the remaining ratio, the method includes:

Step c3: if the first sum is greater than the remaining ratio, decrease the test ratio and the validation ratio simultaneously in equal proportion, based on the preset proportional adjustment, until the first sum equals the remaining ratio.

This case applies when the first sum is found to be greater than the remaining ratio and the constraint is proportional adjustment. The test ratio and the validation ratio are decreased together in equal proportion. The update stops once the sum of the decreased test and validation ratios equals the remaining ratio.

In this embodiment, when the first sum of the test and validation ratios is greater than the remaining ratio, both ratios are decreased together in equal proportion. This ensures the accuracy of the updated test and validation ratios.
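Steps c1 to c3 can be sketched in Python as below. The text describes stepping both ratios up or down until the first sum reaches the remaining ratio. This sketch reaches the same end point in one step by computing a single common scaling factor, which is a simplification. The function name and the default total of 100 are assumptions for illustration.

```python
def update_test_and_validation(target_training: float, test: float, validation: float,
                               total: float = 100.0) -> tuple[float, float]:
    """Hypothetical sketch of steps c1-c3 under the equal-proportion constraint."""
    remaining = total - target_training  # step c: remaining ratio
    if remaining < 0:
        raise ValueError("target training ratio exceeds the total")
    first_sum = test + validation  # step c1: first sum
    if first_sum == remaining:
        return test, validation  # already consistent, nothing to adjust
    if first_sum == 0:
        raise ValueError("cannot scale zero test and validation ratios proportionally")
    # first_sum < remaining gives factor > 1: equal-proportion increase (step c2).
    # first_sum > remaining gives factor < 1: equal-proportion decrease (step c3).
    factor = remaining / first_sum
    return test * factor, validation * factor


print(update_test_and_validation(80, 20, 20))  # (10.0, 10.0)
print(update_test_and_validation(50, 20, 5))   # (40.0, 10.0)
```

The first call reproduces the worked example in the text (decrease). The second shows the increase branch: the 20:5 ratio between test and validation is preserved while the sum grows to 50.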

Further, a third embodiment of the data set classification method of the present invention is proposed on the basis of either the first or the second embodiment. In this embodiment, after the step in step S10 of the first embodiment of establishing a dynamic image based on the training set, the test set and the validation set, the method includes:

Step e: obtain, from the dynamic image, the training ratio of the training set, the test ratio of the test set and the validation ratio of the validation set;

After the dynamic image has been established, the training ratio of the training set, the test ratio of the test set and the validation ratio of the validation set are read from it.

Step f: receive a test ratio update instruction, and update the test ratio according to it to obtain an updated test ratio;

A test ratio update instruction entered by the user is received, and the test ratio is updated according to it to obtain the updated test ratio. The training ratio remains unchanged while the test ratio is being updated.

Step g: update the validation ratio according to the training ratio and the updated test ratio to obtain a new validation ratio, and classify the data set based on the training ratio, the updated test ratio and the new validation ratio.

After the updated test ratio is obtained, the training ratio still remains unchanged. The updated test ratio and the training ratio are therefore subtracted from the total value of the data set to obtain its remaining value. The validation ratio is adjusted according to this remaining value until the two are equal, and the resulting validation ratio is taken as the new validation ratio. The data set is then classified according to the training ratio, the updated test ratio and the new validation ratio to obtain the actual training, test and validation values.
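The remainder rule in step g can be sketched in Python as below. The sketch assumes all three ratios are percentages that sum to 100, and `fill_remainder` is a hypothetical name. Because the training ratio is fixed, assigning the remainder directly gives the same end point as adjusting the validation ratio until it equals the remaining value. The same function also covers the symmetric case, where the user edits the validation ratio and the test ratio absorbs the difference.

```python
def fill_remainder(training: float, edited: float, total: float = 100.0) -> float:
    """Return the ratio of the set the user did not edit, keeping the training ratio fixed."""
    remaining = total - training - edited
    if edited < 0 or remaining < 0:
        raise ValueError("the edited ratio must lie between 0 and total - training")
    return remaining


# Third embodiment: training stays at 70 and the user sets the test ratio to 25,
# so the new validation ratio is 100 - 70 - 25.
new_validation = fill_remainder(70.0, 25.0)
print(new_validation)  # 5.0
```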

In this embodiment, the test ratio is updated according to the test ratio update instruction to obtain the updated test ratio. The validation ratio is then updated from the updated test ratio and the training ratio, and the data set is classified. The user can thus divide the data set sensibly using the dynamic image, which lowers the user's input cost and improves the user's understanding of how the data set is divided.

Further, after the step of establishing a dynamic image based on the training set, the test set and the validation set, the method includes:

Step v: obtain, from the dynamic image, the training ratio of the training set, the test ratio of the test set and the validation ratio of the validation set;

Step h: receive a validation ratio update instruction, and update the validation ratio according to it to obtain a new validation ratio;

A validation ratio update instruction entered by the user is received, and the validation ratio is updated according to it to obtain the new validation ratio. The training ratio remains unchanged while the validation ratio is being updated.

Step k: update the test ratio based on the training ratio and the new validation ratio to obtain a new test ratio, and classify the data set based on the training ratio, the new validation ratio and the new test ratio.

After the new validation ratio is obtained, the training ratio still remains unchanged. The new validation ratio and the training ratio are therefore subtracted from the total value of the data set to obtain its remaining value. The test ratio is adjusted according to this remaining value until the two are equal, and the resulting test ratio is taken as the new test ratio. The data set is then classified according to the training ratio, the new test ratio and the new validation ratio to obtain the actual training, test and validation values.
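Once all three ratios are fixed, the data set still has to be physically split into subsets. The text does not say how records are assigned, so the sketch below assumes a seeded random shuffle followed by contiguous slicing. The function `classify_records` and its `seed` parameter are hypothetical.

```python
import random


def classify_records(records, training, test, validation, seed=0):
    """Shuffle the records and slice them into training, test and validation subsets."""
    total = training + test + validation
    if total <= 0:
        raise ValueError("at least one ratio must be positive")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n = len(shuffled)
    # Cumulative cut points keep the subsets disjoint and ensure every record is used.
    cut1 = round(n * training / total)
    cut2 = round(n * (training + test) / total)
    return shuffled[:cut1], shuffled[cut1:cut2], shuffled[cut2:]


# Fourth embodiment: training is fixed at 70 and the user sets validation to 10,
# so the new test ratio is 100 - 70 - 10 = 20.
train_set, test_set, val_set = classify_records(range(1000), training=70, test=20, validation=10)
print(len(train_set), len(test_set), len(val_set))  # 700 200 100
```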

In this embodiment, the validation ratio is updated according to the validation ratio update instruction to obtain the new validation ratio. The test ratio is then updated from the new validation ratio and the training ratio, and the data set is classified. The user can thus divide the data set sensibly using the dynamic image, which lowers the user's input cost and improves the user's understanding of how the data set is divided.

The present invention also provides a data set classification apparatus. Referring to FIG. 3, the data set classification apparatus includes:

Optionally, the establishing module is further configured to:

Optionally, the classification module is further configured to:

Optionally, the constraint includes proportional adjustment, and the classification module is further configured to:

Optionally, the data set classification apparatus further includes:

For the methods executed by the above program modules, reference may be made to the embodiments of the data set classification method of the present invention; they are not repeated here.

The present invention also provides a computer storage medium.

A data set classification program is stored on the computer storage medium of the present invention. When executed by a processor, the program implements the steps of the data set classification method described above.

For the method implemented when the data set classification program runs on the processor, reference may be made to the embodiments of the data set classification method of the present invention; it is not repeated here.

It should be noted that, herein, the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion. A process, method, article or system that includes a series of elements therefore includes not only those elements, but also other elements not expressly listed, or elements inherent to such a process, method, article or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or system that includes that element.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform. They may also be implemented by hardware, although in many cases the former is the better implementation.

Based on this understanding, the technical solutions of the present invention may be embodied as a software product, either in essence or in the part that contributes to the prior art. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk or an optical disc). It includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device or the like) to execute the methods described in the embodiments of the present invention.

The above are only preferred embodiments of the present invention and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims

1. A data set classification method, characterized in that the data set classification method comprises the following steps:

pre-dividing a data set to be classified into a training set, a test set and a validation set, and establishing a dynamic image based on the training set, the test set and the validation set;

adjusting the training ratio corresponding to the training set according to an adjustment instruction received through the dynamic image, to obtain a target training ratio;

updating the test ratio corresponding to the test set and the validation ratio corresponding to the validation set based on the target training ratio and a preset constraint, and classifying the data set based on the target training ratio, the updated test ratio and the updated validation ratio.

2. The data set classification method according to claim 1, wherein the step of establishing a dynamic image based on the training set, the test set and the validation set comprises:

obtaining the training ratio of the training set in the data set, the test ratio of the test set in the data set and the validation ratio of the validation set in the data set;

passing the training ratio, the test ratio and the validation ratio into a preset image to obtain the dynamic image.

3. The data set classification method according to claim 1, wherein the step of updating the test ratio corresponding to the test set and the validation ratio corresponding to the validation set based on the target training ratio and the preset constraint comprises:

calculating a remaining ratio of the data set according to the target training ratio, and updating the test ratio corresponding to the test set and the validation ratio corresponding to the validation set based on the preset constraint and the remaining ratio.

4. The data set classification method according to claim 3, wherein the constraint comprises proportional adjustment, and

the step of updating the test ratio corresponding to the test set and the validation ratio corresponding to the validation set based on the preset constraint and the remaining ratio comprises:

obtaining a first sum of the test ratio corresponding to the test set and the validation ratio corresponding to the validation set, and determining whether the first sum is greater than the remaining ratio;

if the first sum is less than the remaining ratio, increasing the test ratio and the validation ratio simultaneously in equal proportion based on the preset proportional adjustment, until the first sum equals the remaining ratio.

5. The data set classification method according to claim 4, wherein after the step of determining whether the first sum is greater than the remaining ratio, the method comprises:

if the first sum is greater than the remaining ratio, decreasing the test ratio and the validation ratio simultaneously in equal proportion based on the preset proportional adjustment, until the first sum equals the remaining ratio.

6. The data set classification method according to any one of claims 1 to 5, wherein after the step of establishing a dynamic image based on the training set, the test set and the validation set, the method further comprises:

obtaining, from the dynamic image, the training ratio corresponding to the training set, the test ratio corresponding to the test set and the validation ratio corresponding to the validation set;

receiving a test ratio update instruction, and updating the test ratio based on the test ratio update instruction to obtain an updated test ratio;

updating the validation ratio according to the training ratio and the updated test ratio to obtain a new validation ratio, and classifying the data set based on the training ratio, the updated test ratio and the new validation ratio.

7. The data set classification method according to any one of claims 1 to 5, wherein after the step of establishing a dynamic image based on the training set, the test set and the validation set, the method further comprises:

receiving a validation ratio update instruction, and updating the validation ratio based on the validation ratio update instruction to obtain a new validation ratio;

updating the test ratio based on the training ratio and the new validation ratio to obtain a new test ratio, and classifying the data set based on the training ratio, the new validation ratio and the new test ratio.

8. A data set classification apparatus, wherein the data set classification apparatus comprises:

an establishing module, configured to pre-divide a data set to be classified into a training set, a test set and a validation set, and to establish a dynamic image based on the training set, the test set and the validation set;

an acquisition module, configured to adjust the training ratio corresponding to the training set according to an adjustment instruction received through the dynamic image, to obtain a target training ratio;

a classification module, configured to update the test ratio corresponding to the test set and the validation ratio corresponding to the validation set based on the target training ratio, and to classify the data set based on the target training ratio, the updated test ratio and the updated validation ratio.

9. A data set classification device, characterized in that the data set classification device comprises a memory, a processor, and a data set classification program stored in the memory and executable on the processor, wherein the data set classification program, when executed by the processor, implements the steps of the data set classification method according to any one of claims 1 to 7.

10. A computer storage medium, wherein a data set classification program is stored on the computer storage medium, and the data set classification program, when executed by a processor, implements the steps of the data set classification method according to any one of claims 1 to 7.