CN114219025A

CN114219025A - Asset recovery rate classification method, device, equipment and storage medium

Info

Publication number: CN114219025A
Application number: CN202111531139.5A
Authority: CN
Inventors: 傅莉莉; 朱富荣; 林宜领; 庄佳和; 巫小兰
Original assignee: China Construction Bank Corp
Current assignee: China Construction Bank Corp
Priority date: 2021-12-14
Filing date: 2021-12-14
Publication date: 2022-03-22

Abstract

The embodiments of the invention relate to the technical field of information processing, and in particular to an asset recovery rate classification method, device, equipment and storage medium. The method comprises: acquiring attribute information of a target asset and a pre-trained asset recovery rate classification model; and inputting the attribute information into the asset recovery rate classification model and obtaining a target classification result for the asset recovery rate of the target asset from the output result of the model. The asset recovery rate classification model comprises a classification model built on the Light Gradient Boosting Machine (LightGBM) framework; the target classification result indicates whether or not the asset recovery rate lies within a preset asset recovery rate interval; and the asset recovery rate interval comprises a numerical interval associated with complete recovery and/or zero recovery of the asset. The technical solution of the embodiments addresses the problem of low classification accuracy for asset recovery rates.

Description

Asset recovery rate classification method, device, equipment and storage medium

Technical Field

The embodiment of the invention relates to the technical field of information processing, in particular to an asset recovery rate classification method, device, equipment and storage medium.

Background

Value evaluation is a core step in the disposal of non-performing assets. It is a basic prerequisite for applying the various disposal techniques effectively, and it plays a very important role throughout the disposal process. Notably, effective value evaluation in turn depends on accurately classifying the asset recovery rates of non-performing assets.

However, the existing asset recovery rate classification scheme has the problem of low classification accuracy of the asset recovery rate.

Disclosure of Invention

The embodiment of the invention provides an asset recovery rate classification method, device, equipment and storage medium, and solves the problem of low classification precision of asset recovery rate.

In a first aspect, an embodiment of the present invention provides an asset recovery rate classification method, which may include:

acquiring attribute information of target assets and an asset recovery rate classification model which is trained in advance;

inputting the attribute information into an asset recovery rate classification model, and obtaining a target classification result of the asset recovery rate of the target asset according to an output result of the asset recovery rate classification model;

the asset recovery rate classification model comprises a classification model built on the Light Gradient Boosting Machine (LightGBM) framework, the target classification result indicates whether or not the asset recovery rate lies within a preset asset recovery rate interval, and the asset recovery rate interval comprises a numerical interval associated with complete recovery and/or zero recovery of the asset.

In a second aspect, an embodiment of the present invention further provides an asset recovery rate classification device, which may include:

the asset recovery rate classification model acquisition module is used for acquiring attribute information of target assets and an asset recovery rate classification model which is trained in advance;

the target classification result obtaining module is used for inputting the attribute information into the asset recovery rate classification model and obtaining a target classification result of the asset recovery rate of the target asset according to the output result of the asset recovery rate classification model;

In a third aspect, an embodiment of the present invention further provides asset recovery rate classification equipment, which may include:

one or more processors;

a memory for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the asset recovery rate classification method provided by any of the embodiments of the present invention.

In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the asset recovery rate classification method provided in any embodiment of the present invention.

In a fifth aspect, embodiments of the present invention further provide a computer program product, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the asset recovery rate classification method provided in any of the embodiments of the present invention.

According to the technical solution of the embodiments of the invention, attribute information of a target asset and a pre-trained asset recovery rate classification model are acquired. The asset recovery rate classification model may comprise a classification model built on the Light Gradient Boosting Machine (LightGBM) framework, which is well suited to unevenly distributed data samples such as asset recovery rates. The attribute information is then input into the asset recovery rate classification model, and a target classification result of the asset recovery rate of the target asset at one or several future time points is obtained from the output result of the model. The target classification result may indicate whether or not the asset recovery rate lies within a preset asset recovery rate interval, and the asset recovery rate interval includes a numerical interval associated with complete recovery and/or zero recovery of the asset. By applying a LightGBM-based asset recovery rate classification model to the asset recovery rate classification scenario, this technical solution addresses the problem of low classification accuracy and achieves accurate classification of asset recovery rates.

Drawings

FIG. 1 is a flowchart of an asset recovery rate classification method in a first embodiment of the present invention;

FIG. 2 is a flowchart of an asset recovery rate classification method in a second embodiment of the present invention;

FIG. 3 is a schematic illustration of an alternative example of an asset recovery rate classification method in the second embodiment of the present invention;

FIG. 4 is a flowchart of an asset recovery rate classification method in a third embodiment of the present invention;

FIG. 5 is a block diagram of an asset recovery rate classification device in a fourth embodiment of the present invention;

FIG. 6 is a schematic structural diagram of asset recovery rate classification equipment in a fifth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.

In addition, it should be emphasized that the technical solutions of the present application, such as obtaining, storing, using, and processing of data, all conform to relevant regulations of national laws and regulations.

Before the embodiments of the present invention are described, an application scenario is described by way of example. Big data analysis shows that asset recovery rates do not follow a uniform distribution; instead they are usually concentrated in one or a few intervals and may even follow a bimodal distribution. For example, for non-performing loan assets, about 38% of the data samples had an asset recovery rate greater than 99% (approximately full recovery) and about 10% had an asset recovery rate below 1% (approximately zero recovery); that is, a large share of asset recovery rates sit at the two extremes of zero recovery and full recovery. At present, the asset recovery rate classification problem is mainly solved with logistic regression, but logistic regression is better suited to uniformly distributed data samples and struggles to guarantee classification accuracy for asset recovery rates. To solve this problem, the inventors studied and experimented extensively with various classification algorithms and propose the asset recovery rate classification method set forth in each of the following embodiments.

Example one

Fig. 1 is a flowchart of an asset recovery rate classification method provided in an embodiment of the present invention. The present embodiment is applicable to situations of accurately categorizing asset recovery rates. The method can be executed by the asset recovery rate classification device provided by the embodiment of the invention, the device can be realized by software and/or hardware, and the device can be integrated on asset recovery rate classification equipment which can be various user terminals or servers.

Referring to fig. 1, the method of the embodiment of the present invention specifically includes the following steps:

S110, obtaining attribute information of the target asset and a pre-trained asset recovery rate classification model, wherein the asset recovery rate classification model comprises a classification model built on the Light Gradient Boosting Machine (LightGBM) framework.

The target asset may be an asset whose asset recovery rate at one or several future time points is to be classified, such as a public asset, an individual loan asset, a small and micro quick-loan asset, a verified asset, a non-performing asset, and so on, and is not specifically limited here. The attribute information may be information related to the target asset, such as asset information of the target asset at the receiving time point, basic information and credit information of the owner of the target asset, management information from the management process of the target asset, and the like, and is not specifically limited here. The receiving time point may be the time point at which the target asset is received; for example, the receiving time point of a non-performing asset may be the time point at which a normal asset is reclassified as non-performing. The management information may be information produced during day-to-day management of the target asset, such as the loan balance.

The asset recovery rate classification model may be a machine learning model trained to classify the asset recovery rate of the target asset at one or several future time points according to the attribute information, and may be a classification model built on the Light Gradient Boosting Machine (LightGBM). It should be noted that building the asset recovery rate classification model on LightGBM is the choice the inventors arrived at after many studies and experiments; it is well suited to classifying data samples with many extreme values, such as asset recovery rates. LightGBM is a serial ensemble learning algorithm based on decision trees; it is a framework obtained by deeply optimizing the Gradient Boosting Decision Tree (GBDT). LightGBM suits asset recovery rate classification for the following reasons:

(1) Low model computation cost: the source data tables hold on the order of hundreds of thousands to hundreds of millions of records (for example, transaction journal tables), and the computing power of a real-time computing environment is often weak, so LightGBM is well suited to modeling such data;

(2) Built-in feature importance: LightGBM has feature importance built in, so modelers can compute, inspect and summarize the importance of each dimension (that is, each field) very quickly, which lays the groundwork for discussing field importance in the model demonstration stage;

(3) Standardized interface: LightGBM provides an algorithm interface compatible with scikit-learn, which makes the model very convenient to develop and call. No separate code needs to be written for LightGBM; the same code used for other scikit-learn algorithms (such as random forests) can be reused with only the algorithm name changed. This keeps algorithm development, demonstration and modification efficient, shortens the development cycle, and lets the changes required after each demonstration (such as parameter changes) be completed more efficiently.

S120, inputting the attribute information into the asset recovery rate classification model, and obtaining a target classification result of the asset recovery rate of the target asset according to an output result of the asset recovery rate classification model, wherein the target classification result indicates whether or not the asset recovery rate lies within a preset asset recovery rate interval, and the asset recovery rate interval comprises a numerical interval associated with complete recovery and/or zero recovery of the asset.

The asset recovery rate interval may be a preset numerical interval associated with complete recovery and/or zero recovery of the asset, such as (99%, 100%] or [0, 1%), and is not specifically limited here. The target classification result may be that the asset recovery rate is within, or is not within, the asset recovery rate interval. Once the attribute information is input into the asset recovery rate classification model, a target classification result of the asset recovery rate of the target asset at one or more future time points can be obtained from the output result of the model, and this result indicates whether the asset recovery rate is an extreme value.
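As a minimal sketch of how a recovery rate maps onto these intervals, the following uses the two example intervals (99%, 100%] and [0, 1%). The function names and the "full" / "zero" / "other" labels are hypothetical and chosen only for illustration.

```python
def in_interval(rate, low, high, low_closed, high_closed):
    """Return True if rate lies in the interval with the given open/closed ends."""
    above_low = rate >= low if low_closed else rate > low
    below_high = rate <= high if high_closed else rate < high
    return above_low and below_high


# Example intervals from the text, expressed as fractions.
FULL_RECOVERY = (0.99, 1.00, False, True)   # (99%, 100%]
ZERO_RECOVERY = (0.00, 0.01, True, False)   # [0, 1%)


def classify_recovery_rate(rate):
    """Label a recovery rate as 'full', 'zero', or 'other' (hypothetical labels)."""
    if in_interval(rate, *FULL_RECOVERY):
        return "full"
    if in_interval(rate, *ZERO_RECOVERY):
        return "zero"
    return "other"
```

Labels produced this way could serve as the classification targets for historical assets whose recovery rates are already known.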

In practical application, optionally, when the asset recovery rate is in the asset recovery rate interval, the output result can be the asset recovery rate interval, so that the numerical value interval to which the asset recovery rate belongs can be visually determined according to the output result, and the convenience and the intuitiveness of application are better. Optionally, target classification results of asset recovery rates of various target assets may be obtained through the above steps, and then the obtained target classification results are stored. Therefore, when the target classification result of the asset recovery rate of a certain target asset needs to be called in practical application, the target classification result can be directly searched from the storage result, and the acquisition speed of the target classification result is ensured.

In an optional technical solution, obtaining the pre-trained asset recovery rate classification model may include: obtaining a pre-trained asset recovery rate classification model for each future time point within a future time period. Correspondingly, inputting the attribute information into the asset recovery rate classification model and obtaining a target classification result of the asset recovery rate of the target asset according to an output result of the model may include: for each asset recovery rate classification model, inputting the attribute information into that model to obtain its output result; and obtaining, according to the output result, a target classification result of the asset recovery rate of the target asset at the future time point corresponding to that model. Here, a future time point may be a time point in the future for which the target classification result of the asset recovery rate is to be predicted. Because the characteristics of the asset recovery rate may differ from one future time point to another, a separate asset recovery rate classification model can be obtained for each future time point; the attribute information is then input into each of these models, and the target classification result of the asset recovery rate at the corresponding future time point is obtained from each model's output result, which further ensures the accuracy of the target classification result. In practice, optionally, the future time period may be the next 2 years, and the future time points may be each month within those 2 years.
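The one-model-per-time-point arrangement can be sketched as a mapping from month index to classifier. The stub classifiers below are placeholders standing in for trained models, and the `loan_balance` attribute and thresholds are invented for illustration.

```python
from typing import Callable, Dict, Mapping

# A classifier is modelled as any callable from attribute information to 0/1.
Classifier = Callable[[Mapping[str, float]], int]


def classify_over_horizon(models: Dict[int, Classifier],
                          attributes: Mapping[str, float]) -> Dict[int, int]:
    """Apply each time point's own model to the same attribute information."""
    return {month: model(attributes) for month, model in sorted(models.items())}


def make_stub_model(threshold: float) -> Classifier:
    """Placeholder for a trained model; NOT a real recovery rate classifier."""
    def model(attributes: Mapping[str, float]) -> int:
        return int(attributes["loan_balance"] < threshold)
    return model


# One model per month over a 2-year horizon (24 future time points).
models = {month: make_stub_model(1000.0 * month) for month in range(1, 25)}
results = classify_over_horizon(models, {"loan_balance": 5000.0})
```

In a real pipeline each entry of `models` would be a separately trained classifier, and `results` would hold one target classification result per future month.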

On this basis, optionally, for each future time point, the asset recovery rate classification model corresponding to that future time point is pre-trained through the following steps: acquiring historical information of a historical asset and a historical classification result of the asset recovery rate of the historical asset at a historical time point, wherein the position of the historical time point within the historical time period is the same as the position of the future time point within the future time period; taking the historical information and the historical classification result as a group of historical samples; and training a historical classification model to be trained based on multiple groups of historical samples to obtain the asset recovery rate classification model corresponding to the future time point, wherein the historical classification model comprises a classification model built on the Light Gradient Boosting Machine (LightGBM) framework. Specifically, the historical asset and the target asset are the same in substance and are named differently only to distinguish the model training phase from the model application phase; their substance is not specifically limited. The historical information is similar to the attribute information and is not described in detail here. That the two positions are the same means that, if the future time point is the Nth time point in the future time period, the historical time point is the Nth time point in the historical time period, where N is a positive integer. The asset recovery rate classification model obtained by training is therefore the classification model corresponding to that future time point, which guarantees a one-to-one correspondence in model training.
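A minimal sketch of this position alignment: for each position N, the training set pairs every historical asset's information with the label derived from its recovery rate at the Nth historical time point. The record layout (`info`, `rates`) and the labelling rule are assumptions made for illustration.

```python
def build_samples_by_position(history, n_points, label_fn):
    """Group historical samples by time point position N (1-based).

    history:  list of dicts with 'info' (field values) and 'rates'
              (recovery rate at each historical time point, in order).
    label_fn: maps a recovery rate to a classification label.
    """
    samples = {n: [] for n in range(1, n_points + 1)}
    for record in history:
        for n in range(1, n_points + 1):
            rate = record["rates"][n - 1]
            samples[n].append((record["info"], label_fn(rate)))
    return samples


# Illustrative data: two historical assets observed at three time points.
history = [
    {"info": {"balance": 100.0}, "rates": [0.2, 0.995, 1.0]},
    {"info": {"balance": 50.0}, "rates": [0.0, 0.0, 0.5]},
]


def full_recovery_label(rate):
    """Hypothetical label: 1 if the rate is in (99%, 100%], else 0."""
    return int(0.99 < rate <= 1.0)


samples = build_samples_by_position(history, 3, full_recovery_label)
```

The model for the Nth future time point would then be trained on `samples[N]` only.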

Example two

Fig. 2 is a flowchart of an asset recovery rate classification method provided in the second embodiment of the present invention. The present embodiment is optimized based on the above technical solutions. In this embodiment, optionally, the asset recovery rate classification model is pre-trained through the following steps: acquiring sample information of a sample asset and a sample classification result of the asset recovery rate of the sample asset, and taking the sample information and the sample classification result as a group of training samples; and training a sample classification model to be trained based on multiple groups of training samples to obtain the asset recovery rate classification model, wherein the sample classification model comprises a classification model built on the Light Gradient Boosting Machine (LightGBM) framework. Terms that are the same as or correspond to those in the above embodiments are not explained again here.

Referring to fig. 2, the method of the present embodiment may specifically include the following steps:

S210, obtaining sample information of a sample asset and a sample classification result of the asset recovery rate of the sample asset, and taking the sample information and the sample classification result as a group of training samples, wherein the sample classification result indicates whether or not the asset recovery rate lies within a preset asset recovery rate interval, and the asset recovery rate interval comprises a numerical interval associated with complete recovery and/or zero recovery of the asset.

The sample asset and the target asset have the same substance, and are named differently only for distinguishing the model training phase from the model application phase, and are not specifically limited to the substance thereof. The sample information and the attribute information, and the sample classification result and the target classification result are similar, and are not described herein again.

S220, training a sample classification model to be trained based on multiple groups of training samples to obtain the asset recovery rate classification model, wherein the sample classification model comprises a classification model built on the Light Gradient Boosting Machine (LightGBM) framework.

In practical applications, optionally, LightGBM uses a leaf-wise tree growth strategy, which converges quickly but is prone to overfitting. To address overfitting during model training, adjustments can be made in the following two respects: 1) dimensionality reduction by Principal Component Analysis (PCA): when fields are highly linearly correlated, PCA-based dimensionality reduction effectively removes data redundancy and reduces model overfitting; 2) parameter tuning: the maximum tree depth max_depth is limited so that trees do not grow too deep, the regularization coefficient lambda_l1 is adjusted, the number of samples per leaf node is constrained, and the parameters are tuned by grid search with cross-validation.
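The grid search with cross-validation mentioned above can be sketched with the standard library alone. `max_depth`, `lambda_l1` and `min_data_in_leaf` are LightGBM parameter names; the candidate values, the helper names and the toy scoring function are assumptions for illustration, and the toy function stands in for actually training and validating a LightGBM model on each fold.

```python
from itertools import product
from statistics import mean


def k_fold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, valid
        start += size


def grid_search_cv(param_grid, n_samples, k, fit_and_score):
    """Try every parameter combination; keep the best mean validation score."""
    best_params, best_score = None, float("-inf")
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = mean(fit_and_score(params, train, valid)
                     for train, valid in k_fold_indices(n_samples, k))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Candidate values are illustrative, not taken from the source.
param_grid = {
    "max_depth": [3, 5, 7],
    "lambda_l1": [0.0, 0.1, 1.0],
    "min_data_in_leaf": [20, 50],
}


def toy_fit_and_score(params, train_idx, valid_idx):
    """Placeholder score; a real version would train and validate a model."""
    return (-abs(params["max_depth"] - 5)
            - abs(params["lambda_l1"] - 0.1)
            - abs(params["min_data_in_leaf"] - 50) / 100)


best_params, best_score = grid_search_cv(param_grid, 100, 5, toy_fit_and_score)
```

In practice `fit_and_score` would fit a classifier on the training indices and return a validation metric on the held-out fold.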

S230, acquiring attribute information of the target asset and the asset recovery rate classification model.

S240, inputting the attribute information into the asset recovery rate classification model, and obtaining a target classification result of the asset recovery rate of the target asset according to the output result of the asset recovery rate classification model.

According to the technical solution of this embodiment, the acquired sample information of a sample asset and the sample classification result of its asset recovery rate are taken as a group of training samples, and a sample classification model built on the Light Gradient Boosting Machine (LightGBM) framework is trained on multiple groups of training samples, thereby obtaining an asset recovery rate classification model suited to asset recovery rate classification.

In an optional technical solution, after taking the sample information and the sample classification result as a group of training samples, the asset recovery rate classification method may further include: acquiring multiple groups of training samples and, for each field in the multiple groups of training samples, determining the missing rate of the field according to the sample information under that field; determining a first field from the fields according to the missing rate of each field; and updating the acquired multiple groups of training samples based on the sample information under the first field. The sample information in a group of training samples may consist of the information under each field, and multiple groups of training samples contain the same fields as a single group. For each field, the missing rate is determined by whether the sample information of each group of training samples is missing under that field. Missing sample information cannot contribute positively to model training, so the first field can be chosen according to the missing rates, for example by taking fields with a lower missing rate as the first field. Model training is then performed on the sample information under the first field, so that screening the sample information effectively safeguards both the speed and the accuracy of model training.
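The missing-rate screening can be sketched as follows. The field names, the sample values and the 0.5 threshold are hypothetical; the source does not specify a threshold.

```python
def missing_rates(samples, fields):
    """Return the fraction of samples whose value is missing, per field."""
    rates = {}
    for field in fields:
        missing = sum(1 for sample in samples if sample.get(field) is None)
        rates[field] = missing / len(samples)
    return rates


def select_first_fields(rates, max_missing_rate):
    """Keep fields whose missing rate does not exceed the threshold."""
    return [field for field, rate in rates.items() if rate <= max_missing_rate]


# Illustrative training samples; None marks missing sample information.
samples = [
    {"loan_balance": 1200.0, "overdue_days": 30, "collateral_value": None},
    {"loan_balance": 800.0, "overdue_days": None, "collateral_value": None},
    {"loan_balance": 50.0, "overdue_days": 90, "collateral_value": 400.0},
    {"loan_balance": 310.0, "overdue_days": 10, "collateral_value": None},
]
fields = ["loan_balance", "overdue_days", "collateral_value"]

rates = missing_rates(samples, fields)
first_fields = select_first_fields(rates, max_missing_rate=0.5)

# Update the training samples so that only the first fields remain.
screened = [{f: s[f] for f in first_fields} for s in samples]
```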

In another optional technical solution, after obtaining the asset recovery rate classification model, the asset recovery rate classification method may further include: obtaining the importance of each field according to the output result of the asset recovery rate classification model; determining a second field from the fields according to the importance, and updating the acquired sample information based on the sample information under the second field; and performing again the step of taking the sample information and the sample classification result as a group of training samples. Because LightGBM has feature importance built in, the importance of each field can be obtained from the model training result. The importance indicates whether the sample information under the corresponding field helps improve the accuracy of model classification, so the second field can be chosen according to the importance of each field, for example by taking fields with higher importance as the second field. Model training is then performed on the sample information under the second field, so that screening the sample information effectively safeguards both the speed and the accuracy of model training.
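A minimal sketch of importance-based screening, assuming the per-field importance values have already been read from a trained model. The numbers, field names and the top-k rule are invented for illustration; the source only says that fields with higher importance may be chosen.

```python
def select_second_fields(importance, top_k):
    """Return the top_k fields ranked by descending importance."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [field for field, _ in ranked[:top_k]]


def project(samples, fields):
    """Keep only the selected fields in every sample."""
    return [{field: sample[field] for field in fields} for sample in samples]


# Hypothetical importance values, e.g. split counts reported by a trained model.
importance = {"loan_balance": 120, "overdue_days": 95,
              "branch_code": 3, "owner_age": 40}
second_fields = select_second_fields(importance, top_k=2)

samples = [{"loan_balance": 1200.0, "overdue_days": 30,
            "branch_code": 7, "owner_age": 41}]
updated = project(samples, second_fields)
```

The updated samples would then be paired with their classification results again and used to retrain the model.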

In another optional technical solution, after obtaining the asset recovery rate classification model, the asset recovery rate classification method may further include: acquiring evaluation information of an evaluation asset and an evaluation classification result of the asset recovery rate of the evaluation asset, and taking the evaluation information and the evaluation classification result as a group of evaluation samples; and evaluating the obtained asset recovery rate classification model based on multiple groups of evaluation samples and a preset evaluation index to obtain an evaluation result of the asset recovery rate classification model. In this way, the classification accuracy of the asset recovery rate classification model obtained after training can be evaluated. Specifically, the sample asset and the evaluation asset are the same in substance and are named differently only to distinguish the model training stage from the model evaluation stage; their substance is not specifically limited. The evaluation information is similar to the sample information, and the evaluation classification result is similar to the sample classification result, so they are not described again here. The evaluation index may be a preset index for evaluating the classification accuracy of the trained asset recovery rate classification model, such as Area Under the Curve (AUC), accuracy, recall, a confusion matrix, and the like, and is not specifically limited here. Further, whether the model needs further training may be determined from the evaluation result. In practice, optionally, a good classifier would ideally have accuracy, precision and recall all equal to 1, but this is almost impossible in real applications, and precision and recall tend to trade off against each other: as one rises, the other falls. To this end, the embodiment of the present invention proposes to balance precision and recall appropriately by means of the evaluation index set as follows:

where F₁ is the evaluation index, P is the precision, R is the recall, TP is the number of positive samples predicted as positive, TN is the number of negative samples predicted as negative, and FP is the number of negative samples predicted as positive.
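The formula itself is not reproduced in this text. Given that F₁ is defined in terms of the precision P and the recall R, the standard F1 score is the most likely intended index; the following is a reconstruction under that assumption. FN (the number of positive samples predicted as negative) is not defined in the surrounding text and is introduced here only because the standard definition of recall requires it.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 P R}{P + R}
```

Under this definition F₁ is the harmonic mean of P and R, so it is high only when both precision and recall are high, which matches the stated aim of balancing the two.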

The model training process may be regarded as part of a model development and demonstration process, which may include the following six parts: data import, data preprocessing, feature engineering, model algorithm (that is, model training), result evaluation, and business demonstration. As an illustration, see FIG. 3. The dashed nodes are model training scripts (for example, data import scripts, preprocessing scripts, feature engineering scripts, and model algorithm scripts), which provide the core executable code for building the model. The short line-point nodes are interactive data files (such as raw data files, processed data files, model algorithm storage files, and prediction result data files), through which different code scripts exchange data. The half-point nodes are model demonstration scripts, which demonstrate the data and the model at every stage comprehensively by running code and outputting images and tables.

Example three

Fig. 4 is a flowchart of an asset recovery rate classification method provided in the third embodiment of the present invention. The present embodiment is optimized based on the technical solutions in the second embodiment. In this embodiment, optionally, obtaining the sample information of the sample asset may include: obtaining sample information of the sample asset under each field. The asset recovery rate classification method may further include: for each field, classifying the field according to the sample information under the field to obtain a category classification result of the field, wherein the category classification result is a numerical category, a date category, a type category or a description category; and, if the category classification result shows that the field does not belong to the numerical category, converting the sample information under the field into information of the numerical category according to the category classification result, and updating the sample information under the field based on the conversion result. Terms that are the same as or correspond to those in the above embodiments are not explained again here.

Referring to fig. 4, the method of this embodiment may specifically include the following steps:

S310, obtaining sample information of the sample asset under each field and a sample classification result of the asset recovery rate of the sample asset, and taking the sample information and the sample classification result as a group of training samples, wherein the sample classification result indicates whether or not the asset recovery rate lies within a preset asset recovery rate interval, and the asset recovery rate interval comprises a numerical interval associated with complete recovery and/or zero recovery of the asset.

Wherein the sample information may be information of the sample asset under a certain field.

S320, classifying the fields according to the sample information under the fields to obtain the classification results of the fields, wherein the classification results comprise numerical value classes, date classes, type classes or description classes.

The numerical value category may be a category in which the sample information consists entirely of numerical values; the date category may be a category in which the sample information consists entirely of dates; the type category may be a category in which the sample information contains a plurality of information types; and the description category may be a category in which the sample information contains a plurality of information characters. Only sample information under the numerical value category can participate in the model training process; sample information under the other categories cannot. Therefore, for each field, the field can be classified according to the sample information under the field to obtain the category classification result of the field, and the sample information under the field is then information under that category.

In practical applications, optionally, after the category classification result of the field is obtained, the asset recovery rate classification method may further include: for the fields belonging to the date category, determining a minuend field and a subtrahend field from among those fields; subtracting the sample information under the subtrahend field from the sample information under the minuend field, and taking the subtraction result as the sample information under a newly added field; and classifying the newly added field according to the sample information under the newly added field to obtain the category classification result of the newly added field. In practical applications, since many fields belong to the date category, feature derivation can be performed on the fields under the date category to obtain newly added fields capable of improving the classification accuracy of the model. From a practical perspective, which fields to subtract can be determined according to whether the newly added field obtained from the subtraction has business significance, and the following two cases can be distinguished:

(1) Date difference with business significance: some fields under the date category are differenced to obtain a derived variable (i.e., a newly added field); the difference has business significance and can be interpreted in business terms. For example, the age of a client when the client's loan asset was transferred to bad assets can be determined from the difference between the client's birth date and the receiving time point (year, month, and day) of the bad asset;

(2) Full-permutation date difference: when the number of fields under the date category is limited, all such fields can be differenced pairwise to obtain derived variables. For example, the derived variables thus obtained may include the number of working days after the asset entered bad assets, the number of months from the customer's first payment date to the entry into bad assets, and the like.
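
The two cases above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library; the field names (`birth_date`, `first_payment_date`, `received_date`), the sample values, and the helper functions are hypothetical and are not taken from the embodiment.

```python
from datetime import date
from itertools import combinations

# Hypothetical sample records; field names and values are illustrative only.
samples = [
    {"birth_date": date(1980, 1, 1),
     "first_payment_date": date(2019, 1, 1),
     "received_date": date(2020, 1, 1)},
    {"birth_date": date(1975, 6, 30),
     "first_payment_date": date(2018, 6, 30),
     "received_date": date(2021, 6, 30)},
]


def date_difference(records, minuend_field, subtrahend_field):
    """Return (minuend - subtrahend) in days for every record."""
    return [(r[minuend_field] - r[subtrahend_field]).days for r in records]


def all_pairs_differences(records, date_fields):
    """Case (2): difference every unordered pair of date fields once."""
    return {
        f"{a}_minus_{b}": date_difference(records, a, b)
        for a, b in combinations(date_fields, 2)
    }


# Case (1): a difference with business meaning
# (age in days at the receiving time point).
age_in_days = date_difference(samples, "received_date", "birth_date")

# Case (2): all pairwise differences when the date fields are few.
derived = all_pairs_differences(
    samples, ["received_date", "first_payment_date", "birth_date"]
)
```

Each derived list already holds plain integers, so under the scheme described here a newly added field of this kind would be expected to fall under the numerical value category when it is classified.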

S330, if the field is determined not to belong to the numerical value category according to the category classification result, converting the sample information under the field into the information under the numerical value category according to the category classification result, and updating the sample information under the field based on the conversion result.

Sample information under the non-numerical categories is converted into information under the numerical value category, so that all of the obtained sample information can participate in the model training process.

S340, training a sample classification model to be trained based on a plurality of groups of training samples to obtain an asset recovery rate classification model, wherein the sample classification model comprises a classification model constructed based on a lightweight gradient boosting framework (LightGBM).

S350, acquiring attribute information of the target asset and the asset recovery rate classification model.

S360, inputting the attribute information into the asset recovery rate classification model, and obtaining a target classification result of the asset recovery rate of the target asset according to the output result of the asset recovery rate classification model.

According to the technical scheme of the embodiment of the invention, the sample information of the sample assets under each field is respectively obtained, and for each field, the field is classified according to the sample information under the field to obtain the category classification result of the field; furthermore, the sample information under fields that do not belong to the numerical value category can be converted into information under the numerical value category, so that each piece of acquired sample information can participate in the model training process.

In an optional technical solution, classifying the field according to the sample information under the field may include: acquiring the storage format of the sample information under the field, and determining whether the field belongs to the numerical value category or the date category according to the storage format. Since most sample information under the numerical value category is stored in the numpy.float64 or numpy.int64 storage format, whether a field belongs to the numerical value category can be determined from the storage format. In practical applications, optionally, since fields without numerical meaning, such as serial numbers, categories and codes, may also be stored in these two storage formats, after a field is determined to belong to the numerical value category, the field may be checked again based on a preset rule to determine whether it holds values with real numerical meaning or numeric codes without numerical meaning, so as to avoid generating noisy data. The category classification process for the date category is similar, and the sample information under the date category is stored based on the storage format of pandas. In practical applications, optionally, since some sample information under the date category is not stored in these storage formats, for example strings in the "xxxx-xx-xx" format or mixed date-and-time formats, such fields may be identified and extracted with a regular expression. In this technical solution, the storage format effectively ensures accurate division of the numerical value category and/or the date category.
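
A storage-format check of this kind might look like the following sketch, which assumes the sample information for one field is held in a pandas `Series`. The function name, the returned labels, and the regular expression are hypothetical; the use of pandas' datetime dtype test for the date category is likewise an assumption, since the text does not name the exact date storage format.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical pattern: "xxxx-xx-xx", optionally followed by a time part.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}([ T]\d{2}:\d{2}(:\d{2})?)?$")


def classify_by_storage_format(series: pd.Series) -> str:
    """Return 'numerical', 'date', or 'other' for one field."""
    # Numerical category: stored as numpy.float64 or numpy.int64.
    if series.dtype == np.float64 or series.dtype == np.int64:
        return "numerical"
    # Date category (assumption): a pandas datetime dtype.
    if pd.api.types.is_datetime64_any_dtype(series):
        return "date"
    # Fallback: date-like strings recognised by a regular expression.
    values = series.dropna().astype(str)
    if len(values) > 0 and values.map(lambda v: bool(DATE_PATTERN.match(v))).all():
        return "date"
    return "other"
```

A field returned as "numerical" would still need the rule-based recheck described above, because serial numbers and codes can share the same storage format.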

In an optional technical solution, the type category may be a category in which the sample information contains a plurality of information types. For example, if, among a plurality of groups of training samples, the sample information under a certain field is the planting industry in some samples, the financial industry in others, the IT industry in others, the education industry in still others, and so on, the field belongs to the type category. Therefore, classifying the field according to the sample information under the field may include: acquiring the type number of the information types contained in all sample information under the field, and determining whether the field belongs to the type category according to the type number; if the type number exceeds a preset type number threshold, the field is assigned to the type category, which effectively ensures accurate division of the type category.

In an optional technical solution, since the description category may be a category in which the sample information contains a plurality of information characters, classifying the field according to the sample information under the field may include: respectively acquiring the number of information characters contained in each piece of sample information under the field, and determining whether the field belongs to the description category according to the number of characters; if the number of characters exceeds a preset character number threshold, the field is assigned to the description category, which effectively ensures accurate division of the description category.
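
The two threshold rules above (type number and character number) can be sketched as follows. The function names and threshold values are hypothetical, and taking the maximum character count over the samples is an assumption, since the text does not say how the per-sample counts are combined.

```python
def count_information_types(values):
    """Number of distinct information types under one field (None is ignored)."""
    return len({v for v in values if v is not None})


def is_type_category(values, type_number_threshold):
    """Type category: the number of distinct types exceeds the preset threshold."""
    return count_information_types(values) > type_number_threshold


def is_description_category(values, char_number_threshold):
    """Description category: the longest entry exceeds the preset threshold.

    Using the maximum over all samples is an assumption of this sketch.
    """
    counts = [len(str(v)) for v in values if v is not None]
    return bool(counts) and max(counts) > char_number_threshold


# Illustrative data; the thresholds 3 and 20 are made up for the example.
industries = ["planting", "finance", "IT", "education", "finance"]
remarks = ["short", "a much longer free-text remark"]
industry_is_type = is_type_category(industries, 3)
remark_is_description = is_description_category(remarks, 20)
```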

In another optional technical solution, converting the sample information under the field into information under the numerical value category according to the category classification result may include: discretizing the sample information under a field belonging to the date category in units of year, month, day and timestamp to obtain information under the numerical value category, whereby four fields under the numerical value category can be derived from one field under the date category.
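
As a rough sketch of this discretization, one date value can be split into four numeric values. The function name and dictionary keys are hypothetical, and interpreting naive datetimes as UTC when computing the timestamp is an assumption of the example.

```python
from datetime import datetime, timezone


def discretize_date(value: datetime) -> dict:
    """Split one date value into four numeric fields."""
    return {
        "year": value.year,
        "month": value.month,
        "day": value.day,
        # Assumption: naive datetimes are interpreted as UTC.
        "timestamp": value.replace(tzinfo=timezone.utc).timestamp(),
    }


fields = discretize_date(datetime(1970, 1, 2))
```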

In another optional technical solution, converting the sample information under the field into information under the numerical value category according to the category classification result may include: separately encoding all information types contained in all sample information under a field belonging to the type category, and converting the sample information under that field into information under the numerical value category according to the encoding result. For example, the four information types in the above example (the planting, financial, IT and education industries) may each be encoded with a unique numerical value, thereby completing an effective conversion from the type category to the numerical value category.
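
One simple way to give each information type a unique numerical value is an integer code assigned in first-seen order, sketched below. The helper names are hypothetical, and the text does not prescribe this particular encoding.

```python
def fit_type_encoding(values):
    """Assign each distinct information type a unique integer, in first-seen order."""
    codes = {}
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
    return codes


def encode(values, codes):
    """Replace every information type by its integer code."""
    return [codes[v] for v in values]


industries = ["planting", "finance", "IT", "education", "finance"]
codes = fit_type_encoding(industries)
encoded = encode(industries, codes)
```

The same `codes` mapping would have to be reused when attribute information of a target asset is encoded later, so that a given type always maps to the same value.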

In another optional technical solution, converting the sample information under the field into information under the numerical value category according to the category classification result may include: mapping the sample information under a field belonging to the description category to numerical values, such as the string length or the occurrence frequency of keywords, thereby obtaining information under the numerical value category. In practical applications, optionally, if the classification accuracy of a model trained without the sample information under the description category already meets the requirement, that sample information can simply be removed, which preserves the classification accuracy of the model while also ensuring its classification speed.
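
The mapping to string length and keyword frequency can be illustrated as follows. The function name, the feature names, and the keyword list are hypothetical.

```python
def describe_to_numbers(text, keywords):
    """Map one free-text description to numeric features."""
    features = {"length": len(text)}
    for keyword in keywords:
        # Count non-overlapping occurrences of each keyword.
        features[f"count_{keyword}"] = text.count(keyword)
    return features


features = describe_to_numbers(
    "overdue loan, overdue twice", ["overdue", "mortgage"]
)
```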

On the basis of any of the above technical solutions, optionally, the fields may also be verified to avoid generating noisy data and to find potential abnormal data. Illustratively, the verification may check: whether all fields have been processed, so that no storage types other than numpy.float64 and numpy.int64 remain; whether null values exist; whether serial numbers are mixed into the sample information under the numerical value category, for example an identification number recognized as a numerical value; and whether the number of information types is excessive. For example, if identity identifiers are treated as information types, the identifiers of different users all differ, which results in an excessive number of information types; in that case, the leading digits of the identity identifier can be extracted, and the birth area reflected by those digits can be used as the information type. Other checks are possible and are not specifically limited herein.
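
Two of these checks, the null-value check and the reduction of identity identifiers to their leading digits, can be sketched as below. The helper names, the made-up identifiers, and the prefix length of 6 are assumptions for illustration; the text does not state how many leading digits are extracted.

```python
def has_null(values):
    """True if any entry under the field is missing (None or empty string)."""
    return any(v is None or v == "" for v in values)


def reduce_identifiers(identifiers, prefix_length):
    """Keep only the leading characters of each identifier as the information type."""
    return [str(i)[:prefix_length] for i in identifiers]


# Made-up identifiers; every user has a different one.
identifiers = ["110101-0001", "110101-0002", "310104-0003"]
# Assumption: the first 6 characters reflect the birth area.
areas = reduce_identifiers(identifiers, 6)
types_before = len(set(identifiers))
types_after = len(set(areas))
```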

Example four

Fig. 5 is a block diagram of an asset recovery rate classifying device according to a fourth embodiment of the present invention, which is configured to execute the asset recovery rate classifying method according to any of the embodiments. The device and the asset recovery rate classification method of each embodiment belong to the same inventive concept, and details which are not described in detail in the embodiment of the asset recovery rate classification device can refer to the embodiment of the asset recovery rate classification method. Referring to fig. 5, the apparatus may specifically include: an asset recovery classification model acquisition module 410 and a target classification result acquisition module 420.

The asset recovery rate classification model obtaining module 410 is configured to obtain attribute information of a target asset and a pre-trained asset recovery rate classification model;

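a target classification result obtaining module 420, configured to input the attribute information into the asset recovery rate classification model, and obtain a target classification result of the asset recovery rate of the target asset according to an output result of the asset recovery rate classification model; wherein the asset recovery rate classification model comprises a classification model constructed based on a lightweight gradient boosting framework, the target classification result comprises that the asset recovery rate is or is not in a preset asset recovery rate interval, and the asset recovery rate interval comprises a numerical value interval related to complete asset recovery and/or zero asset recovery.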

Optionally, the asset recovery rate classification model is obtained by pre-training through the following modules:

the training sample obtaining module is used for obtaining sample information of the sample assets and sample classification results of asset recovery rates of the sample assets, and taking the sample information and the sample classification results as a group of training samples;

the asset recovery rate classification model obtaining module is used for training a sample classification model to be trained based on a plurality of groups of training samples to obtain an asset recovery rate classification model, wherein the sample classification model comprises a classification model constructed based on a lightweight gradient boosting framework.

On this basis, optionally, the training sample obtaining module may include:

the sample information acquisition unit is used for respectively acquiring sample information of the sample assets under each field;

the above asset recovery rate classification device may further include:

the category classification result obtaining module is used for performing, for each field, category classification on the field according to the sample information under the field to obtain the category classification result of the field, wherein the category classification result comprises a numerical value category, a date category, a type category or a description category;

and the first updating module of the sample information is used for converting the sample information under the field into the information under the numerical value category according to the category classification result and updating the sample information under the field based on the conversion result if the field is determined not to belong to the numerical value category according to the category classification result.

On this basis, optionally, the category classification result obtaining module may include:

the date category determining unit is used for acquiring the storage format of the sample information under the field and determining whether the field belongs to the numerical value category or the date category according to the storage format; and/or,

the type category determining unit is used for acquiring the type number of information types contained in all sample information under the field and determining whether the field belongs to the type category according to the type number; and/or,

the description category determining unit is used for respectively acquiring the number of information characters contained in each piece of sample information under the field and determining whether the field belongs to the description category according to the number of characters.

Still optionally, the first update module of sample information may include:

the date category conversion unit is used for discretizing the sample information under the field belonging to the date category in units of year, month, day and timestamp to obtain information under the numerical value category; and/or,

the type category conversion unit is used for respectively encoding all information types contained in all sample information under fields belonging to the type category and converting the sample information under the fields belonging to the type category into information under the numerical value category according to the encoding result; and/or,

the description category conversion unit is used for mapping the sample information under the field belonging to the description category into a numerical value to obtain the information under the numerical value category.

Still optionally, the asset recovery rate classifying device may further include:

the field determination module is used for determining, after the category classification result of the fields is obtained, a minuend field and a subtrahend field from among the fields belonging to the date category;

a sample information adding module for subtracting the sample information under the subtrahend field from the sample information under the minuend field, and taking the subtraction result as the sample information under the newly added field;

and the field type classification module is used for classifying the newly added field according to the sample information under the newly added field to obtain the type classification result of the newly added field.

Optionally, the asset recovery rate classifying device may further include:

the missing rate determining module is used for obtaining a plurality of groups of training samples after the sample information and the sample classification result are taken as a group of training samples, and determining, for each field in the plurality of groups of training samples, the missing rate of the field according to the sample information under the field;

the first field determining module is used for determining a first field from each field according to the missing rate of each field;

the training sample updating module is used for updating the obtained multiple groups of training samples based on the sample information in the first field;

and/or,

the above asset recovery rate classification device may further include:

the importance obtaining module is used for obtaining the importance of each field according to the output result of the asset recovery rate classification model after the asset recovery rate classification model is obtained;

the second updating module of the sample information is used for determining a second field from all the fields according to the importance and updating the acquired sample information based on the sample information in the second field;

and the training sample obtaining module is used for carrying out the step of taking the sample information and the sample classification result as a group of training samples again.

Optionally, the asset recovery rate classifying device may further include:

the evaluation sample obtaining module is used for obtaining evaluation information of the evaluation assets and evaluation classification results of the asset recovery rates of the evaluation assets after the asset recovery rate classification models are obtained, and taking the evaluation information and the evaluation classification results as a group of evaluation samples;

and the evaluation result obtaining module is used for evaluating the obtained asset recovery rate classification model based on the multiple groups of evaluation samples and preset evaluation indexes to obtain an evaluation result of the asset recovery rate classification model.

On this basis, optionally, the evaluation index is expressed by the following formula:

$$F_1 = \frac{2PR}{P + R}$$

wherein F₁ is the evaluation index, P is the precision, and R is the recall.
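
Assuming the evaluation index is the standard F₁ score, i.e. the harmonic mean of precision P and recall R, F₁ = 2PR / (P + R), it can be computed from evaluation classification results as in the sketch below. The function name and the convention that label 1 means "the asset recovery rate is in the preset interval" are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# One true positive, one false positive, one false negative: P = R = 0.5.
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```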

Optionally, the asset recovery rate classification model obtaining module may include:

the asset recovery rate classification model acquisition unit is used for acquiring an asset recovery rate classification model which is trained in advance and corresponds to each future time point in a future time period;

the target classification result obtaining module may include:

the output result obtaining unit is used for inputting, for each asset recovery rate classification model, the attribute information into that asset recovery rate classification model to obtain the output result of that model;

and the target classification result obtaining unit is used for obtaining a target classification result of the asset recovery rate of the target asset at a future time point corresponding to the asset recovery rate classification model according to the output result.

On the basis, optionally, the asset recovery rate classification model corresponding to the future time point is obtained by pre-training through the following modules:

the historical classification result acquisition module is used for acquiring historical information of the historical assets and historical classification results of asset recovery rates of the historical assets at historical time points, wherein the positions of the historical time points in the historical time periods are the same as the positions of the future time points in the future time periods;

a historical sample obtaining module for taking the historical information and the historical classification result as a group of historical samples;

and the second asset recovery rate classification model obtaining module is used for training a historical classification model to be trained based on a plurality of groups of historical samples to obtain an asset recovery rate classification model corresponding to a future time point, wherein the historical classification model comprises a classification model constructed based on a lightweight gradient boosting framework.

Optionally, the attribute information includes at least one of: asset information of the target asset at the reception point, basic information and credit investigation information of an owner of the target asset, and management information in a management process of the target asset.

According to the asset recovery rate classification device provided by the fourth embodiment of the invention, the asset recovery rate classification model obtaining module obtains the attribute information of the target asset and the pre-trained asset recovery rate classification model; the asset recovery rate classification model may comprise a classification model constructed based on a lightweight gradient boosting framework, which is well suited to processing data samples in which the asset recovery rate is unevenly distributed. Further, the target classification result obtaining module inputs the attribute information into the asset recovery rate classification model and, according to the output result of the model, obtains a target classification result of the asset recovery rate of the target asset at one or several future time points, wherein the target classification result may include that the asset recovery rate is or is not in a preset asset recovery rate interval, and the asset recovery rate interval includes a numerical value interval related to complete asset recovery and/or zero asset recovery. By applying an asset recovery rate classification model constructed based on the lightweight gradient boosting framework to the scenario of asset recovery rate classification, the device solves the problem of low classification precision of the asset recovery rate and achieves the effect of accurately classifying the asset recovery rate.

The asset recovery rate classification device provided by the embodiment of the invention can execute the asset recovery rate classification method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

It should be noted that, in the embodiment of the asset recovery rate classification device, the included units and modules are only divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.

Example five

Fig. 6 is a schematic structural diagram of an asset recovery rate classifying device according to a fifth embodiment of the present invention, and as shown in fig. 6, the device includes a memory 510, a processor 520, an input device 530, and an output device 540. The number of processors 520 in the device may be one or more, and one processor 520 is taken as an example in fig. 6; the memory 510, processor 520, input device 530, and output device 540 in the apparatus may be connected by a bus or other means, such as by bus 550 in fig. 6.

The memory 510 may be used as a computer-readable storage medium for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the asset recovery classification method in embodiments of the present invention (e.g., the asset recovery classification model acquisition module 410 and the target classification result acquisition module 420 in the asset recovery classification device). The processor 520 implements the asset recovery classification method described above by executing software programs, instructions, and modules stored in the memory 510 to perform various functional applications of the device and data processing.

The memory 510 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like. Further, the memory 510 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 510 may further include memory located remotely from processor 520, which may be connected to devices through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 530 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the device. The output device 540 may include a display device such as a display screen.

Example six

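A sixth embodiment of the present invention provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are operable to perform an asset recovery rate classification method, the method comprising:

acquiring attribute information of a target asset and a pre-trained asset recovery rate classification model;

inputting the attribute information into the asset recovery rate classification model, and obtaining a target classification result of the asset recovery rate of the target asset according to an output result of the asset recovery rate classification model;

wherein the asset recovery rate classification model comprises a classification model constructed based on a lightweight gradient boosting framework, the target classification result comprises that the asset recovery rate is or is not in a preset asset recovery rate interval, and the asset recovery rate interval comprises a numerical value interval related to complete asset recovery and/or zero asset recovery.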

Of course, the embodiments of the present invention provide a storage medium containing computer-executable instructions, which are not limited to the operations of the method described above, but can also perform related operations in the asset recovery rate classification method provided by any embodiments of the present invention.

From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. With this understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

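1. An asset recovery rate classification method, comprising:

acquiring attribute information of a target asset and a pre-trained asset recovery rate classification model;

inputting the attribute information into the asset recovery rate classification model, and obtaining a target classification result of the asset recovery rate of the target asset according to an output result of the asset recovery rate classification model;

wherein the asset recovery rate classification model comprises a classification model constructed based on a lightweight gradient boosting framework, the target classification result comprises that the asset recovery rate is or is not in a preset asset recovery rate interval, and the asset recovery rate interval comprises a numerical value interval related to complete asset recovery and/or zero asset recovery.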

2. The method of claim 1, wherein the asset recovery classification model is pre-trained by:

acquiring sample information of sample assets and sample classification results of asset recovery rates of the sample assets, and taking the sample information and the sample classification results as a group of training samples;

and training a sample classification model to be trained based on a plurality of groups of training samples to obtain the asset recovery rate classification model, wherein the sample classification model comprises a classification model constructed based on the lightweight gradient boosting framework.

3. The method of claim 2, wherein obtaining sample information for a sample asset comprises: respectively obtaining sample information of sample assets under each field;

the method further comprises the following steps:

for each field, carrying out category classification on the field according to the sample information under the field to obtain a category classification result of the field, wherein the category classification result comprises a numerical value category, a date category, a type category or a description category;

and if the field is determined not to belong to the numerical value category according to the category classification result, converting the sample information under the field into the information under the numerical value category according to the category classification result, and updating the sample information under the field based on the conversion result.

4. The method of claim 3, wherein the categorizing the field according to the sample information under the field comprises:

acquiring a storage format of the sample information under the field, and determining whether the field belongs to the numerical value category or the date category according to the storage format; and/or,

acquiring the type number of information types contained in all the sample information under the field, and determining whether the field belongs to the type category according to the type number; and/or,

respectively acquiring the number of information characters contained in each piece of sample information under the field, and determining whether the field belongs to the description category according to the number of characters.

5. The method according to claim 3, wherein the converting the sample information under the field into the information under the numerical category according to the category classification result comprises:

discretizing the sample information under the field belonging to the date category in units of year, month, day and timestamp to obtain information under the numerical value category; and/or,

respectively encoding all information types contained in all the sample information under the field belonging to the type category, and converting the sample information under the field belonging to the type category into information under the numerical value category according to an encoding result; and/or,

mapping the sample information under the field belonging to the description category into a numerical value to obtain the information under the numerical value category.
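The three conversions of claim 5 might look as follows in Python. This is a sketch under stated assumptions: the helper names are hypothetical, dates are treated as UTC, the type encoding is a plain integer label encoding, and the character count stands in for the description-to-number mapping, which the claim leaves unspecified.

```python
from datetime import datetime, timezone


def discretize_date(d):
    """Split a naive datetime into year, month, day and a Unix timestamp (UTC assumed)."""
    ts = d.replace(tzinfo=timezone.utc).timestamp()
    return {"year": d.year, "month": d.month, "day": d.day, "timestamp": ts}


def encode_types(values):
    """Assign an integer code to each distinct information type (label encoding)."""
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes


def map_description(text):
    """Map free text to a number; the character count is only a stand-in mapping."""
    return len(text)
```

After these conversions every field holds numerical values, which is the form a gradient boosting classifier expects as input.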

6. The method of claim 3, further comprising, after obtaining the category classification result of the field:

determining a minuend field and a subtrahend field from the fields belonging to the date category;

subtracting the sample information under the subtrahend field from the sample information under the minuend field, and taking the subtraction result as the sample information under a newly added field;

and according to the sample information under the newly added field, carrying out class classification on the newly added field to obtain a class classification result of the newly added field.
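A minimal sketch of the date-difference feature in claim 6, assuming each sample is a Python dict and the difference is expressed in days. The function name and the field names used in the usage note are hypothetical.

```python
from datetime import date


def add_date_difference(rows, minuend, subtrahend, new_field):
    """Set row[new_field] = row[minuend] - row[subtrahend], measured in days.

    Rows with a missing date on either side receive None for the new field.
    """
    for row in rows:
        a, b = row.get(minuend), row.get(subtrahend)
        row[new_field] = (a - b).days if a is not None and b is not None else None
    return rows
```

Calling `add_date_difference(rows, "due_date", "issue_date", "days_to_due")` adds an integer field, which would then be classified like any other field and would normally fall into the numerical value category.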

7. The method of claim 2, further comprising, after using the sample information and the sample classification result as a set of training samples:

acquiring a plurality of groups of training samples, and determining the missing rate of each field in the plurality of groups of training samples according to the sample information under the field;

determining a first field from each field according to the missing rate of each field;

updating the acquired multiple groups of training samples based on the sample information in the first field;

and/or,

after obtaining the asset recovery rate classification model, further comprising:

obtaining the importance of each field according to the output result of the asset recovery rate classification model;

determining a second field from each field according to the importance, and updating the acquired sample information based on the sample information in the second field;

and then executing the step of using the sample information and the sample classification result as a group of training samples.
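The missing-rate screening in claim 7 can be illustrated as below. The claim does not say how the first field is chosen from the missing rates, so this sketch assumes that fields whose missing rate does not exceed a hypothetical threshold `max_missing` are kept; the function names are also assumptions.

```python
def missing_rates(rows, fields):
    """Fraction of samples in which each field has no value (None)."""
    n = len(rows)
    if n == 0:
        return {f: 0.0 for f in fields}
    return {f: sum(1 for r in rows if r.get(f) is None) / n for f in fields}


def keep_fields(rows, fields, max_missing=0.5):
    """Keep only the fields whose missing rate is at most max_missing (assumed rule)."""
    rates = missing_rates(rows, fields)
    kept = [f for f in fields if rates[f] <= max_missing]
    return [{f: r.get(f) for f in kept} for r in rows], kept
```

The importance-based selection of the second field could follow the same pattern, ranking fields by the importance scores reported by the trained model instead of by missing rate.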

8. The method of claim 2, further comprising, after obtaining the asset recovery rate classification model:

acquiring evaluation information of evaluation assets and evaluation classification results of the asset recovery rates of the evaluation assets, and taking the evaluation information and the evaluation classification results as a group of evaluation samples;

and evaluating the obtained asset recovery rate classification model based on a plurality of groups of evaluation samples and preset evaluation indexes to obtain an evaluation result of the asset recovery rate classification model.

9. The method according to claim 8, wherein the evaluation index is represented by the following equation:

$$F_1 = \frac{2 \cdot P \cdot R}{P + R}$$

wherein F₁ is the evaluation index, P is the precision rate, and R is the recall rate.
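As a worked illustration, and assuming the evaluation index is the standard F1 score (the harmonic mean of the precision rate P and the recall rate R), it can be computed from true and predicted labels as follows. The function name and the choice of `1` as the positive label are assumptions of this sketch.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 = 2 * P * R / (P + R) for a binary classification result."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With one true positive, one false positive and one false negative, P and R are both 0.5, so the score is 0.5.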

10. The method of claim 1, wherein said obtaining a pre-trained asset recovery rate classification model comprises: acquiring a pre-trained asset recovery rate classification model corresponding to each future time point in a future time period;

the step of inputting the attribute information into the asset recovery rate classification model, and obtaining a target classification result of the asset recovery rate of the target asset according to an output result of the asset recovery rate classification model includes:

inputting the attribute information into the asset recovery rate classification model aiming at each asset recovery rate classification model to obtain an output result of the asset recovery rate classification model;

and according to the output result, obtaining a target classification result of the asset recovery rate of the target asset at the future time point corresponding to the asset recovery rate classification model.
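The per-time-point inference of claim 10 can be sketched as a loop over a mapping from future time points to models. Everything named here is hypothetical: each model is represented by a plain callable returning a probability, and `threshold` is an assumed cut-off for deciding whether the recovery rate lies in the preset interval. In practice the callables would wrap classifiers built on the lightweight gradient boosting framework.

```python
def classify_over_horizon(attributes, models, threshold=0.5):
    """Apply one model per future time point to the same attribute information.

    models maps a time point label to a callable that returns the probability
    that the asset recovery rate lies in the preset interval at that time point.
    """
    results = {}
    for time_point, model in models.items():
        probability = model(attributes)
        results[time_point] = probability >= threshold
    return results


# Stub models standing in for trained classifiers (illustrative only).
stub_models = {
    "month_6": lambda attrs: 0.2,
    "month_12": lambda attrs: 0.7,
}
horizon = classify_over_horizon({"balance": 1000.0}, stub_models)
```

Keeping one model per time point means each target classification result is tied to the future time point of the model that produced it.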

11. The method of claim 10, wherein, for each of said future time points, said asset recovery rate classification model corresponding to said future time point is pre-trained by:

acquiring historical information of historical assets and historical classification results of asset recovery rates of the historical assets at historical time points, wherein the positions of the historical time points in a historical time period are the same as the positions of the future time points in the future time period;

taking the historical information and the historical classification result as a group of historical samples;

training a historical classification model to be trained based on a plurality of groups of historical samples to obtain the asset recovery rate classification model corresponding to the future time point, wherein the historical classification model comprises a classification model constructed based on the lightweight gradient lifting framework.

12. The method of claim 1, wherein the attribute information comprises at least one of: asset information of the target asset at the time of receipt, basic information and credit information of the owner of the target asset, and management information from the management process of the target asset.

13. An asset recovery rate classification device, comprising:

a target classification result obtaining module, configured to input the attribute information into the asset recovery rate classification model, and obtain a target classification result of the asset recovery rate of the target asset according to an output result of the asset recovery rate classification model;

14. Asset recovery rate classification equipment, comprising:

one or more processors;

a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the asset recovery rate classification method of any one of claims 1-12.

15. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the asset recovery rate classification method of any one of claims 1-12.

16. A computer program product comprising a computer program which, when executed by a processor, implements the asset recovery rate classification method of any one of claims 1-12.