
CN114692667B - Model training method and related device - Google Patents


Info

Publication number
CN114692667B
Authority
CN
China
Prior art keywords
data
noise
network
output
noise reduction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011623266.3A
Other languages
Chinese (zh)
Other versions
CN114692667A (en)
Inventor
付明亮
徐羽琼
叶飞
周振坤
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202011623266.3A
Publication of CN114692667A
Application granted
Publication of CN114692667B
Legal status: Active


Classifications

    • G06F18/24: Physics > Computing > Electric digital data processing > Pattern recognition > Analysing > Classification techniques
    • G06F18/214: Pattern recognition > Design or setup of recognition systems or techniques > Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F2218/04: Aspects of pattern recognition specially adapted for signal processing > Preprocessing > Denoising

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a model training method applicable to the field of artificial intelligence. The method comprises: obtaining a sample pair comprising noise data and noise-free data; inputting the noise-free data into a noise reduction classification network, which comprises a noise reduction network and a classification network, to obtain first output data output by the noise reduction network and second output data output by the classification network; and inputting the noise data into the noise reduction classification network to obtain third output data output by an intermediate layer of the noise reduction network and fourth output data output by the classification network. A first loss function is determined according to the first output data and the third output data, a second loss function is determined according to the second output data and the fourth output data, and the noise reduction classification network is trained according to at least the first loss function and the second loss function until a preset training condition is met, yielding a target network. The scheme enhances the network's ability to suppress local noise and disturbance, and improves the classification and recognition accuracy of the network.

Description

Model training method and related device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a model training method and a related device.
Background
Artificial intelligence (AI) is the theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Sparse time series data is a set of non-dense data sequences arranged in chronological order. Common examples include human skeletal keypoint data, electrocardiogram data, and inertial measurement unit (IMU) data. Classifying and identifying sparse time series data yields useful information. For example, the posture and motion of the human body can be recognized from human skeletal keypoint data, the physical condition of the human body can be diagnosed from electrocardiogram data, and the motion state of the human body can be recognized from IMU data on wearable devices.
In a real environment, sparse time series data is often subject to various kinds of interference, so the sparse time series data acquired by a device contains noise. For this reason, in the related art the sparse time series data is usually denoised first, and classification and recognition are then performed on the denoised data using a conventional classification method.
However, the classification and recognition accuracy of the related art on noisy sparse time series data is low, and normal recognition of the sparse time series data is hard to guarantee. A method that can effectively classify and identify noisy sparse time series data is therefore needed.
Disclosure of Invention
The application provides a model training method and a related device. In the process of training a noise reduction classification network, a first loss function is obtained based on the output of noise-free data in the noise reduction network of the noise reduction classification network and the output of noise data in an intermediate layer of the noise reduction network; a second loss function is obtained based on the outputs of the noise-free data and the noise data from the whole noise reduction classification network; and the noise reduction classification network is trained based on the first loss function and the second loss function to obtain a target network. Because the loss functions are obtained from the outputs of both the noise-free data and the noise data in the noise reduction network and in the overall noise reduction classification network, the consistency of the noise reduction objective and the classification accuracy objective is ensured, the network can learn more complete global features in the noise reduction stage, its ability to suppress local noise and disturbance is enhanced, and its classification and recognition accuracy is improved.
The first aspect of the application provides a model training method. A terminal obtains a sample pair from a sample set comprising a plurality of sample pairs, where the sample pair comprises noise data and the noise-free data corresponding to that noise data; that is, the noise-free data in the sample pair is the noise data after the noise is removed. The terminal inputs the noise-free data into a noise reduction classification network to obtain first output data and second output data. The noise reduction classification network comprises a noise reduction network and a classification network; the first output data is the output of the noise reduction network, and the second output data is the output of the classification network. The terminal then inputs the noise data into the noise reduction classification network to obtain third output data and fourth output data, where the third output data is obtained from an intermediate layer of the noise reduction network and the fourth output data is the output of the classification network.
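The two forward passes above can be sketched as follows. The toy `denoise_net` and `classify_net` below are placeholders invented for illustration (the patent does not fix the network internals); they serve only to show where the four output data come from.

```python
import numpy as np

def denoise_net(x):
    # Placeholder noise reduction network: the intermediate layer is a
    # simple moving average, the output clips it.  Both are stand-ins.
    mid = (x + np.roll(x, 1)) / 2.0
    return np.clip(mid, -1.0, 1.0), mid

def classify_net(x):
    # Placeholder classifier: softmax over two fixed projections.
    logits = np.array([x.sum(), -x.sum()])
    e = np.exp(logits - logits.max())
    return e / e.sum()

def forward_pair(clean, noisy):
    first, _ = denoise_net(clean)       # first output data (noise-free branch)
    second = classify_net(first)        # second output data
    out, third = denoise_net(noisy)     # third output data: intermediate layer
    fourth = classify_net(out)          # fourth output data (noisy branch)
    return first, second, third, fourth

clean = np.zeros(4)
noisy = clean + 0.1
first, second, third, fourth = forward_pair(clean, noisy)
```

In a real system the two branches share the same trained weights; only the input (noise-free vs. noisy) differs.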
The terminal then determines a first loss function from the first output data and the third output data, the first loss function representing a difference between the first output data and the third output data. The noise reduction classification network is trained based on the first loss function, so that the noise reduction classification network can learn global features which are closer to the noise-free data in the noise reduction processing stage, and the capacity of the noise reduction network for inhibiting local noise and disturbance is enhanced.
Next, the terminal determines a second loss function according to the second output data and the fourth output data. The second loss function represents the difference between the second output data and the true class label of the noise-free data, together with the difference between the fourth output data and that same label. For example, the second loss function may be obtained by summing a first difference value, between the second output data and the true class label of the noise-free data, and a second difference value, between the fourth output data and the true class label of the noise-free data.
Finally, the terminal trains the noise reduction classification network according to at least the first loss function and the second loss function until a preset training condition is met, obtaining the target network. Specifically, the terminal may calculate a total loss function based on the first loss function and the second loss function and train the noise reduction classification network based on this total loss function. The total loss function may be the sum of the first loss function and the second loss function, or it may be obtained by adding the product of the first loss function and a first scaling factor to the product of the second loss function and a second scaling factor.
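The total-loss combination just described can be written as a one-line helper; defaulting to the plain sum when no scaling factors are given is an illustrative choice.

```python
def total_loss(l1, l2, w1=None, w2=None):
    """Combine the first and second losses: the plain sum, or the
    weighted sum when scaling factors w1, w2 are provided."""
    if w1 is None and w2 is None:
        return l1 + l2
    return w1 * l1 + w2 * l2
```

For example, `total_loss(2.0, 3.0)` gives the plain sum, while `total_loss(2.0, 3.0, 0.5, 2.0)` applies the two scaling factors.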
According to the scheme, the noise reduction classification network is trained, the loss function is obtained based on the noise-free data and the output of the noise data in the noise reduction network and the output of the noise reduction classification network, the consistency of the noise reduction target and the classification precision target can be ensured, the network can learn more complete global characteristics in the noise reduction stage, the local noise and disturbance suppression capability of the network is enhanced, and the classification recognition precision of the network is improved.
Optionally, in one possible implementation, the noise reduction network is an autoencoder comprising an encoder and a decoder. An autoencoder, also called a self-encoder, is an artificial neural network that can learn an efficient representation of its input data through unsupervised learning. In practical applications, adding suitable constraints to the autoencoder enables it to perform noise reduction on noise data. In the autoencoder, the encoder compresses and encodes the input data, and the decoder reconstructs the data output by the encoder; the first output data is the output of the encoder, and the third output data is obtained from an intermediate layer of the encoder. Using an autoencoder comprising an encoder and a decoder as the noise reduction network reduces the modifications needed to existing systems and improves the practicality of the scheme.
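A minimal numpy sketch of such a noise reduction network follows. The layer sizes, the tanh activations, and the random (untrained) weights are all illustrative assumptions; only the encoder/intermediate-layer/decoder structure comes from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

class DenoisingAutoencoder:
    def __init__(self, d_in=16, d_mid=8, d_code=4):
        # Random weights stand in for trained parameters.
        self.w1 = rng.normal(0.0, 0.1, (d_in, d_mid))    # encoder layer 1
        self.w2 = rng.normal(0.0, 0.1, (d_mid, d_code))  # encoder layer 2
        self.w3 = rng.normal(0.0, 0.1, (d_code, d_in))   # decoder

    def forward(self, x):
        h = np.tanh(x @ self.w1)      # intermediate-layer activation
        code = np.tanh(h @ self.w2)   # encoder output ("first output data")
        recon = code @ self.w3        # decoder reconstruction
        return h, code, recon

ae = DenoisingAutoencoder()
noisy = rng.normal(size=16)
h, code, recon = ae.forward(noisy)
```

The intermediate activation `h` plays the role of the data from which the third output data is derived, while `code` corresponds to the encoder output used as the first output data.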
Optionally, in a possible implementation, inputting the noise data into the noise reduction classification network to obtain the third output data comprises: the terminal inputs the noise data into the noise reduction classification network to obtain the feature data output by the intermediate layer of the noise reduction network; the terminal divides the feature data into a plurality of pieces of sub-feature data to obtain the third output data, where the third output data comprises the plurality of pieces of sub-feature data; the terminal determines a difference value between each piece of sub-feature data in the third output data and the first output data; and the terminal determines the first loss function according to the difference value between each piece of sub-feature data and the first output data.
In the scheme, the characteristic data output by the noise data in the middle layer of the noise reduction network is divided into a plurality of sub-characteristic data, and the loss function is established based on the sub-characteristic data and the first output data, so that the noise reduction network can be guided to learn rich global information, and the noise reduction effect of the noise reduction network is improved.
Optionally, in one possible implementation manner, the terminal uniformly divides the feature data into a plurality of sub-feature data according to a time sequence to obtain third output data, where a time period corresponding to each of the plurality of sub-feature data is the same in length, and the noise data is time sequence data.
Because the time sequence data are coherent and adjacent time sequence data have certain relevance, a loss function is constructed based on local time sequence characteristics corresponding to noise data and global time sequence characteristics corresponding to noise-free data, and the noise reduction network can be guided to learn richer global time sequence information, so that the capability of the noise reduction network for inhibiting local noise and disturbance is enhanced.
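The time-uniform splitting and the resulting first loss might look like the following sketch. Mean pooling over each segment and the squared distance are illustrative choices, since the text above only requires "a difference value" per sub-feature.

```python
import numpy as np

def split_uniform(features, k):
    # features: (T, C) intermediate-layer output of the noise reduction
    # network for the noise data; split along time into k equal segments.
    t = features.shape[0]
    assert t % k == 0, "this sketch assumes T divisible by k"
    step = t // k
    return [features[i * step:(i + 1) * step] for i in range(k)]

def local_global_loss(global_feat, sub_feats):
    # Sum over segments of the squared difference between the time-pooled
    # local feature and the global feature of the noise-free branch.
    return float(sum(np.mean((s.mean(axis=0) - global_feat) ** 2)
                     for s in sub_feats))

feats = np.arange(8, dtype=float).reshape(4, 2)   # T=4 steps, C=2 channels
subs = split_uniform(feats, 2)
loss = local_global_loss(np.array([1.0, 2.0]), subs)
```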
Optionally, in one possible implementation, since the first output data is data output by an encoder in the noise reduction network and the third output data is data output by an intermediate layer of the encoder of the noise reduction network, the dimensions of the two are not the same. Therefore, before the first loss function is obtained, the first output data and the third output data may be subjected to a dimension alignment operation so that the dimensions of the first output data and the third output data are the same, and then the difference value between the first output data and the third output data is obtained.
Specifically, the terminal performs the dimension alignment operation on the first output data and on each piece of sub-feature data in the third output data, to obtain dimension-aligned first output data and third output data. The terminal then determines a difference value between each piece of sub-feature data in the dimension-aligned third output data and the dimension-aligned first output data. In practical applications, the terminal may construct a plurality of dimension alignment sub-networks in advance, input the first output data into one of them, and input each piece of sub-feature data in the third output data into a corresponding one of the others, thereby obtaining the dimension-aligned first output data and third output data.
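The dimension alignment sub-networks could be sketched as projections into a common space. Their internal structure is left open by the text above, so the single linear map per sub-network below is an assumption, as are all the sizes.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_align_subnet(d_in, d_out):
    # One dimension alignment sub-network, sketched as a single linear
    # projection (untrained random weights as placeholders).
    w = rng.normal(0.0, 0.1, (d_in, d_out))
    return lambda x: x @ w

d_common = 4
align_first = make_align_subnet(6, d_common)                      # for the first output data
align_subs = [make_align_subnet(10, d_common) for _ in range(3)]  # one per sub-feature

g = align_first(np.ones(6))
subs_aligned = [f(np.ones(10)) for f in align_subs]
```

After alignment, all vectors live in the same `d_common`-dimensional space, so the difference values can be computed directly.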
Optionally, in a possible implementation, determining the second loss function according to the second output data and the fourth output data comprises: the terminal determines the difference between the second output data and the true class label of the noise-free data to obtain a first difference value; the terminal determines the difference between the fourth output data and the true class label of the noise-free data to obtain a second difference value; and the terminal obtains the second loss function according to the first difference value and the second difference value. The second output data is a multi-classification prediction result, representing the prediction of the classification network.
Optionally, in a possible implementation, a binary classifier may further be introduced during training of the noise reduction classification network. Based on the features extracted by the noise reduction classification network, the binary classifier performs a two-classification prediction on the data input into the network, that is, it predicts whether the input data is noise data or noise-free data. The terminal then determines a third loss function based on the two-classification result output by the binary classifier and the true two-classification label corresponding to the input data. The third loss function is combined with the first loss function and the second loss function to form the total loss function; that is, it is also used to train the noise reduction classification network.
Specifically, the terminal acquires a first feature and, according to the first feature, predicts the two-classification result corresponding to the noise-free data, obtaining a first prediction result. The first feature is extracted by the classification network in the noise reduction classification network after the noise-free data is input into it. The terminal acquires a second feature and, according to the second feature, predicts the two-classification result corresponding to the noise data, obtaining a second prediction result. The second feature is extracted by the classification network after the noise data is input into the noise reduction classification network. The terminal then determines the third loss function according to the first prediction result, the true two-classification label of the noise-free data, the second prediction result, and the true two-classification label of the noise data, and trains the noise reduction classification network according to at least the first, second, and third loss functions. The two-classification result corresponding to the noise-free data is either a noise-free type or a noise type, and the same holds for the noise data.
According to the scheme, a binary classifier is introduced in a training stage, and based on the characteristics extracted by the classification network in the noise reduction classification network, the binary classifier predicts the classification result of the input data, and a loss function corresponding to the classification result is obtained. By introducing the loss function corresponding to the classification result based on the original loss function, an additional evaluation dimension can be introduced, so that the trained noise reduction classification network can have adaptive noise reduction classification scales for different types of input data, and the classification precision of the noise reduction classification network is improved.
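The third loss can be sketched as a pair of binary cross-entropy terms, treating the noise-free sample's true two-classification label as "noise-free" and the noisy sample's as "noise" (binary cross-entropy is an illustrative choice of metric):

```python
import math

def binary_ce(p_noise, is_noise):
    # Binary cross-entropy of the classifier's "input is noisy"
    # probability against the true two-classification label.
    eps = 1e-12
    if is_noise:
        return -math.log(p_noise + eps)
    return -math.log(1.0 - p_noise + eps)

def third_loss(p_for_clean, p_for_noisy):
    # p_for_clean: predicted noise probability for the noise-free sample;
    # p_for_noisy: predicted noise probability for the noisy sample.
    return binary_ce(p_for_clean, False) + binary_ce(p_for_noisy, True)
```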
Optionally, in one possible implementation, training the noise reduction classification network according to at least the first loss function and the second loss function comprises: updating the parameters of the noise reduction classification network through an error back propagation algorithm according to at least the first loss function and the second loss function. In short, during training the terminal corrects the parameters of the initial noise reduction classification network through the error back propagation algorithm, so that the reconstruction error loss of the network becomes smaller and smaller. Specifically, the input signal is propagated forward until the output produces an error loss, and the parameters of the initial noise reduction classification network are updated by back-propagating the error loss information, so that the error loss converges.
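A single parameter update of the kind described looks like the following; plain stochastic gradient descent is used here as an illustrative stand-in, since the text above only requires error back propagation until the loss converges.

```python
def sgd_step(params, grads, lr=0.01):
    # One back-propagation update: move each parameter against its loss
    # gradient so the total loss shrinks.
    return [p - lr * g for p, g in zip(params, grads)]

w = [1.0, -2.0]
w = sgd_step(w, [10.0, -10.0], lr=0.1)
```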
Optionally, in one possible implementation, the noise data in the sample pair comprises sparse timing data comprising skeletal point coordinate data, electrocardiogram data, inertial measurement unit data, or fault diagnosis data.
The second aspect of the application provides a noise reduction and classification method, comprising: obtaining data to be classified, and inputting the data to be classified into a target network to obtain a prediction result, where the prediction result is the classification result of the data to be classified. The target network performs noise reduction processing and classification on the data to be classified, and is trained based on the method of the first aspect.
The third aspect of the application provides a model training device comprising an acquisition unit and a processing unit. The acquisition unit is configured to acquire a sample pair, the sample pair comprising noise data and the noise-free data corresponding to the noise data. The processing unit is configured to input the noise-free data into a noise reduction classification network to obtain first output data and second output data, where the noise reduction classification network comprises a noise reduction network and a classification network, the first output data is the output of the noise reduction network, and the second output data is the output of the classification network. The processing unit is further configured to input the noise data into the noise reduction classification network to obtain third output data and fourth output data, where the third output data is obtained from an intermediate layer of the noise reduction network and the fourth output data is the output of the classification network. The processing unit is further configured to determine a first loss function according to the first output data and the third output data, the first loss function representing the difference between the first output data and the third output data; to determine a second loss function according to the second output data and the fourth output data; and to train the noise reduction classification network according to at least the first loss function and the second loss function until a preset training condition is met, obtaining a target network.
Optionally, in a possible implementation manner, the noise reduction network includes an encoder and a decoder, the encoder is used for performing compression encoding on input data, the decoder is used for performing data reconstruction on data output by the encoder, the first output data is output by the encoder, and the third output data is obtained based on an intermediate layer of the encoder.
Optionally, the processing unit is further configured to input the noise data into the noise reduction classification network to obtain feature data output by an intermediate layer of the noise reduction network, divide the feature data into a plurality of sub-feature data to obtain the third output data, where the third output data includes the plurality of sub-feature data, determine a difference value between each sub-feature data in the third output data and the first output data, and determine the first loss function according to the difference value between each sub-feature data and the first output data.
Optionally, in one possible implementation manner, the processing unit is further configured to divide the feature data into a plurality of sub-feature data uniformly according to a time sequence, so as to obtain the third output data, where a time period corresponding to each sub-feature data in the plurality of sub-feature data is the same in length, and the noise data is time sequence data.
Optionally, in a possible implementation, the processing unit is further configured to perform a dimension alignment operation on the first output data and on each piece of sub-feature data in the third output data, to obtain dimension-aligned first output data and third output data, and to determine a difference value between each piece of sub-feature data in the dimension-aligned third output data and the dimension-aligned first output data.
Optionally, in one possible implementation manner, the processing unit is further configured to determine a difference between the second output data and a real class label of the noiseless data to obtain a first difference value, determine a difference between the fourth output data and the real class label of the noiseless data to obtain a second difference value, and obtain the second loss function according to the first difference value and the second difference value, where the second output data is a multi-classification prediction result and is used to represent a result of the classification network prediction.
Optionally, in one possible implementation, the acquisition unit is further configured to acquire a first feature and predict, according to the first feature, a two-classification result corresponding to the noise-free data, obtaining a first prediction result, where the first feature is extracted by the classification network in the noise reduction classification network after the noise-free data is input into the noise reduction classification network; and to acquire a second feature and predict, according to the second feature, a two-classification result corresponding to the noise data, obtaining a second prediction result, where the second feature is extracted by the classification network in the noise reduction classification network after the noise data is input into the noise reduction classification network. The processing unit is further configured to determine a third loss function according to the first prediction result, the true two-classification label of the noise-free data, the second prediction result, and the true two-classification label of the noise data, and to train the noise reduction classification network according to at least the first loss function, the second loss function, and the third loss function. The two-classification result corresponding to the noise-free data is either a noise-free type or a noise type, and the same holds for the noise data.
Optionally, in a possible implementation, the processing unit is further configured to update the parameters of the noise reduction classification network through an error back propagation algorithm according to at least the first loss function and the second loss function.
Optionally, in one possible implementation, the noise data includes sparse timing data.
Optionally, in one possible implementation, the sparse timing data includes skeletal point coordinate data, electrocardiogram data, inertial measurement unit data, or fault diagnosis data.
A fourth aspect of the present application provides a noise reduction classification apparatus, including an acquisition unit and a processing unit. The acquisition unit is used for acquiring the data to be classified. The processing unit is used for inputting the data to be classified into a target network to obtain a prediction result, wherein the prediction result is a classification result of the data to be classified, the target network is used for carrying out noise reduction processing and classification on the data to be classified, and the target network is trained based on the method of the first aspect.
A fifth aspect of the application provides a model training apparatus, which may comprise a processor coupled to a memory, the memory storing program instructions which, when executed by the processor, implement the method of the first aspect described above. For the steps in each possible implementation manner of the first aspect executed by the processor, reference may be specifically made to the first aspect, which is not described herein.
A sixth aspect of the present application provides a noise reduction classification device, which may comprise a processor coupled to a memory, the memory storing program instructions which, when executed by the processor, implement the method of the second aspect. For the steps in each possible implementation manner of the second aspect executed by the processor, reference may be specifically made to the second aspect, which is not described herein.
A seventh aspect of the present application provides a computer readable storage medium having a computer program stored therein, which when run on a computer causes the computer to perform the method of the first or second aspect described above.
An eighth aspect of the application provides a computer program product having a computer program stored therein, which when run on a computer causes the computer to perform the method of the first or second aspect described above.
A ninth aspect of the application provides circuitry comprising processing circuitry configured to perform the method of the first or second aspect described above.
A tenth aspect of the application provides a chip comprising one or more processors. Some or all of the processors are configured to read and execute a computer program stored in a memory in order to perform the method in any possible implementation of any of the aspects described above. Optionally, the chip includes the memory, and the processor is connected to the memory through a circuit or a wire. Optionally, the chip further comprises a communication interface connected to the processor. The communication interface is used for receiving data and/or information to be processed; the processor acquires the data and/or information from the communication interface, processes it, and outputs the processing result through the communication interface. The communication interface may be an input-output interface. The method provided by the application may be implemented by one chip, or cooperatively by a plurality of chips.
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence main body framework according to an embodiment of the present application;
FIG. 2a is a diagram of a data processing system according to an embodiment of the present application;
FIG. 2b is a schematic diagram of another data processing system according to an embodiment of the present application;
FIG. 2c is a schematic diagram of a related device for data processing according to an embodiment of the present application;
FIG. 3a is a schematic diagram of a system 100 architecture according to an embodiment of the present application;
fig. 3b is a schematic diagram of an application scenario provided in an embodiment of the present application;
FIG. 3c is a schematic diagram of an embodiment of the present application for applying time series data;
FIG. 4 is a schematic flow chart of a model training method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a noise reduction classification network according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a local-global feature correlation module and a hybrid classifier according to an embodiment of the present application;
FIG. 7 is a schematic flow chart of training a noise reduction classification network according to an embodiment of the present application;
FIG. 8 is a schematic diagram of generating noise data according to an embodiment of the present application;
FIG. 9a is a schematic flow chart of constructing local-global feature association loss according to an embodiment of the present application;
FIG. 9b is a schematic flow chart of constructing local-global feature association loss and hybrid classification loss according to an embodiment of the present application;
FIG. 10 is a schematic diagram comparing a prior art scheme provided by an embodiment of the present application with a scheme of the present application;
FIG. 11 is a schematic structural diagram of a model training device according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of a noise reduction classification device according to an embodiment of the present application;
FIG. 13 is a schematic structural diagram of an execution device according to an embodiment of the present application;
FIG. 14 is a schematic structural view of a training device according to an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings. The terminology used in the description of the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention.
Embodiments of the present application are described below with reference to the accompanying drawings. As one of ordinary skill in the art can know, with the development of technology and the appearance of new scenes, the technical scheme provided by the embodiment of the application is also applicable to similar technical problems.
The terms "first", "second", and the like in the description, in the claims, and in the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that terms so used are interchangeable under appropriate circumstances, and merely distinguish objects with the same attributes when embodiments of the application are described. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, a schematic structural diagram of an artificial intelligence main body framework is shown in fig. 1. The framework is described below from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects a series of processes from the acquisition of data to its processing. For example, it may comprise the general procedures of intelligent information awareness, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. In this process, the data undergoes a refinement process of "data - information - knowledge - wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (technical implementations of provisioning and processing) of artificial intelligence to the industrial ecology of the system.
(1) Infrastructure of
The infrastructure provides computing capability support for the artificial intelligence system, realizes communication with the outside world, and provides support through the base platform. The infrastructure communicates with the outside through sensors; computing capability is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); and the base platform comprises relevant platform guarantees and support such as a distributed computing framework and a network, and may comprise cloud storage and computing, an interconnection network, and the like. For example, the sensors communicate with the outside to obtain data, and the data is provided to intelligent chips in a distributed computing system provided by the base platform for computation.
(2) Data
The data of the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence. The data relate to graphics, images, voice and text, and also relate to the internet of things data of the traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Wherein machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning modes in a computer or an intelligent system, and carrying out machine thinking and problem solving by using formal information according to a reasoning control strategy, and typical functions are searching and matching.
Decision making refers to the process of making decisions after intelligent information is inferred, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capability
After the data has been processed, some general-purpose capabilities can be formed based on the result of the data processing, such as algorithms or a general-purpose system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
(5) Intelligent product and industry application
Intelligent products and industry applications refer to products and applications of the artificial intelligence system in various fields. They are the encapsulation of the overall artificial intelligence solution, productizing intelligent information decision making and realizing deployed applications. The application fields mainly include intelligent terminals, intelligent transportation, intelligent medical treatment, automatic driving, safe cities, and the like.
Next, several application scenarios of the present application are described.
FIG. 2a is a schematic diagram of a data processing system according to an embodiment of the present application, where the data processing system includes a user device and a data processing device. The user device includes intelligent terminals such as a mobile phone, a personal computer, or an information processing center. The user device is the initiating end of data processing: as the initiator of a data noise reduction classification request, the user usually initiates the request through the user device.
The data processing device may be a device or server having a data processing function, such as a cloud server, a web server, an application server, or a management server. The data processing device receives the data noise reduction classification request from the intelligent terminal through an interactive interface, and then performs data processing in manners such as machine learning, deep learning, searching, reasoning, and decision making through a memory for storing data and a processor for data processing. The memory in the data processing device may be a general term that includes a database for locally storing historical data; the database may be on the data processing device or on another network server.
In the data processing system shown in fig. 2a, the user device may receive an instruction from a user, for example, the user device may obtain a set of data input/selected by the user, and then initiate a request to the data processing device, so that the data processing device performs a data noise reduction classification application on the data obtained by the user device, thereby obtaining a corresponding processing result for the data. In fig. 2a, a data processing device may perform the model training method of an embodiment of the present application.
Fig. 2b is a schematic diagram of another data processing system according to an embodiment of the present application, in fig. 2b, a user device directly serves as a data processing device, and the user device can directly obtain an input from a user and directly process the input by hardware of the user device, and a specific process is similar to that of fig. 2a, and reference is made to the above description and will not be repeated here.
In the data processing system shown in fig. 2b, the user device may receive an instruction from the user, for example, the user device may obtain a piece of data selected by the user in the user device, and then the user device itself executes a data processing application on the data, so as to obtain a corresponding processing result for the data.
In fig. 2b, the user equipment itself may perform the model training method according to the embodiment of the present application.
Fig. 2c is a schematic diagram of a related device for data processing according to an embodiment of the present application.
The user device in fig. 2a and 2b may be the local device 301 or the local device 302 in fig. 2c, and the data processing device in fig. 2a may be the executing device 210 in fig. 2c, where the data storage system 250 may store data to be processed of the executing device 210, and the data storage system 250 may be integrated on the executing device 210, or may be disposed on a cloud or other network server.
The processors in fig. 2a and 2b may perform data training/machine learning/deep learning through a neural network model or another model (e.g., a model based on a support vector machine), and perform the data processing application on the data using the finally trained or learned model, thereby obtaining corresponding processing results.
Fig. 3a is a schematic diagram of a system 100 architecture provided by an embodiment of the present application, in fig. 3a, an execution device 110 configures an input/output (I/O) interface 112 for data interaction with an external device, and a user may input data to the I/O interface 112 through a client device 140, where the input data may include tasks to be scheduled, callable resources, and other parameters in an embodiment of the present application.
In the preprocessing of the input data by the execution device 110, or in the process of performing a processing related to computation or the like (for example, performing a functional implementation of a neural network in the present application) by the computation module 111 of the execution device 110, the execution device 110 may call the data, the code or the like in the data storage system 150 for the corresponding processing, or may store the data, the instruction or the like obtained by the corresponding processing in the data storage system 150.
Finally, the I/O interface 112 returns the processing results to the client device 140 for presentation to the user.
It should be noted that the training device 120 may generate, based on different training data, a corresponding target model/rule for different targets or different tasks, where the corresponding target model/rule may be used to achieve the targets or complete the tasks, thereby providing the user with the desired result. Wherein the training data may be stored in database 130 and derived from training samples collected by data collection device 160.
In the case shown in fig. 3a, the user may manually give the input data, which may be operated through an interface provided by the I/O interface 112. In another case, the client device 140 may automatically send the input data to the I/O interface 112; if the user's authorization is required for the client device 140 to automatically send the input data, the user may set corresponding permissions in the client device 140. The user may view the results output by the execution device 110 at the client device 140, and the specific presentation may be in the form of a display, a sound, an action, or the like. The client device 140 may also serve as a data collection terminal that collects the input data of the I/O interface 112 and the output results of the I/O interface 112 as new sample data, as shown in the figure, and stores them in the database 130. Of course, instead of being collected by the client device 140, the I/O interface 112 may directly store the input data input to the I/O interface 112 and the output result output from the I/O interface 112 as new sample data into the database 130.
It should be noted that fig. 3a is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, apparatuses, and modules shown in the figure is not limiting in any way. For example, in fig. 3a, the data storage system 150 is an external memory with respect to the execution device 110, while in other cases the data storage system 150 may be disposed in the execution device 110. As shown in fig. 3a, the neural network may be obtained through training by the training device 120.
The embodiment of the application also provides a chip, which comprises the NPU. The chip may be provided in an execution device 110 as shown in fig. 3a for performing the calculation of the calculation module 111. The chip may also be provided in the training device 120 as shown in fig. 3a for completing the training work of the training device 120 and outputting the target model/rule.
The neural network processor (NPU) is mounted as a coprocessor on a main central processing unit (central processing unit, CPU) (host CPU), and the host CPU distributes tasks. The core part of the NPU is the operation circuit, and the controller controls the operation circuit to extract data in a memory (a weight memory or an input memory) and perform operations.
In some implementations, the arithmetic circuitry includes a plurality of processing units (PEs) internally. In some implementations, the operational circuit is a two-dimensional systolic array. The arithmetic circuitry may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the operational circuitry is a general-purpose matrix processor.
For example, assume that there is an input matrix a, a weight matrix B, and an output matrix C. The arithmetic circuit takes the data corresponding to the matrix B from the weight memory and caches the data on each PE in the arithmetic circuit. The operation circuit takes the matrix A data and the matrix B from the input memory to perform matrix operation, and the obtained partial result or the final result of the matrix is stored in an accumulator (accumulator).
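The matrix operation described above can be sketched in software. The triple loop below mirrors the role of the accumulator for each output element; the actual hardware computes partial products in parallel across the PEs, so this is only an illustrative model of the data flow, not the NPU implementation:

```python
def matmul(A, B):
    # C[i][j] accumulates the partial products A[i][t] * B[t][j],
    # playing the role of the hardware accumulator for one output element
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for t in range(inner):
                acc += A[i][t] * B[t][j]
            C[i][j] = acc
    return C

C = matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
```

In hardware, matrix B would be fetched from the weight memory and cached on the PEs once, then reused across rows of A, which is why the weight memory is a separate buffer in the design above.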
The vector calculation unit may further process the output of the operation circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, etc. For example, the vector calculation unit may be used for network calculations of non-convolutional/non-FC layers in a neural network, such as pooling (pooling), batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector computation unit can store the vector of processed outputs to a unified buffer. For example, the vector calculation unit may apply a nonlinear function to an output of the arithmetic circuit, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit generates a normalized value, a combined value, or both. In some implementations, the vector of processed outputs can be used as an activation input to an arithmetic circuit, for example for use in subsequent layers in a neural network.
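As an illustrative sketch (function names and values are assumptions, not from the application), the vector calculation unit's post-processing can be mimicked by applying a nonlinear activation and then pooling to the accumulator output vector:

```python
def relu(v):
    # Nonlinear activation applied to the vector of accumulated values
    return [x if x > 0.0 else 0.0 for x in v]

def max_pool(v, window):
    # Simple 1-D max pooling over non-overlapping windows
    return [max(v[i:i + window]) for i in range(0, len(v), window)]

activated = relu([-1.0, 2.0, -0.5, 4.0])   # activation values for the next layer
pooled = max_pool(activated, 2)             # pooling, a non-convolutional layer operation
```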
The unified memory is used for storing input data and output data.
Weight data is transferred to the input memory and/or the unified memory directly through a direct memory access controller (direct memory access controller, DMAC); weight data in the external memory is stored into the weight memory, and data in the unified memory is stored into the external memory.
And the bus interface unit (bus interface unit, BIU) is used for realizing interaction among the main CPU, the DMAC and the instruction fetch memory through a bus.
The instruction fetching memory (instruction fetch buffer) is connected with the controller and used for storing instructions used by the controller;
and the controller is used for calling the instructions cached in the instruction fetch memory to control the working process of the operation accelerator.
Typically, the unified memory, the input memory, the weight memory, and the instruction fetch memory are on-chip (On-Chip) memories, and the external memory is a memory external to the NPU, which may be a double data rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM), a high bandwidth memory (high bandwidth memory, HBM), or another readable and writable memory.
Because the embodiments of the present application relate to a large number of applications of neural networks, for convenience of understanding, related terms and related concepts of the neural networks related to the embodiments of the present application will be described below.
(1) Neural network
The neural network may be composed of neural units. A neural unit may refer to an arithmetic unit taking xs and an intercept of 1 as inputs, where the output of the arithmetic unit may be:

h_{W,b}(x) = f(W^T x) = f( Σ_{s=1}^{n} W_s · x_s + b )

where s = 1, 2, ..., n, n is a natural number greater than 1, Ws is the weight of xs, and b is the bias of the neural unit. f is an activation function (activation function) of the neural unit, used to introduce a nonlinear characteristic into the neural network to convert an input signal of the neural unit into an output signal. The output signal of the activation function may serve as an input to the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by joining together many of the above single neural units, i.e., the output of one neural unit may be the input of another. The input of each neural unit may be connected to a local receptive field of the previous layer to extract features of the local receptive field; the local receptive field may be an area composed of several neural units.
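The computation of a single neural unit can be illustrated with a minimal sketch; the sigmoid activation and the sample inputs are illustrative choices, not values from the application:

```python
import math

def neural_unit(xs, ws, b):
    # Weighted sum of inputs plus bias, passed through a sigmoid activation f
    s = sum(w * x for w, x in zip(ws, xs)) + b
    return 1.0 / (1.0 + math.exp(-s))

# Two inputs with weights 0.5 and -0.25 and bias 0.1
out = neural_unit([1.0, 2.0], [0.5, -0.25], 0.1)
```

The output lies in (0, 1) because of the sigmoid, and could serve as the input to a unit in the next layer.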
The operation of each layer in the neural network may be described by the mathematical expression y = a(W·x + b). From a physical perspective, the operation of each layer can be understood as completing the transformation from input space to output space (i.e., from the row space to the column space of the matrix) through five operations: 1. dimension raising/lowering; 2. scaling up/down; 3. rotation; 4. translation; and 5. "bending". Operations 1, 2, and 3 are completed by W·x, operation 4 is completed by +b, and operation 5 is implemented by a(). The word "space" is used here because the object being classified is not a single thing but a class of things; space refers to the collection of all individuals of such things. W is a weight vector, and each value in the vector represents a weight value of a neuron in that layer of the neural network. The vector W determines the spatial transformation from the input space to the output space described above, i.e., the weight W of each layer controls how the space is transformed. The purpose of training the neural network is to finally obtain the weight matrices of all layers of the trained neural network (a weight matrix formed by the vectors W of many layers). Thus, the training process of the neural network is essentially learning how to control the spatial transformation, and more specifically, learning the weight matrices.
Since it is desirable that the output of the neural network be as close as possible to the value actually desired, the weight vector of each layer of the neural network can be updated by comparing the predicted value of the current network with the actually desired target value and then adjusting according to the difference between the two (of course, there is usually an initialization process before the first update, i.e., pre-configuring parameters for each layer in the neural network). For example, if the predicted value of the network is too high, the weight vector is adjusted to make the prediction lower, and the adjustment continues until the neural network can predict the actually desired target value. Thus, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of the loss function (loss function) or objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so the training of the neural network becomes a process of reducing this loss as much as possible.
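A minimal sketch of such a measure of difference, assuming mean squared error (the application does not fix a particular loss function here):

```python
def mse_loss(pred, target):
    # Mean squared error: a higher value means a larger gap
    # between the predicted values and the target values
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

loss = mse_loss([0.9, 0.2], [1.0, 0.0])
```

Training then amounts to adjusting the weights so that this value shrinks toward zero.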
(2) Back propagation algorithm
The neural network can adopt a Back Propagation (BP) algorithm to correct the parameter in the initial neural network model in the training process, so that the reconstruction error loss of the neural network model is smaller and smaller. Specifically, the input signal is transmitted forward until the output is generated with error loss, and the parameters in the initial neural network model are updated by back propagation of the error loss information, so that the error loss is converged. The back propagation algorithm is a back propagation motion that dominates the error loss, and aims to obtain parameters of the optimal neural network model, such as a weight matrix.
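The back propagation idea can be sketched for a one-weight model y = w·x trained on a squared error; the learning rate and step count below are illustrative assumptions:

```python
def train(x, target, w=0.0, lr=0.1, steps=50):
    # Gradient descent on the squared error (w*x - target)^2 for a single weight
    for _ in range(steps):
        pred = w * x                      # forward pass: input propagated to output
        grad = 2.0 * (pred - target) * x  # error loss propagated back to the parameter
        w -= lr * grad                    # parameter update toward smaller loss
    return w

w = train(x=1.0, target=3.0)  # converges toward the optimal weight 3.0
```

Each iteration shrinks the error by a constant factor, which is the convergence of the error loss described above.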
The method provided by the application is described below from the training side of the neural network and the application side of the neural network.
The training method of the neural network provided by the embodiment of the application relates to data processing, and may be particularly applied to data processing methods such as data training, machine learning, and deep learning. It performs symbolized and formalized intelligent information modeling, extraction, preprocessing, training, and the like on training data (such as the sample pairs in the application), and finally obtains a trained noise reduction classification model. It should be noted that the model training method and the data noise reduction classification method provided by the embodiments of the present application are applications based on the same concept, and may be understood as two parts of a system or two phases of an overall process: a model training phase and a model application phase.
Sparse time-series data is ubiquitous in everyday life. Sparse time-series data is a set of non-dense data sequences arranged in chronological order. Sparse time-series data is defined relative to dense data: common dense data includes images, while common sparse time-series data includes human skeletal point coordinate data input over multiple frames in a somatosensory game, human electrocardiographic data, gesture data acquired by an IMU, and the like. In practical applications, a large amount of useful information can be obtained by classifying and identifying sparse time-series data.
For example, human skeletal point coordinate data has very wide application in the field of somatosensory interaction. In a somatosensory game scene, virtual interaction of two hands is generally realized by acquiring coordinate data of skeletal points of a human body and identifying skeletal points of hands by adopting a somatosensory interaction device.
For another example, the electrocardiographic data of human body is one of important bases for diagnosing cardiovascular diseases by doctors, and with the popularization of artificial intelligence technology, the application of deep learning based on big data to realize automatic analysis diagnosis can break through the limitation of the accuracy and application range of the traditional statistical model. The automated analysis and diagnosis technology based on classification and identification of electrocardiographic data can read mass data of patients more deeply and realize accurate classification of patients.
For another example, IMUs have become very popular on wearable devices such as smart terminals such as cell phones, tablet computers, and virtual reality helmets. The IMU data collected by the IMU can be used as an important basis for identifying the motion state of the human body, for example, the identification of the motion state of a wearer can be realized based on the IMU data collected by a gyroscope on an intelligent bracelet or an intelligent watch.
All of the above application scenarios rely on classifying and identifying time-series data, yet sparse time-series data in a real environment can be interfered with by various noises, which significantly reduces the classification accuracy of the sparse time-series data. For example, for human skeletal point coordinate data, motion class prediction errors occur because coordinates of some skeletal key points jitter or go missing when the body is occluded. For electrocardiographic data, sources of interference are ubiquitous, whether in hospitals, ambulances, airplanes, ships, clinics, or at home. For IMU data, common system errors of an IMU include constant zero-offset errors after start-up, scale factor errors, misalignment and non-orthogonality errors, nonlinear errors, temperature errors, and the like, all of which affect subsequent classification and recognition tasks to different degrees. Therefore, noise reduction of time-series data is a problem that cannot be ignored in various application scenarios.
Taking human skeleton point coordinate data as an example, the action recognition method based on skeleton points takes human skeleton point coordinates as direct input, and has the characteristics of obvious semantic features of input data, high robustness to complex environments and the like due to small data quantity, so that the action recognition method based on skeleton points has wide application in the fields of man-machine interaction, intelligent monitoring, service robots and the like. Similar to noise in the image processing field, the skeletal point coordinate data in the user scene usually has noise problems such as lack of skeletal key points or jitter due to problems such as shielding or illumination. However, since the conventional human motion recognition method requires that input information is complete and free from defects, a phenomenon of human motion recognition errors is likely to occur when bone point coordinate data having the above-described noise problem is used as input.
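The keypoint-missing and jitter noise described above can be simulated with a toy corruption routine; the drop probability, jitter range, and data values are illustrative assumptions, not part of the patented method:

```python
import random

def corrupt(frames, drop_p=0.2, jitter=0.05, seed=0):
    # Simulate the two noise types on a skeletal-point time series:
    # missing keypoints (zeroed, e.g. due to occlusion) and coordinate jitter
    rng = random.Random(seed)
    noisy_frames = []
    for frame in frames:
        noisy = []
        for coord in frame:
            if rng.random() < drop_p:
                noisy.append(0.0)  # keypoint missing
            else:
                noisy.append(coord + rng.uniform(-jitter, jitter))  # jitter
        noisy_frames.append(noisy)
    return noisy_frames

clean = [[0.10, 0.20], [0.15, 0.25], [0.20, 0.30]]  # 3 frames, 2 coordinates each
noisy = corrupt(clean)
```

Such clean/noisy pairs are the kind of input a noise reduction network is trained on: the noisy sequence is the input and the clean sequence defines the target.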
Based on this, in the related art, after the sparse time series data is usually denoised, classification and recognition are performed on the denoised sparse time series data based on the original classification method. In order to realize the noise reduction of the sparse time sequence data, the existing data noise reduction method usually realizes the noise reduction of the sparse time sequence data by training a specific noise reduction network. For example, when the input is human skeletal point coordinate data, the smooth loss of maintaining visual rationality is used to guide the updating of network parameters in the training phase of the noise reduction network, and finally the network for realizing data noise reduction is obtained. Generally, the noise reduction network in the related art is generally trained based on the data without pose ambiguity, with time sequence smoothing, and/or with the aim of maintaining visual rationality. For an end-to-end noise reduction classification task, the final target is the classification precision of the data, so that a gap exists between a noise reduction network optimization target and the final classification precision target in the related technology, the classification recognition precision of the noise reduction sparse time sequence data in the related technology is lower, and the normal recognition of the sparse time sequence data is difficult to ensure.
In view of this, an embodiment of the present application provides a model training method and a related device. In the process of training a noise reduction classification network, a first loss function is obtained based on the output of noise-free data in the noise reduction network of the noise reduction classification network and the output of noise data in an intermediate layer of the noise reduction network; a second loss function is obtained based on the outputs of the noise-free data and the noise data in the entire noise reduction classification network; and the noise reduction classification network is trained based on the first loss function and the second loss function to obtain a target network. Because the loss functions are obtained based on the outputs of the noise-free data and the noise data both within the noise reduction network and across the whole noise reduction classification network, the consistency of the noise reduction target and the classification precision target can be ensured, the network can learn more complete global features in the noise reduction stage, the network's ability to suppress local noise and disturbance is enhanced, and the classification and recognition precision of the network on sparse time-series data is improved.
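The two-loss training objective can be sketched schematically. The use of mean squared error for both terms and a fixed weighting factor alpha are assumptions for illustration only, not the patented formulation:

```python
def feature_loss(f_clean, f_noisy):
    # First loss: align intermediate-layer noise reduction features
    # of noisy data with those of noise-free data
    return sum((a - b) ** 2 for a, b in zip(f_clean, f_noisy)) / len(f_clean)

def output_loss(y_clean, y_noisy):
    # Second loss: align the end-to-end outputs of the whole
    # noise reduction classification network
    return sum((a - b) ** 2 for a, b in zip(y_clean, y_noisy)) / len(y_clean)

def total_loss(f_clean, f_noisy, y_clean, y_noisy, alpha=0.5):
    # A weighted sum drives a single training objective, keeping the
    # noise reduction target and the classification target consistent
    return alpha * feature_loss(f_clean, f_noisy) + (1 - alpha) * output_loss(y_clean, y_noisy)

L = total_loss([1.0, 0.0], [0.8, 0.1], [0.9], [0.7])
```

Minimizing a single combined objective, rather than optimizing noise reduction and classification separately, is what closes the gap between the two targets described above.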
The model training method provided by the embodiment of the application can be applied to a terminal, where the terminal is a device capable of executing model training. After a target network is obtained through training based on the model training method provided by the embodiment of the application, the terminal can perform noise reduction classification on acquired time-series data based on the target network. By way of example, the terminal may be a smart television, a personal computer (personal computer, PC), a notebook computer, a server, a mobile phone, a tablet, a mobile internet device (mobile internet device, MID), a wearable device, a virtual reality (virtual reality, VR) device, an augmented reality (augmented reality, AR) device, a wireless terminal in industrial control, a wireless terminal in self driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, etc. The terminal may be a device running an Android system, an iOS system, a Windows system, or another system.
For easy understanding, a specific application scenario provided by the embodiments of the present application will be described below with reference to the accompanying drawings. Referring to fig. 3b, fig. 3b is a schematic diagram of an application scenario provided in an embodiment of the present application. As shown in fig. 3b, one possible application scenario is that the smart tv performs classification recognition on the user's intention based on time-series data such as a human body pose, a head pose, an eye gaze direction, a facial expression, a gesture motion or voice, and performs a corresponding operation based on the user's intention obtained by the classification recognition, thereby completing interaction with the user.
Referring to fig. 3c, fig. 3c is a schematic diagram of an application of time series data according to an embodiment of the present application. Specifically, a data acquisition device such as a camera and a microphone can be installed on the intelligent television. Through the data acquisition device such as camera and microphone, the intelligent television can acquire the relevant data of the user who is located around the intelligent television. For example, the smart tv acquires time-series data such as hand key point data describing gesture intentions of a user, body pose data describing body motion intentions of a user, head pose data describing gaze direction intentions of a user, and/or face key point data describing expression intentions of a user through a camera. For another example, the smart tv obtains audio data describing the user's voice intent through a microphone. In addition, the intelligent television can be further provided with a communication device which can receive data sent by an external data acquisition device. For example, a bluetooth module is arranged on the smart television and can receive inertial measurement unit data sent by an external smart watch or smart bracelet. Based on the acquired time sequence data for describing the user intention, the intelligent television can conduct classification recognition on the user intention through the built-in classification model to obtain a classification prediction result of the user intention, and therefore corresponding interactive response operation is completed.
It can be appreciated that in practical application, the time sequence data acquired by the smart television may be one or more types of data, and the smart television can perform classification recognition on the user intention through the classification model based on the acquired one or more types of data. In short, for a classification model preset in the smart tv set, the input data of the classification model may be one or more types of data. For example, the input data of the classification model is the above-described two data, i.e., the head posture data describing the user's gaze direction intention and the hand key point data describing the user's gesture intention, and for example, the input data of the classification model is the audio data describing the user's voice intention. When the input data are multiple types of data, the classification model can perform classification recognition on the user intention based on the combination of the multiple types of data, and a classification prediction result of the user intention is obtained.
In an exemplary embodiment, the input data may be head posture data describing the user's gaze direction intention together with hand key point data describing the user's gesture intention. When the head posture data indicates that the user's gaze direction is toward the smart tv and the hand key point data indicates a wave of the hand to the right, the classification model can predict that the user's intention is to switch tv channels.
Referring to fig. 4, fig. 4 is a schematic flow chart of a model training method according to an embodiment of the present application. As shown in FIG. 4, the model training method includes the following steps 401-406.
In step 401, a sample pair is obtained, the sample pair comprising noise data and noise-free data corresponding to the noise data.
In this embodiment, the terminal may obtain a sample set including a plurality of sample pairs, where each sample pair in the sample set includes a pair of noise data and noise-free data. The noise-free data in the sample pair corresponds to the noise data; that is, the noise-free data is what the noise data becomes after its noise is removed, or equivalently, the noise data is what the noise-free data becomes after noise is added to it. Alternatively, the noise data in the sample pair may be sparse time-series data, for example, skeletal point coordinate data, electrocardiogram data, inertial measurement unit data, or fault diagnosis data. The fault diagnosis data may be, for example, device operation data or grid operation data, such as voltage, current, frequency or waveform data generated in real time during grid operation.
For example, the noise data in the sample pair may be one or more types of data, such as the noise data being inertial measurement unit data only, or the noise data being head pose data and hand keypoint data. For convenience of description, the model training method according to the embodiment of the present application will be described below by taking noise data in a sample pair as one type of data.
In practical applications, after acquiring the noiseless data, the terminal may add random noise to the noiseless data to obtain corresponding noise data, thereby constructing and obtaining the sample pair. In the training process, the terminal can acquire sample pairs from the sample set successively so as to realize the training of the noise reduction classification network.
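For ease of understanding, the construction of a sample pair by adding random noise to noise-free data may be sketched as follows; the use of Gaussian noise and the noise scale are illustrative assumptions, and the embodiment does not limit the kind of random noise:

```python
import random

def make_sample_pair(clean_series, noise_scale=0.1, rng=None):
    """Return a (noisy, clean) pair by adding zero-mean noise to each sample."""
    rng = rng or random.Random(0)
    noisy = [x + rng.gauss(0.0, noise_scale) for x in clean_series]
    return noisy, clean_series

# A small sample set built from clean time-series sequences.
clean_sequences = [[0.0, 0.5, 1.0, 0.5, 0.0], [1.0, 1.0, 1.0, 1.0, 1.0]]
sample_set = [make_sample_pair(seq) for seq in clean_sequences]
```

During training, sample pairs would then be drawn from such a set one after another.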
In a possible example, the terminal for executing the model training method in this embodiment may be, for example, a server, and the server executes the model training method corresponding to fig. 4 to obtain a trained model. The trained model can be deployed in the intelligent television before the intelligent television leaves the factory, or the intelligent television can be connected with a server through a network after the intelligent television leaves the factory, and the trained model on the server is obtained through downloading or updating, so that the trained model is deployed on the intelligent television.
Step 402, inputting the noiseless data into a noise reduction classification network to obtain first output data and second output data.
In this embodiment, the noise reduction classification network includes a noise reduction network and a classification network, the noise reduction network is connected to the classification network, the input of the noise reduction network is the input of the noise reduction classification network, and the output of the noise reduction network is the input of the classification network. The noise reduction network is used for carrying out noise reduction processing on the data input into the noise reduction classification network to obtain noise reduced data. After the noiseless data is input into the noise reduction classification network, first output data and second output data may be obtained. The first output data is output of the noise reduction network, the second output data is output of the classification network, namely the second output data is a classification result corresponding to the noise-free data output by the classification network.
Alternatively, the noise reduction network may be an autoencoder (also called a self-encoder or automatic encoder), which includes an encoder and a decoder. An autoencoder is an artificial neural network that can learn an efficient representation of input data through unsupervised learning. In practical applications, the autoencoder can be made to perform noise reduction on noise data by adding constraints to it. In the autoencoder, the encoder is used for compression-encoding the input data, and the decoder is used for reconstructing the data output by the encoder. The encoder's compression encoding extracts the key information of the input data; the decoder then reconstructs the data from this compressed representation, so that the input data is restored based on its key information, and the restored data is free of noise. That is, noise reduction processing of the data can be realized based on the autoencoder. Illustratively, the encoder and decoder may each be a recurrent neural network (Recurrent Neural Network, RNN) including multiple convolutional layers.
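As an illustration only, the compression-encoding and reconstruction roles of the encoder and decoder can be sketched with a minimal linear autoencoder in NumPy; the linear layers, dimensions, and random weights below are illustrative assumptions, not the RNN-based architecture described above:

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, code_dim = 8, 3                                  # illustrative sizes
W_enc = rng.standard_normal((code_dim, in_dim)) * 0.1    # encoder weights
W_dec = rng.standard_normal((in_dim, code_dim)) * 0.1    # decoder weights

def encode(x):
    """Compression encoding: map the input to a low-dimensional content vector."""
    return np.tanh(W_enc @ x)

def decode(z):
    """Data reconstruction: restore the input dimension from the content vector."""
    return W_dec @ z

x = rng.standard_normal(in_dim)       # stand-in for one input sample
content_vector = encode(x)            # carries the key information of the input
reconstruction = decode(content_vector)
```

In a trained denoising autoencoder, the reconstruction would approximate the noise-free input; here the weights are untrained and only the data flow is shown.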
The first output data may be the output obtained after the noise-free data is input into the noise reduction classification network and processed by the encoder in the noise reduction network. Generally, the data produced by the encoder in the noise reduction network is also referred to as a content vector. A content vector is the output of the encoder in the autoencoder, typically a set of high-dimensional feature vectors.
The classification network is used for acquiring the data processed by the noise reduction network and outputting a probability value vector corresponding to the data input into the classification network, where each element value in the vector is the probability that the input data belongs to the corresponding class. Generally, the class with the highest probability is the class to which the input data belongs. That is, the classification network is used for classifying the acquired data to obtain the class corresponding to the data. Illustratively, when the data input into the noise reduction classification network is IMU data collected by a smart bracelet, the classification network is used to classify the IMU data into four categories: walking, running, riding, and climbing stairs. For example, when the probability vector output by the classification network is {0.1, 0.7, 0.15, 0.05}, the category with probability 0.7 (i.e., running) can be determined to be the category to which the IMU data belongs.
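The class readout described above can be sketched as follows: a probability vector is produced (here via a softmax over illustrative raw scores, an assumption about how such a vector would be formed) and the highest-probability entry gives the predicted class, using the four IMU activity labels from the text:

```python
import math

LABELS = ["walking", "running", "riding", "climbing stairs"]

def softmax(logits):
    """Turn raw scores into a probability vector that sums to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]   # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def predict(probs):
    """The highest-probability element gives the predicted category."""
    return LABELS[probs.index(max(probs))]

probs = softmax([0.5, 2.4, 0.9, -0.2])          # illustrative raw scores
predicted = predict([0.1, 0.7, 0.15, 0.05])     # vector from the text → "running"
```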
The second output data may be data output by the classification network after the noise-free data is input into the noise-reduction classification network and processed by the noise-reduction network and the classification network.
Step 403, inputting the noise data into the noise reduction classification network to obtain third output data and fourth output data, wherein the third output data is obtained based on the middle layer of the noise reduction network, and the fourth output data is the output of the classification network.
In this embodiment, after inputting the noise data corresponding to the above noise-free data into the noise reduction classification network, the terminal may obtain third output data extracted from an intermediate layer of the noise reduction network in the noise reduction classification network, where the third output data is feature data extracted from the intermediate layer of the noise reduction network. In addition, the terminal can also acquire fourth output data output by the classification network in the noise reduction classification network, wherein the fourth output data is a classification result which is output by the classification network and corresponds to the noise data.
Alternatively, the third output data may be derived based on an intermediate layer of an encoder in the noise reduction network. For example, one or more intermediate layers may be included in the encoder, and the terminal may acquire the characteristic data output from each intermediate layer in the encoder and use the characteristic data as the third output data. For example, the encoder is a recurrent neural network (Recurrent Neural Network, RNN) comprising three convolutional layers, the middle layer of the encoder is the two latter convolutional layers in the encoder, and the data output by the two convolutional layers is the third output data.
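For illustration, the collection of intermediate-layer outputs during a forward pass can be sketched as follows; the three stand-in layers are hypothetical and merely show how each layer's feature data can be recorded as the input flows through the encoder:

```python
def run_encoder(x, layers):
    """Apply layers in order, recording every intermediate output."""
    intermediates = []
    for layer in layers:
        x = layer(x)
        intermediates.append(x)
    return x, intermediates

# Three hypothetical stand-in layers (the real encoder's layers differ).
layers = [lambda v: [2 * e for e in v],
          lambda v: [e + 1 for e in v],
          lambda v: [e / 2 for e in v]]
content, feats = run_encoder([1.0, 2.0], layers)
third_output = feats[-2:]   # the latter two layers, as in the three-layer example
```

Here `content` plays the role of the encoder's final output and `third_output` the feature data taken from the designated intermediate layers.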
It should be understood that steps 402 and 403 are not limited to a particular execution order; in practical applications, either step 402 or step 403 may be executed first. This embodiment does not specifically limit the execution order of steps 402 and 403.
Step 404, determining a first loss function according to the first output data and the third output data, wherein the first loss function is used for representing the difference between the first output data and the third output data.
Since the first output data and the third output data are both obtained based on the noise reduction network, after the first output data and the third output data are obtained, a first loss function may be determined based on them to characterize the difference between the first output data and the third output data. In this way, by obtaining a first loss function that relates the noise reduction network's outputs for the noise-free data and for the noise data, and training the noise reduction classification network based on this first loss function, the noise reduction classification network can learn, in the noise reduction stage, global features that are closer to the noise-free data, thereby enhancing the noise reduction network's ability to suppress local noise and disturbance.
Optionally, in the case where the third output data is derived based on an intermediate layer of an encoder in the noise reduction network, the third output data comprises data output by one or more intermediate layers of the encoder. In this case, it is possible to find the difference value between the data output by each intermediate layer and the first output data, and obtain the first loss function by finding the sum of the difference values between the data output by the plurality of intermediate layers and the first output data.
Optionally, in one possible embodiment, in step 403, after inputting the noise data into the noise reduction classification network and obtaining the feature data output by the middle layer of the noise reduction network, the terminal may divide the obtained feature data into a plurality of sub-feature data to obtain third output data, where the third output data includes the plurality of sub-feature data.
For example, in the case where the noise data is sparse time sequence data, the terminal may divide the feature data into a plurality of sub-feature data uniformly in time sequence, to obtain the third output data, where the length of a time period corresponding to each of the plurality of sub-feature data is the same. For example, when the feature data output by the middle layer of the noise reduction network is data in a T0-T3 time period, the terminal may divide the feature data according to a time sequence to obtain first sub-feature data in the T0-T1 time period, second sub-feature data in the T1-T2 time period, and third sub-feature data in the T2-T3 time period, where the lengths of the time periods corresponding to the first sub-feature data, the second sub-feature data, and the third sub-feature data are the same.
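The uniform temporal division described above can be sketched as follows, assuming the number of time steps divides evenly into the number of sub-features:

```python
def divide_uniform(feature_seq, num_chunks):
    """Evenly divide a time-ordered feature sequence into equal-length chunks."""
    assert len(feature_seq) % num_chunks == 0, "assumes an even division"
    step = len(feature_seq) // num_chunks
    return [feature_seq[i * step:(i + 1) * step] for i in range(num_chunks)]

# 12 time steps divided into three sub-features, mirroring the
# T0-T1 / T1-T2 / T2-T3 example: each sub-feature spans the same length.
feature_data = list(range(12))
sub_features = divide_uniform(feature_data, 3)
```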
Then, after dividing into a plurality of sub-feature data, the terminal finds a difference value between each of the sub-feature data in the third output data and the first output data, and determines a first loss function based on a sum of the difference values between the plurality of sub-feature data and the first output data. In this way, the feature data corresponding to the noise data output by different middle layers in the noise reduction network are extracted, the time sequence local features are evenly divided, the time sequence global features are constructed based on the feature data corresponding to the noise-free data output by the noise reduction network, and the loss function is constructed based on the time sequence local features and the time sequence global features, so that the noise reduction network can be guided to learn the global time sequence features more fully, and the capability of the noise reduction network for inhibiting local noise and disturbance is enhanced.
For the time sequence data, the time sequence data are coherent, adjacent time sequence data have certain relevance, a loss function is constructed based on local time sequence characteristics corresponding to noise data and global time sequence characteristics corresponding to noise-free data, and the noise reduction network can be guided to learn richer global time sequence information, so that the capability of the noise reduction network for inhibiting local noise and disturbance is enhanced. For example, when the noise data is time sequence data in which the data is missing for a short period of time, the loss function is constructed and the noise reduction network is trained based on the mode, so that the noise reduction network can restore the missing data based on the learned global time sequence information, and noise reduction of the noise data can be achieved with good noise reduction effect.
Optionally, in one possible embodiment, since the first output data is data output by an encoder in the noise reduction network and the third output data is data output by an intermediate layer of the encoder of the noise reduction network, the dimensions of the two are not the same. Therefore, before the first loss function is obtained, the first output data and the third output data may be subjected to a dimension alignment operation so that the dimensions of the first output data and the third output data are the same, and then the difference value between the first output data and the third output data is obtained.
Illustratively, the terminal performs a dimension alignment operation on the first output data and on each piece of sub-feature data in the third output data, respectively, to obtain dimension-aligned first output data and third output data. Specifically, the terminal may construct a plurality of dimension alignment sub-networks in advance, input the first output data into one of the dimension alignment sub-networks, and input each piece of sub-feature data in the third output data into its corresponding one of the other dimension alignment sub-networks, so as to obtain the dimension-aligned first output data and third output data. Each dimension alignment sub-network takes a gated recurrent unit (gated recurrent unit, GRU) as its basic unit and is formed of multiple GRU layers; a dimension alignment sub-network can change the dimension of its input data. After the third output data is aligned with the dimensions of the first output data, the terminal determines the difference value between each piece of sub-feature data in the dimension-aligned third output data and the dimension-aligned first output data.
Illustratively, the process of determining the first loss function based on the first output data and the third output data may be as shown in equation 1 below.

L1 = -Σ_l Σ_i log( exp(ρ_l(z_{l,i}) · ρ_c(c)) / Σ_j exp(ρ_l(z_{l,j}) · ρ_c(c)) )    equation 1

where L1 represents the first loss function, l indexes the intermediate layers of the encoder, i indexes the temporal local features, log() represents the logarithmic function, and exp() represents the exponential function; z_{l,i} represents the i-th temporal local feature obtained after uniformly dividing the features output by the l-th intermediate layer of the encoder; ρ_l represents the dimension alignment sub-network corresponding to the temporal local features of the l-th intermediate layer of the encoder; and ρ_c represents the dimension alignment sub-network corresponding to the content vector c output by the encoder, i.e., the dimension alignment sub-network corresponding to the first output data.
Specifically, a dimension alignment sub-network employing a GRU as its basic unit may be formally represented as GRU(x, num_layers, out_dim), where num_layers is the number of network layers and out_dim represents the desired output dimension.
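For illustration, such a dimension alignment sub-network may be sketched with a single-layer GRU implemented in NumPy; the random weights, the single layer, and the use of the final hidden state as the aligned output are illustrative assumptions rather than the embodiment's actual GRU(x, num_layers, out_dim) configuration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class GRUAlign:
    """Single-layer GRU whose final hidden state has the desired out_dim."""
    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        # One weight matrix per gate, acting on the concatenation [x_t, h_{t-1}].
        self.Wz = rng.standard_normal((out_dim, in_dim + out_dim)) * scale
        self.Wr = rng.standard_normal((out_dim, in_dim + out_dim)) * scale
        self.Wh = rng.standard_normal((out_dim, in_dim + out_dim)) * scale
        self.out_dim = out_dim

    def __call__(self, xs):
        h = np.zeros(self.out_dim)
        for x in xs:                                   # iterate over time steps
            xh = np.concatenate([x, h])
            z = sigmoid(self.Wz @ xh)                  # update gate
            r = sigmoid(self.Wr @ xh)                  # reset gate
            h_cand = np.tanh(self.Wh @ np.concatenate([x, r * h]))
            h = (1.0 - z) * h + z * h_cand             # gated state update
        return h                                       # fixed-dimension output

align = GRUAlign(in_dim=5, out_dim=3)
sequence = np.random.default_rng(1).standard_normal((7, 5))  # 7 steps of dim 5
aligned = align(sequence)   # shape (3,): the sub-network changed the dimension
```

Because the output dimension is fixed by out_dim regardless of the input dimension or sequence length, outputs of different intermediate layers can be brought to a common dimension before their difference values are computed.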
Step 405, determining a second loss function according to the second output data and the fourth output data, wherein the second loss function is used for characterizing the difference between the second output data and the real class label of the noiseless data, together with the difference between the fourth output data and that real class label.
The second output data is the output of the noise reduction classification network after the noise-free data is input into it, i.e., the class result that the noise reduction classification network predicts for the noise-free data. The fourth output data is the output of the noise reduction classification network after the noise data is input into it, i.e., the class result that the noise reduction classification network predicts for the noise data. In fact, the real class labels corresponding to the noise-free data and the noise data are identical. Therefore, the terminal can determine the second loss function by calculating a first difference value between the second output data and the real class label of the noise-free data, and a second difference value between the fourth output data and that real class label. For example, the second loss function is obtained by summing the first difference value and the second difference value.
Taking noiseless data as IMU data as an example, when the category corresponding to the IMU data is running, the real class label corresponding to the IMU data may be {0, 1, 0, 0}. The four element values in the real class label respectively represent the four classes of walking, running, riding, and climbing stairs. Assume that after the noiseless data is input into the noise reduction classification network, the resulting second output data is {0.1, 0.7, 0.15, 0.05}. Then, obtaining the first difference value between the second output data and the real class label of the noiseless data amounts to obtaining the difference value between the vector {0.1, 0.7, 0.15, 0.05} and the vector {0, 1, 0, 0}.
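As a worked example, the cross-entropy between the predicted probability vector {0.1, 0.7, 0.15, 0.05} and the one-hot label for "running" can be computed as follows (cross-entropy being the difference measure used in the equations below):

```python
import math

def cross_entropy(probs, one_hot):
    """-sum_i y_i * log(p_i): cross-entropy between a prediction and a label."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)

second_output = [0.1, 0.7, 0.15, 0.05]   # predicted probabilities (from the text)
true_label = [0, 1, 0, 0]                # one-hot label for "running"
first_difference = cross_entropy(second_output, true_label)   # -log(0.7) ≈ 0.357
```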
Illustratively, the process of determining the second loss function based on the second output data and the fourth output data may be as shown in equation 2 below.
L2 = Lmul(Xnormal) + Lmul(Xnoise)    equation 2

where L2 represents the second loss function, Lmul(Xnormal) represents the cross-entropy loss between the second output data and the real class label of the noiseless data, and Lmul(Xnoise) represents the cross-entropy loss between the fourth output data and the real class label of the noiseless data.
In equation 2, Lmul(Xnoise) = -Σ_i y_noise,i · log p(Xnoise), where p(Xnoise) represents the probability that the predicted class of the input sample Xnoise is y_noise,i.
In equation 2, Lmul(Xnormal) = -Σ_i y_normal,i · log p(Xnormal), where p(Xnormal) represents the probability that the predicted class of the input sample Xnormal is y_normal,i.
In practical applications, the difference value between the output data and the real class label may be represented by other difference measurement methods besides the cross entropy loss, which is not particularly limited in this embodiment.
And step 406, training the noise reduction classification network at least according to the first loss function and the second loss function until a preset training condition is met, and obtaining a target network.
After obtaining the first loss function and the second loss function, the terminal may calculate a total loss function based on them. The total loss function may be the sum of the first loss function and the second loss function, or may be obtained by adding the product of the first loss function and a first scale factor to the product of the second loss function and a second scale factor. After the total loss function is obtained, the terminal trains the noise reduction classification network based on it: the terminal adjusts the parameters in the noise reduction classification network (including the noise reduction network and the classification network) based on the value of the total loss function and repeatedly executes steps 401-406, so that the parameters are continuously adjusted, until the obtained total loss function is smaller than a preset threshold value, at which point it can be determined that the preset training condition is met and the target network is obtained. The target network is the trained noise reduction classification network and can be used for subsequent noise reduction and classification of data.
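The combination of the two loss functions into a total loss and the threshold-based stopping test can be sketched as follows; the scale factors and the threshold value are illustrative assumptions:

```python
def total_loss(l1, l2, alpha=1.0, beta=1.0):
    """Scaled sum of the first and second loss functions."""
    return alpha * l1 + beta * l2

def training_converged(l1, l2, threshold=0.05):
    """Preset training condition: total loss below a preset threshold."""
    return total_loss(l1, l2) < threshold

early = training_converged(0.8, 0.4)    # False: keep adjusting parameters
late = training_converged(0.02, 0.01)   # True: target network obtained
```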
Optionally, in the training process of the noise reduction classification network, the terminal may update the parameters of the noise reduction classification network through an error back propagation algorithm. In short, during training, the terminal corrects the parameters in the initial noise reduction classification network through the error back propagation algorithm, so that the reconstruction error loss of the noise reduction classification network becomes smaller and smaller. Specifically, the input signal is propagated forward until the output produces an error loss, and the parameters in the initial noise reduction classification network are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is thus a backward propagation process dominated by the error loss, aimed at obtaining the optimal parameters of the neural network model, such as its weight matrices.
Optionally, in one possible embodiment, a binary classifier may further be introduced during the training of the noise reduction classification network. Based on the features extracted by the noise reduction classification network, the binary classifier can perform a two-class prediction on the data input into the noise reduction classification network, that is, predict whether that data is noise data or noise-free data. In short, the two-class result output by the binary classifier is a probability value vector containing two element values, which respectively represent the probabilities that the input data belongs to the noise-free data class and the noise data class. Then, a third loss function is determined based on the two-class result output by the binary classifier and the real two-class label corresponding to the input data. The third loss function is used together with the first loss function and the second loss function to obtain the total loss function, i.e., the third loss function is also used for training the noise reduction classification network.
Specifically, the terminal acquires a first feature, predicts a classification result corresponding to the noiseless data according to the first feature, and obtains a first prediction result. Wherein the first feature is extracted by a classification network in the noise reduction classification network after the noise-free data is input into the noise reduction classification network. For example, after the noiseless data is input into the noise reduction classification network, the classification network in the noise reduction classification network performs feature extraction on the input data, and performs multi-class prediction based on the extracted features to realize classification of the noiseless data. Then, the terminal may obtain the first feature by obtaining the feature extracted by the classification network. And then, the terminal inputs the acquired first characteristic into a binary classifier to predict and obtain a classification result corresponding to the noiseless data.
And the terminal acquires the second characteristic, predicts a classification result corresponding to the noise data according to the second characteristic, and obtains a second prediction result. Wherein the second feature is extracted by a classification network in the noise reduction classification network after the noise data is input into the noise reduction classification network. Similarly, after noise data is input into the noise reduction classification network, the classification network in the noise reduction classification network performs feature extraction on the input data, and performs multi-category prediction based on the extracted features. The terminal can obtain the second feature by obtaining the feature extracted by the classification network. And then, the terminal inputs the acquired second characteristic into a binary classifier to predict and obtain a classification result corresponding to the noise data.
After the first prediction result and the second prediction result are obtained, the terminal determines a third difference value between the first prediction result and the real two-class label of the noiseless data and a fourth difference value between the second prediction result and the real two-class label of the noise data, and determines a third loss function according to the third difference value and the fourth difference value.
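For illustration, the third and fourth difference values can be computed as cross-entropy terms over the binary classifier's two-element probability vectors; the probability values below are illustrative assumptions:

```python
import math

def two_class_ce(pred, true_idx):
    """Cross-entropy between a two-element probability vector and a class index."""
    return -math.log(pred[true_idx])

first_prediction = [0.9, 0.1]    # binary classifier output for noise-free input
second_prediction = [0.2, 0.8]   # binary classifier output for noisy input
third_difference = two_class_ce(first_prediction, 0)    # true label: noise-free
fourth_difference = two_class_ce(second_prediction, 1)  # true label: noise
third_loss = third_difference + fourth_difference
```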
Finally, the terminal trains the noise reduction classification network according to at least the first, second and third loss functions. The total loss function may be the sum of the first, second and third loss functions, or may be obtained by adding the product of the first loss function and a first scale factor, the product of the second loss function and a second scale factor, and the product of the third loss function and a third scale factor. In practical application, the first, second and third scale factors may be adjusted according to the accuracy requirements of the noise reduction classification, which is not limited herein.
Illustratively, the process of determining the third loss function based on the first prediction result and the second prediction result may be as shown in equation 3 below.
L3 = Lbin(Xnormal) + Lbin(Xnoise)    equation 3

where L3 represents the third loss function, Lbin(Xnormal) represents the cross-entropy loss between the first prediction result and the real two-class label of the noise-free data, and Lbin(Xnoise) represents the cross-entropy loss between the second prediction result and the real two-class label of the noise data.
In equation 3, Lbin(Xnormal) = -(1/N) Σ_{n=1}^{N} Σ_i b_i · log p(Xnormal), where p(Xnormal) represents the probability that the predicted label for the input Xnormal is b_i, and N is the number of batch samples.
In equation 3, Lbin(Xnoise) = -(1/N) Σ_{n=1}^{N} Σ_i b_i · log p(Xnoise), where p(Xnoise) represents the probability that the predicted label for the input Xnoise is b_i, and N is the number of batch samples.
In this embodiment, a binary classifier is introduced in a training stage, and based on features extracted by a classification network in a noise reduction classification network, a binary classification result of input data is predicted by the binary classifier, so as to obtain a loss function corresponding to the binary classification result. By introducing the loss function corresponding to the classification result based on the original loss function, an additional evaluation dimension can be introduced, so that the trained noise reduction classification network can have adaptive noise reduction classification scales for different types of input data, and the classification precision of the noise reduction classification network is improved. For example, for noiseless data and noise data, the noise reduction classification network can learn different noise reduction classification scales, so that the trained noise reduction classification network can have higher noise reduction classification accuracy no matter whether the input data is noiseless data or noise data.
For ease of understanding, the model training method provided by the embodiment of the present application will be described below with reference to specific examples.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a noise reduction classification network according to an embodiment of the present application. As shown in fig. 5, a training set 501 and a test set 502 are stored on a server 500, and a noise reduction classification network 503 is also disposed on the server 500. Wherein the training set 501 comprises a plurality of sample pairs for training the noise reduction classification network 503, and the test set 502 also comprises a plurality of sample pairs for verifying the performance of the noise reduction classification network obtained by training.
Noise reduction classification network 503 includes a noise reduction network 5031 and a classification network 5032. The noise reduction network 5031 includes an encoder 50311 and a decoder 50312; the encoder 50311 is used for compression-encoding the input data, and the decoder 50312 is used for data reconstruction of the data output by the encoder 50311. Noise reduction of the input data is realized based on the encoder 50311 and the decoder 50312. The classification network 5032 includes a feature extraction module 50321 and a classifier 50322; the feature extraction module 50321 is configured to perform feature extraction on the data output by the decoder 50312, and the classifier 50322 is configured to perform multi-class prediction based on the features extracted by the feature extraction module 50321, so as to obtain a prediction result, that is, the prediction class corresponding to the input of the noise reduction classification network 503.
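The data flow through these modules can be sketched as follows; this is a hypothetical NumPy forward pass with illustrative dimensions, random weights, and tanh/softmax activations, not the patent's actual network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: T time steps, D input features, H latent size, C classes.
T, D, H, C = 32, 6, 16, 10
W_enc = rng.normal(size=(T * D, H))   # encoder 50311: compresses the input
W_dec = rng.normal(size=(H, T * D))   # decoder 50312: reconstructs the data
W_feat = rng.normal(size=(T * D, H))  # feature extraction module 50321
W_cls = rng.normal(size=(H, C))       # classifier 50322

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def noise_reduction_classification(x):
    """x: (T, D) time-series sample -> (denoised sample, class probabilities)."""
    g = np.tanh(x.reshape(-1) @ W_enc)       # content vector output by the encoder
    denoised = (g @ W_dec).reshape(T, D)     # decoder output (noise-reduced data)
    feats = np.tanh(denoised.reshape(-1) @ W_feat)
    probs = softmax(feats @ W_cls)           # multi-class prediction
    return denoised, probs

denoised, probs = noise_reduction_classification(rng.normal(size=(T, D)))
```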
In addition, to enable training of the noise reduction classification network 503, a local-global feature association module 504 and a hybrid classifier 505 are also deployed on the server 500. The local-global feature correlation module 504 is configured to determine a local-global feature correlation loss, i.e. the first loss function, based on the output of the noise-free data at the noise reduction network 5031 and the output of the noise data at the noise reduction network 5031.
Specifically, referring to fig. 6, fig. 6 is a schematic structural diagram of a local-global feature association module and a hybrid classifier according to an embodiment of the present application. As shown in fig. 6, the local-global feature association module 504 includes a feature division module 5041 and a dimension alignment sub-network 5042, where the feature division module 5041 is configured to obtain the output data (i.e., feature data) corresponding to the intermediate layer of the encoder 50311 for the noise data, and uniformly divide the obtained feature data according to the time sequence, so as to obtain sub-feature data. The dimension alignment sub-network 5042 is used for dimension alignment of the uniformly divided sub-feature data and the output data of the encoder 50311 for the noise-free data. In this way, the local-global feature association module 504 may calculate the local-global feature association loss, i.e., the first loss function described above, based on the dimension-aligned data.
The hybrid classifier 505 is configured to determine the multi-class classification loss, i.e., the second loss function described above, based on the output of the classifier 50322 and the real class label corresponding to the original input data of the noise reduction classification network 503. The hybrid classifier 505 is further configured to calculate the two-class classification loss, i.e., the third loss function described above, according to the output of the feature division module 5041 and the real two-class label corresponding to the original input data of the noise reduction classification network 503. As shown in fig. 6, the hybrid classifier 505 includes a binary classifier 5051 for solving the two-class classification loss and a multi-class classifier 5052 for solving the multi-class classification loss. Illustratively, the binary classifier 5051 may be composed of two fully connected layers followed by a softmax layer, and the multi-class classifier 5052 may be a human action recognition (Human Action Recognition, HAR) classifier composed of a spatial-temporal graph neural network.
In the training process, the server 500 updates parameters in the noise reduction classification network 503 through an error back propagation algorithm based on the first loss function obtained by the local-global feature correlation module 504 and the second and third loss functions obtained by the hybrid classifier 505 until a preset training condition is satisfied, thereby obtaining a target network.
The process of training the noise reduction classification network will be described in detail below based on the noise reduction classification network shown in fig. 5. Referring to fig. 7, fig. 7 is a schematic flow chart of training a noise reduction classification network according to an embodiment of the present application. As shown in fig. 7, human skeleton key point coordinate data is input into the noise reduction classification network 503; the corresponding local-global feature loss and hybrid classification loss are obtained based on the local-global feature association module 504 and the hybrid classifier 505, and a total loss function is obtained from these two losses. Then, based on the calculated total loss function, the network parameter updating module 802 updates the parameters of the noise reduction network 5031 and the classification network 5032 in the noise reduction classification network 503 by using an error back propagation algorithm, thereby realizing training of the noise reduction classification network 503.
Specifically, a flow of training the noise reduction classification network will be described in detail below.
1. Before training begins, a training set and a test set are prepared.
Before starting training the noise reduction classification network, the data sets, i.e. the training set and the test set, need to be prepared on the server. The training set and the testing set comprise a plurality of sample pairs, each sample pair comprises noise-free data and noise data, and the noise-free data and the noise data in the sample pairs are time sequence data. In practical applications, the data set may be prepared according to the scenario in which the noise reduction classification network is applied. For example, when the noise reduction classification network is used for classifying and recognizing human actions, a training set and a test set composed of skeletal point coordinate data are prepared.
In particular, the process of constructing the training set and the test set may be as shown in fig. 8. Fig. 8 is a schematic diagram of generating noise data according to an embodiment of the present application. After acquiring the noise-free data, the terminal may add random noise, for example noise of different degrees, to the noise-free data to obtain the corresponding noise data, thereby obtaining a sample pair. After the plurality of sample pairs are constructed, a portion of the sample pairs may be partitioned into the training set and another portion into the test set. The ratio of the sample pairs included in the training set and the test set may be, for example, 4:1.
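A minimal sketch of this dataset-construction step, assuming Gaussian noise and synthetic time-series data (the noise model, array shapes, and the 4:1 split are illustrative, not from the patent):

```python
import random
import numpy as np

rng = np.random.default_rng(42)

def make_sample_pair(clean, noise_scale):
    """Build a (noise-free, noisy) sample pair by adding random noise
    of a chosen degree to the noise-free time-series data."""
    noisy = clean + rng.normal(scale=noise_scale, size=clean.shape)
    return clean, noisy

# Build 100 sample pairs from synthetic time-series data
# (stand-ins for, e.g., skeletal point coordinate sequences).
pairs = [make_sample_pair(rng.normal(size=(32, 6)), noise_scale=0.1)
         for _ in range(100)]

# Partition the sample pairs 4:1 into a training set and a test set.
random.seed(0)
random.shuffle(pairs)
split = int(len(pairs) * 0.8)
train_set, test_set = pairs[:split], pairs[split:]
```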
2. Constructing local-global feature association loss.
In the training stage, the server acquires the feature data output by the intermediate layer of the encoder when the noise data in a sample pair is input, and uniformly divides the acquired feature data to obtain timing local features. The server also acquires the content vector output by the encoder when the noise-free data in the sample pair is input, i.e., the global feature. Then, the server performs dimension alignment on the divided timing local features and the content vector corresponding to the noise-free data through the feature dimension alignment sub-network, and constructs the local-global feature association loss, i.e., the first loss function described above. The server may construct the local-global feature association loss based on equation 1 above.
For example, referring to fig. 9a, fig. 9a is a schematic flow chart of constructing the local-global feature association loss according to an embodiment of the present application. As shown in fig. 9a, X_normal represents the noise-free data in a sample pair, X_noise represents the noise data in the sample pair, E1 represents the encoder, g represents the content vector output by the encoder, and D1 represents the decoder. After the noise-free data is input, the server acquires the content vector g output by the encoder. After the noise data is input, the server acquires the feature data output by the intermediate layer of the encoder (i.e., the intermediate-layer timing features before division in fig. 9a) and equally divides it to obtain sub-feature data (i.e., the intermediate-layer timing features after division in fig. 9a). Then, the server performs dimension alignment on the divided intermediate-layer timing features and the content vector g based on the dimension alignment sub-network, and constructs the local-global feature association loss.
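The division and alignment steps above can be sketched as follows; the single linear alignment layer, the squared-distance form of the loss, and all dimensions are illustrative assumptions (the patent's actual loss is its equation 1):

```python
import numpy as np

rng = np.random.default_rng(0)

def local_global_loss(local_feats, global_vec, W_align):
    """local_feats: (K, T_k, D) intermediate-layer features of the noisy input,
    uniformly divided into K sub-features along the time axis.
    global_vec: (H,) content vector of the clean input from the encoder.
    W_align: (T_k * D, H) dimension-alignment sub-network (one linear layer here).
    Returns the mean squared distance between each aligned local feature
    and the global feature."""
    aligned = np.tanh(local_feats.reshape(local_feats.shape[0], -1) @ W_align)
    return float(np.mean((aligned - global_vec) ** 2))

# Divide a (T, D) intermediate-layer feature map into K equal time segments.
T, D, K, H = 32, 8, 4, 16
mid_feats = rng.normal(size=(T, D))
sub_feats = mid_feats.reshape(K, T // K, D)   # uniform division by time order
g = np.tanh(rng.normal(size=H))               # content vector of the clean input
W_align = rng.normal(size=(T // K * D, H))
loss = local_global_loss(sub_feats, g, W_align)
```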
3. Constructing a hybrid classification penalty.
The hybrid classifier constructed by the server includes a multi-class classifier and a binary classifier. For the noise-free data X_normal and the noise data X_noise in a sample pair, the hybrid classifier calculates the corresponding hybrid classification losses, each combining a multi-class classification loss and a two-class classification loss. Specifically, the hybrid classification losses corresponding to the noise-free data X_normal and the noise data X_noise may be calculated as shown in equations 4 and 5 below.
L_hc1 = L_bin(X_normal) + α·L_mul(X_normal)    (Equation 4)

L_hc2 = L_bin(X_noise) + α·L_mul(X_noise)    (Equation 5)
where L_hc1 is the hybrid classification loss corresponding to the noise-free data X_normal, L_hc2 is the hybrid classification loss corresponding to the noise data X_noise, α is a proportionality coefficient, L_bin(X_normal) represents the cross-entropy loss between the first prediction result and the real two-class label of the noise-free data, L_bin(X_noise) represents the cross-entropy loss between the second prediction result and the real two-class label of the noise data, L_mul(X_normal) represents the cross-entropy loss between the second output data and the real class label of the noise-free data, and L_mul(X_noise) represents the cross-entropy loss between the fourth output data and the real class label of the noise data.
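Equations 4 and 5 can be illustrated with a small NumPy sketch; the probability values and the value of α below are illustrative, not from the patent:

```python
import numpy as np

def cross_entropy(probs, label):
    """Cross-entropy of a single softmax output against an integer label."""
    return -float(np.log(probs[label]))

def hybrid_loss(bin_probs, bin_label, mul_probs, mul_label, alpha=0.5):
    """L_hc = L_bin + alpha * L_mul, as in equations 4 and 5.

    bin_probs: binary-classifier softmax output (2,); bin_label: 0/1.
    mul_probs: multi-class classifier output (C,); mul_label: true class.
    alpha is the proportionality coefficient (0.5 here is illustrative).
    """
    return cross_entropy(bin_probs, bin_label) + alpha * cross_entropy(mul_probs, mul_label)

# L_hc1 for a noise-free sample (binary label 0), L_hc2 for its noisy pair (label 1).
l_hc1 = hybrid_loss(np.array([0.9, 0.1]), 0, np.array([0.7, 0.2, 0.1]), 0)
l_hc2 = hybrid_loss(np.array([0.2, 0.8]), 1, np.array([0.6, 0.3, 0.1]), 0)
```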
Referring to fig. 9b, fig. 9b is a schematic flow chart of constructing the local-global feature association loss and the hybrid classification loss according to an embodiment of the present application. X_normal denotes the noise-free data in a sample pair, X_noise denotes the noise data in the sample pair, E1 denotes the encoder, g denotes the content vector output by the encoder, and D1 denotes the decoder. After the noise-free data is input, the server acquires the content vector g output by the encoder. After the noise data is input, the server acquires the feature data output by the intermediate layer of the encoder and equally divides it to obtain sub-feature data. Then, the server performs dimension alignment on the divided intermediate-layer timing features and the content vector g based on the dimension alignment sub-network, and constructs the local-global feature association loss L_ln. In addition, after the noise reduction network performs noise reduction on the noise-free data and the noise data, the feature extraction module in the classification network continues to perform feature extraction on the data output by the noise reduction network, and the extracted feature data are input into the binary classifier (binary classifier C1) and the multi-class classifier (HAR classifier C2), respectively, to obtain the hybrid classification loss L_hc1 corresponding to the noise-free data and the hybrid classification loss L_hc2 corresponding to the noise data.
4. And constructing a total loss function of the whole noise reduction classification network, and finishing parameter updating of the noise reduction classification network.
After obtaining the local-global feature association loss and the hybrid classification loss in the second and third steps, the server combines them in proportion into a total loss function of the entire noise reduction classification network, and completes the parameter update of the noise reduction classification network by using an error back propagation algorithm. The total loss function of the entire noise reduction classification network may be as shown in equation 6 below.
L_total = L_ln + λ·(L_hc1 + L_hc2)    (Equation 6)
where L_total is the total loss function of the noise reduction classification network, L_ln is the local-global feature association loss, λ is a proportionality coefficient, L_hc1 is the hybrid classification loss corresponding to the noise-free data X_normal, and L_hc2 is the hybrid classification loss corresponding to the noise data X_noise.
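Equation 6 is a straightforward weighted combination of the three losses; a minimal sketch (the value of λ is illustrative):

```python
def total_loss(l_ln, l_hc1, l_hc2, lam=0.5):
    """L_total = L_ln + lambda * (L_hc1 + L_hc2), equation 6.
    lam is the proportionality coefficient; 0.5 is an illustrative value."""
    return l_ln + lam * (l_hc1 + l_hc2)

l = total_loss(0.8, 0.4, 0.6)
```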
After the noise reduction classification network is trained, the target network obtained through training can be deployed on a server or a terminal such as a smart phone, so that the noise reduction classification application of the time sequence data is realized.
Referring to fig. 10, fig. 10 is a schematic diagram comparing a conventional scheme provided by an embodiment of the present application with the scheme of the present application. As shown in fig. 10, fig. 10 compares the training of the classification network in the existing scheme with that of the noise reduction classification network trained by the method of the present application. The abscissa represents the number of training iterations (in units of 10³), and the ordinate represents the loss on the verification set. As can be seen from fig. 10, at 20000 iterations the verification-set loss of the noise reduction classification network trained by the method of the present application is already lower than that of the classification network in the existing scheme, and beyond 40000 iterations the noise reduction classification network trained by the method of the present application begins to converge.
Referring to table 1, table 1 is a comparison of classification accuracy of a human motion classification model ST-GCN in the existing scheme and a noise reduction classification network trained by using the model training method provided by the embodiment of the application.
TABLE 1

                           Noise level=0   Noise level=1   Noise level=3   Noise level=5
Existing scheme            81.57%          73.78%          57.76%          42.73%
Scheme of the application  84.49%          84.11%          83.28%          82.20%
In table 1, noise level n (n = 0, 1, 3, 5) indicates that n skeletal joints are randomly selected and random spatial translation noise is added to them at each frame of the normal human skeletal point coordinate data. The spatial coordinates of the skeletal points after adding noise are limited to the minimum bounding box determined by the spatial coordinates of the normal skeletal points. Even when the noise level is 0, the noise reduction classification network trained by the model training method provided by the embodiment of the application plays a data-augmentation role and improves the classification accuracy of the model. As the noise level increases, the gap between the noise reduction classification network of the present scheme and the classification network in the existing scheme gradually widens.
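The noise-injection procedure described above can be sketched as follows; the noise scale, frame/joint counts, and clamping to a single global per-axis bounding box are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

def add_joint_noise(skeleton, n):
    """Noise level n: pick n skeletal joints at random and add random spatial
    translation noise at every frame, clamping the result to the axis-aligned
    minimum bounding box of the original (normal) skeletal point coordinates.

    skeleton: (frames, joints, 3) coordinate array. Returns a noisy copy.
    """
    lo = skeleton.min(axis=(0, 1))            # bounding-box corners per axis
    hi = skeleton.max(axis=(0, 1))
    noisy = skeleton.copy()
    joints = rng.choice(skeleton.shape[1], size=n, replace=False)
    noise = rng.normal(scale=0.05, size=(skeleton.shape[0], n, 3))  # illustrative scale
    noisy[:, joints, :] = np.clip(noisy[:, joints, :] + noise, lo, hi)
    return noisy

clean = rng.uniform(size=(30, 17, 3))         # 30 frames, 17 joints (illustrative)
noisy = add_joint_noise(clean, n=3)
```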
The embodiment of the application also provides a noise reduction classification method: data to be classified is acquired and input into a target network to obtain a prediction result, where the prediction result is the classification result of the data to be classified. The data to be classified includes sparse time series data, such as skeletal point coordinate data, electrocardiogram data, inertial measurement unit data or fault diagnosis data. The target network is used for performing noise reduction and classification on the data to be classified, and is obtained by training based on the model training method described in the above embodiment; for details, reference may be made to the description of the above embodiment, which is not repeated herein. Optionally, the target network may be deployed on a smart television and configured to perform noise reduction classification on data acquired by the smart television, so as to obtain a classification prediction result, where the classification prediction result is specifically used to represent a user intention. In this way, the smart television can perform an interactive response operation related to the user intention, for example a channel switching operation, based on the obtained classification prediction result.
Having described the model training method and the noise reduction classification method provided by the embodiments of the present application, an apparatus for performing the method mentioned in the above embodiments will be described below.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a model training device according to an embodiment of the present application. As shown in fig. 11, the model training apparatus includes an acquisition unit 1101 and a processing unit 1102. The acquisition unit 1101 is configured to acquire a sample pair, where the sample pair includes noise data and noise-free data corresponding to the noise data. The processing unit 1102 is configured to input the noise-free data into a noise reduction classification network to obtain first output data and second output data, where the noise reduction classification network includes a noise reduction network and a classification network, the first output data is the output of the noise reduction network, and the second output data is the output of the classification network. The processing unit 1102 is further configured to input the noise data into the noise reduction classification network to obtain third output data and fourth output data, where the third output data is obtained based on an intermediate layer of the noise reduction network and the fourth output data is the output of the classification network. The processing unit 1102 is further configured to determine a first loss function according to the first output data and the third output data, where the first loss function is used for representing the difference between the first output data and the third output data, and to determine a second loss function according to the second output data and the fourth output data. The processing unit 1102 is further configured to train the noise reduction classification network at least according to the first loss function and the second loss function until a preset training condition is satisfied, so as to obtain a target network.
Optionally, in a possible implementation manner, the noise reduction network includes an encoder and a decoder, the encoder is used for performing compression encoding on input data, the decoder is used for performing data reconstruction on data output by the encoder, the first output data is output by the encoder, and the third output data is obtained based on an intermediate layer of the encoder.
Optionally, the processing unit 1102 is further configured to input the noise data into the noise reduction classification network to obtain feature data output by an intermediate layer of the noise reduction network, divide the feature data into a plurality of sub-feature data to obtain the third output data, where the third output data includes the plurality of sub-feature data, determine a difference value between each sub-feature data in the third output data and the first output data, and determine the first loss function according to the difference value between each sub-feature data and the first output data.
Optionally, in a possible implementation manner, the processing unit 1102 is further configured to divide the feature data into a plurality of sub-feature data uniformly according to a time sequence, so as to obtain the third output data, where a time period corresponding to each of the plurality of sub-feature data is the same in length, and the noise data is time sequence data.
Optionally, in a possible implementation manner, the processing unit 1102 is further configured to perform a dimension alignment operation on the first output data and each sub-feature data in the third output data, so as to obtain dimension-aligned first output data and third output data, and to determine a difference value between each sub-feature data in the dimension-aligned third output data and the dimension-aligned first output data.
Optionally, in a possible implementation manner, the processing unit 1102 is further configured to determine a difference between the second output data and a real class label of the noiseless data to obtain a first difference value, determine a difference between the fourth output data and the real class label of the noiseless data to obtain a second difference value, and obtain the second loss function according to the first difference value and the second difference value, where the second output data is a multi-classification prediction result and is used to represent a result of the classification network prediction.
Optionally, in one possible implementation manner, the acquisition unit 1101 is further configured to acquire a first feature and predict a classification result corresponding to the noise-free data according to the first feature to obtain a first prediction result, where the first feature is extracted by the classification network in the noise reduction classification network after the noise-free data is input into the noise reduction classification network. The acquisition unit 1101 is further configured to acquire a second feature and predict a classification result corresponding to the noise data according to the second feature to obtain a second prediction result, where the second feature is extracted by the classification network in the noise reduction classification network after the noise data is input into the noise reduction classification network. The processing unit 1102 is further configured to determine a third loss function according to the first prediction result and the real two-class label of the noise-free data as well as the second prediction result and the real two-class label of the noise data, and to train the noise reduction classification network at least according to the first loss function, the second loss function and the third loss function, where the real two-class label indicates whether the corresponding data is of a noise type or a noise-free type.
Optionally, in a possible implementation, the parameters of the noise reduction classification network are updated by an error back propagation algorithm at least according to the first loss function and the second loss function.
Optionally, in one possible implementation, the noise data includes sparse timing data.
Optionally, in one possible implementation, the sparse timing data includes skeletal point coordinate data, electrocardiogram data, inertial measurement unit data, or fault diagnosis data.
Referring to fig. 12, fig. 12 is a schematic structural diagram of a noise reduction classification device according to an embodiment of the application. The embodiment of the application provides a noise reduction classification device, which comprises an acquisition unit 1201 and a processing unit 1202. The acquiring unit 1201 is configured to acquire data to be classified. The processing unit 1202 is configured to input the data to be classified into a target network to obtain a prediction result, where the prediction result is a classification result of the data to be classified, and the target network is configured to perform noise reduction processing and classification on the data to be classified, and the target network is trained based on the model training method described in the above embodiment.
Referring to fig. 13, fig. 13 is a schematic structural diagram of an execution device provided in an embodiment of the present application, and the execution device 1300 may be embodied as a mobile phone, a tablet, a notebook computer, an intelligent wearable device, a server, etc., which is not limited herein. The execution device 1300 may be deployed with the data processing apparatus described in the corresponding embodiment of fig. 13, to implement the functions of data processing in the corresponding embodiment of fig. 13. Specifically, the execution device 1300 includes a receiver 1301, a transmitter 1302, a processor 1303, and a memory 1304 (where the number of processors 1303 in the execution device 1300 may be one or more, and one processor is illustrated in fig. 13 as an example), where the processor 1303 may include an application processor 13031 and a communication processor 13032. In some embodiments of the application, the receiver 1301, transmitter 1302, processor 1303, and memory 1304 may be connected by a bus or other means.
Memory 1304 may include read-only memory and random access memory, and provides instructions and data to the processor 1303. A portion of the memory 1304 may also include non-volatile random access memory (non-volatile random access memory, NVRAM). The memory 1304 stores processor-executable operating instructions, executable modules or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for performing various operations.
The processor 1303 controls operations of the execution device. In a specific application, the individual components of the execution device are coupled together by a bus system, which may include, in addition to a data bus, a power bus, a control bus, a status signal bus, etc. For clarity of illustration, however, the various buses are referred to in the figures as bus systems.
The method disclosed in the above embodiment of the present application may be applied to the processor 1303 or implemented by the processor 1303. The processor 1303 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the method described above may be completed by integrated logic circuitry in hardware or by instructions in the form of software in the processor 1303. The processor 1303 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The processor 1303 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory 1304, and the processor 1303 reads the information in the memory 1304 and completes the steps of the method in combination with its hardware.
The receiver 1301 may be used to receive input numeric or character information and to generate signal inputs related to performing relevant settings and function control of the device. The transmitter 1302 may be used to output digital or character information via the first interface, the transmitter 1302 may be further used to send instructions to the disk pack via the first interface to modify data in the disk pack, and the transmitter 1302 may further include a display device such as a display screen.
In one instance of an embodiment of the present application, the processor 1303 is configured to execute the training method of the noise reduction model performed by the execution device in the corresponding embodiment of fig. 4.
Referring to fig. 14, fig. 14 is a schematic structural diagram of a training apparatus according to an embodiment of the present application. Specifically, the training apparatus 1400 is implemented by one or more servers. The training apparatus 1400 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPU) 1414 (e.g., one or more processors), a memory 1432, and one or more storage media 1430 (e.g., one or more mass storage devices) storing application programs 1442 or data 1444. The memory 1432 and the storage medium 1430 may be transitory or persistent storage. The program stored on the storage medium 1430 may include one or more modules (not shown), each of which may include a series of instruction operations for the training apparatus. Still further, the central processing unit 1414 may be configured to communicate with the storage medium 1430 to execute the series of instruction operations in the storage medium 1430 on the training apparatus 1400.
The training apparatus 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input/output interfaces 1458, or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
Embodiments of the present application also provide a computer program product which, when run on a computer, causes the computer to perform the steps as performed by the aforementioned performing device or causes the computer to perform the steps as performed by the aforementioned training device.
The embodiment of the present application also provides a computer-readable storage medium having stored therein a program for performing signal processing, which when run on a computer, causes the computer to perform the steps performed by the aforementioned performing device or causes the computer to perform the steps performed by the aforementioned training device.
The execution device, the training device or the terminal device provided by the embodiment of the application can be a chip, wherein the chip comprises a processing unit and a communication unit, the processing unit can be a processor, and the communication unit can be an input/output interface, a pin or a circuit, for example. The processing unit may execute the computer-executable instructions stored in the storage unit to cause the chip in the execution device to perform the data processing method described in the above embodiment, or to cause the chip in the training device to perform the data processing method described in the above embodiment. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, or the like, and the storage unit may also be a storage unit in the wireless access device side located outside the chip, such as a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a random access memory (random access memory, RAM), or the like.
Specifically, referring to fig. 15, fig. 15 is a schematic structural diagram of a chip provided in an embodiment of the present application, where the chip may be represented as a neural network processor NPU 1500, and the NPU 1500 is mounted as a coprocessor on a main CPU (Host CPU), and the Host CPU distributes tasks. The core part of the NPU is an operation circuit 1503, and the controller 1504 controls the operation circuit 1503 to extract matrix data in the memory and perform multiplication.
In some implementations, the operation circuit 1503 internally includes a plurality of processing engines (PEs). In some implementations, the operation circuit 1503 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1503 is a general-purpose matrix processor.
For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit fetches the data corresponding to matrix B from the weight memory 1502 and buffers it on each PE in the operation circuit. The operation circuit then fetches the data of matrix A from the input memory 1501, performs the matrix operation with matrix B, and stores the resulting partial or final matrix result in the accumulator 1508.
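The weight-stationary computation described above can be sketched numerically. The following is a minimal illustrative model, not the actual circuit: matrix B is held fixed (as if buffered on the PEs), rows of A stream through one inner-dimension step at a time, and partial sums build up in an accumulator array standing in for accumulator 1508.

```python
import numpy as np

def systolic_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Toy sketch of C = A x B in the streaming style the text describes:
    B stays stationary, and each step adds one rank-1 partial result
    into the accumulator (the role played by accumulator 1508)."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    acc = np.zeros((m, n))            # accumulator holding partial results
    for t in range(k):                # one streamed step per inner-dim index
        # each PE contributes one multiply; together this is an outer product
        acc += np.outer(a[:, t], b[t, :])
    return acc

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])
assert np.allclose(systolic_matmul(a, b), a @ b)  # matches NumPy's matmul
```

The sum of rank-1 outer products over the inner dimension is mathematically identical to the full matrix product, which is why partial results can be accumulated step by step.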
The unified memory 1506 is used to store input data and output data. Weight data is transferred directly to the weight memory 1502 through the direct memory access controller (DMAC) 1505; input data is likewise transferred into the unified memory 1506 through the DMAC.
The bus interface unit (BIU) 1510 is used for interaction among the AXI bus, the DMAC, and the instruction fetch buffer (IFB) 1509.
The bus interface unit 1510 is used by the instruction fetch buffer 1509 to fetch instructions from external memory, and by the DMAC 1505 to fetch the raw data of the input matrix A or the weight matrix B from external memory.
The DMAC is mainly used to transfer input data from the external memory (DDR) to the unified memory 1506, to transfer weight data to the weight memory 1502, or to transfer input data to the input memory 1501.
The vector calculation unit 1507 includes a plurality of operation processing units and, when necessary, further processes the output of the operation circuit 1503, for example by vector multiplication, vector addition, exponential operation, logarithmic operation, or magnitude comparison. It is mainly used for non-convolutional/fully connected layer computation in the neural network, such as batch normalization, pixel-level summation, and up-sampling of a feature plane.
In some implementations, the vector calculation unit 1507 can store the processed output vector to the unified memory 1506. For example, the vector calculation unit 1507 may apply a linear or nonlinear function to the output of the operation circuit 1503, such as linearly interpolating the feature plane extracted by a convolution layer, or accumulating a vector of values to generate an activation value. In some implementations, the vector calculation unit 1507 generates normalized values, pixel-level summed values, or both. In some implementations, the processed output vector can be used as an activation input to the operation circuit 1503, for example for use in a subsequent layer of the neural network.
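The post-processing role of the vector calculation unit can be illustrated with a short sketch. The choice of batch-normalization-style scaling followed by a ReLU activation is an illustrative assumption (the text only names normalization and activation generation as examples), and the epsilon constant is likewise not from the patent.

```python
import numpy as np

def vector_postprocess(matmul_out: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of vector-unit post-processing: the matrix
    result from the operation circuit is normalized per feature column
    (batch-normalization style) and passed through a ReLU activation
    before being written back to unified memory."""
    mean = matmul_out.mean(axis=0)
    var = matmul_out.var(axis=0)
    normalized = (matmul_out - mean) / np.sqrt(var + 1e-5)  # batch normalization
    activated = np.maximum(normalized, 0.0)                 # ReLU activation value
    return activated
```

In this arrangement the element-wise work stays out of the systolic array, which matches the division of labor the text describes: matrix multiplication in the operation circuit, everything else in the vector unit.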
An instruction fetch buffer 1509 connected to the controller 1504 is used to store instructions used by the controller 1504.
The unified memory 1506, the input memory 1501, the weight memory 1502, and the instruction fetch buffer 1509 are all on-chip memories; the external memory is private to the NPU hardware architecture.
The processor mentioned in any of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above-mentioned programs.
It should be further noted that the above-described apparatus embodiments are merely illustrative; units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the present application, the connection relationship between modules indicates that they have a communication connection, which may be specifically implemented as one or more communication buses or signal lines.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus the necessary general-purpose hardware, or of course by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components, and the like. Generally, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structure used to implement the same function can vary: it may be an analog circuit, a digital circuit, a dedicated circuit, and so on. For the present application, however, a software program implementation is the preferred embodiment in most cases. Based on such understanding, the technical solution of the present application, or the part of it that contributes to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a training device, a network device, etc.) to perform the methods according to the embodiments of the present application.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a training device or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), a semiconductor medium (e.g., solid state disk (SSD)), or the like.

Claims (11)

1. A method of model training, comprising:
Obtaining a sample pair, wherein the sample pair comprises noise data and noise-free data corresponding to the noise data, the noise data comprises sparse time sequence data, and the sparse time sequence data comprises skeleton point coordinate data, electrocardiogram data, inertial measurement unit data or fault diagnosis data;
Inputting the noiseless data into a noise reduction classification network to obtain first output data and second output data, wherein the noise reduction classification network comprises a noise reduction network and a classification network, the first output data is the output of the noise reduction network, and the second output data is the output of the classification network;
Inputting the noise data into the noise reduction classification network to obtain third output data and fourth output data, wherein the third output data is obtained based on an intermediate layer of the noise reduction network, and the fourth output data is output of the classification network;
Determining a first loss function from the first output data and the third output data, the first loss function being indicative of a difference between the first output data and the third output data;
Determining a second loss function according to the second output data and the fourth output data, wherein the second loss function is used for representing the difference between the second output data and the real class label of the noiseless data and the difference between the fourth output data and the real class label of the noiseless data;
Training the noise reduction classification network at least according to the first loss function and the second loss function until a preset training condition is met, and obtaining a target network.
2. The method of claim 1, wherein the noise reduction network comprises an encoder for compression encoding input data and a decoder for data reconstruction of data output by the encoder;
The first output data is an output of the encoder, and the third output data is obtained based on an intermediate layer of the encoder.
3. The method according to claim 1 or 2, wherein inputting the noise data into the noise reduction classification network results in third output data, comprising:
Inputting the noise data into the noise reduction classification network to obtain the characteristic data output by the middle layer of the noise reduction network;
dividing the characteristic data into a plurality of sub-characteristic data to obtain the third output data, wherein the third output data comprises the plurality of sub-characteristic data;
Determining a first loss function from the first output data and the third output data, comprising:
determining a difference value between each piece of sub-feature data in the third output data and the first output data;
and determining the first loss function according to the difference value between each piece of sub-characteristic data and the first output data.
4. A method according to claim 3, wherein said dividing the characteristic data into a plurality of sub-characteristic data to obtain the third output data comprises:
uniformly dividing the characteristic data into a plurality of sub-characteristic data according to a time sequence to obtain the third output data, wherein the length of a time period corresponding to each sub-characteristic data in the plurality of sub-characteristic data is the same;
wherein the noise data is time sequence data.
5. A method according to claim 3, wherein said determining a difference value between each sub-feature data in the third output data and the first output data comprises:
performing dimension alignment operation on each piece of sub-feature data in the first output data and the third output data respectively to obtain first output data and third output data with aligned dimensions;
a difference value between each sub-feature data in the third dimension-aligned output data and the first dimension-aligned output data is determined.
6. The method according to any of claims 1-2, wherein said determining a second loss function from said second output data and said fourth output data comprises:
determining the difference between the second output data and the real class label of the noiseless data to obtain a first difference value;
determining the difference between the fourth output data and the real class label of the noiseless data to obtain a second difference value;
acquiring the second loss function according to the first difference value and the second difference value;
The second output data is a multi-classification prediction result and is used for representing a result predicted by the classification network.
7. The method according to any one of claims 1-2, wherein the method further comprises:
Acquiring a first feature, and predicting a classification result corresponding to the noiseless data according to the first feature to obtain a first prediction result, wherein the first feature is extracted by a classification network in a noise reduction classification network after the noiseless data is input into the noise reduction classification network;
Obtaining a second characteristic, and predicting a classification result corresponding to the noise data according to the second characteristic to obtain a second prediction result, wherein the second characteristic is extracted by a classification network in a noise reduction classification network after the noise data is input into the noise reduction classification network;
Determining a third loss function according to the first prediction result, the real two-class label of the noiseless data, the second prediction result and the real two-class label of the noise data;
said training said noise reduction classification network based at least on said first and second loss functions, comprising:
Training the noise reduction classification network based at least on the first, second, and third loss functions;
The classification result corresponding to the noise-free data is a noise-free type or a noise type, and the classification result corresponding to the noise data is a noise-free type or a noise type.
8. The method according to any of claims 1-2, wherein said training the noise reduction classification network based at least on the first and second loss functions comprises:
and updating parameters of the noise reduction classification network through an error back propagation algorithm at least according to the first loss function and the second loss function.
9. A terminal comprising a memory and a processor, the memory storing code, the processor configured to execute the code, the terminal performing the method of any of claims 1 to 8 when the code is executed.
10. A computer readable storage medium comprising computer readable instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 8.
11. A computer program product comprising computer readable instructions which, when run on a computer, cause the computer to perform the method of any of claims 1 to 8.
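The loss construction recited in claim 1 can be illustrated with a toy numerical sketch. Everything below is a hypothetical stand-in: the random arrays replace the real outputs of the noise reduction network and the classification network, and the mean-squared-error form of the first loss and the cross-entropy form of the second loss are illustrative assumptions, not formulas prescribed by the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(x: np.ndarray, y: np.ndarray) -> float:
    """Illustrative difference measure for the first loss function."""
    return float(np.mean((x - y) ** 2))

def cross_entropy(logits: np.ndarray, label: int) -> float:
    """Toy cross-entropy of softmax(logits) against an integer class label."""
    z = logits - logits.max()                      # numerically stable log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])

# Hypothetical stand-ins for the outputs named in claim 1.
first_output = rng.normal(size=8)    # noise reduction network output (noiseless input)
third_output = rng.normal(size=8)    # intermediate-layer feature (noise input)
second_output = rng.normal(size=3)   # classification logits (noiseless input)
fourth_output = rng.normal(size=3)   # classification logits (noise input)
true_label = 1                       # real class label of the noiseless sample

# First loss: difference between the first and third output data.
loss1 = mse(first_output, third_output)
# Second loss: each prediction measured against the real class label.
loss2 = (cross_entropy(second_output, true_label)
         + cross_entropy(fourth_output, true_label))

total_loss = loss1 + loss2  # the noise reduction classification network is trained on this
```

Minimizing the first loss pushes the noisy-input features toward the clean-input denoising output, while the second loss keeps both the clean and noisy branches aligned with the true class, which is the joint objective the claim describes.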
CN202011623266.3A 2020-12-30 2020-12-30 Model training method and related device Active CN114692667B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011623266.3A CN114692667B (en) 2020-12-30 2020-12-30 Model training method and related device

Publications (2)

Publication Number Publication Date
CN114692667A CN114692667A (en) 2022-07-01
CN114692667B true CN114692667B (en) 2025-06-10

Family

ID=82134676

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011623266.3A Active CN114692667B (en) 2020-12-30 2020-12-30 Model training method and related device

Country Status (1)

Country Link
CN (1) CN114692667B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118190031A (en) * 2023-01-13 2024-06-14 国家石油天然气管网集团有限公司 Optical fiber sensing waterfall diagram data noise reduction method, device and processor
CN118190032B (en) * 2023-01-13 2025-02-11 国家石油天然气管网集团有限公司 Optical fiber sensing waterfall diagram data noise reduction method, device and processor

Citations (1)

Publication number Priority date Publication date Assignee Title
WO2019071201A1 (en) * 2017-10-06 2019-04-11 Alivecor, Inc. Continuous monitoring of a user's health with a mobile device

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
WO2020047750A1 (en) * 2018-09-04 2020-03-12 深圳先进技术研究院 Arrhythmia detection method and apparatus, electronic device, and computer storage medium
US20200388287A1 (en) * 2018-11-13 2020-12-10 CurieAI, Inc. Intelligent health monitoring
CN109800875A (en) * 2019-01-08 2019-05-24 华南理工大学 Chemical industry fault detection method based on particle group optimizing and noise reduction sparse coding machine
CN110390647A (en) * 2019-06-14 2019-10-29 平安科技(深圳)有限公司 OCT image denoising method and device based on ring confrontation generation network
CN111027681B (en) * 2019-12-09 2023-06-27 腾讯科技(深圳)有限公司 Time sequence data processing model training method, data processing method, device and storage medium
CN110974217B (en) * 2020-01-03 2022-08-09 苏州大学 Dual-stage electrocardiosignal noise reduction method based on convolution self-encoder
CN111797895B (en) * 2020-05-30 2024-04-26 华为技术有限公司 A classifier training method, data processing method, system and device
CN111860588B (en) * 2020-06-12 2024-06-21 华为技术有限公司 Training method for graphic neural network and related equipment

Also Published As

Publication number Publication date
CN114692667A (en) 2022-07-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant