
CN119151067A - System performance prediction model training method, system performance prediction method and device - Google Patents

Info

Publication number
CN119151067A
Authority
CN
China
Prior art keywords
system performance
performance prediction
cluster
data
prediction model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411312676.4A
Other languages
Chinese (zh)
Other versions
CN119151067B (en
Inventor
张君友
聂延凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Aientropy Technology Co ltd
Original Assignee
Beijing Aientropy Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Aientropy Technology Co ltd
Priority to CN202411312676.4A
Publication of CN119151067A
Application granted
Publication of CN119151067B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computational Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Pure & Applied Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Databases & Information Systems (AREA)
  • Development Economics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Algebra (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)

Abstract

The application discloses a system performance prediction model training method, a system performance prediction method and a corresponding device, and relates to the technical fields of artificial intelligence and computing power cluster system performance prediction. The training method comprises: obtaining sample feature data for training a system performance prediction model, wherein the sample feature data comprises cluster characteristic data of a known computing power cluster and system performance data obtained by performing a benchmark test on the known computing power cluster; inputting the cluster characteristic data into the system performance prediction model to obtain output system performance prediction data, wherein the model structure of the system performance prediction model has a residual block stacking layer; determining, based on the system performance prediction data and the corresponding system performance data, whether the current round of model training meets a convergence condition; and, if the convergence condition is not met, adjusting the model parameters of the system performance prediction model and executing the next round of model training. With this method, the system performance of a computing power cluster can be accurately predicted.

Description

System performance prediction model training method, system performance prediction method and device
Technical Field
The application relates to the technical field of artificial intelligence and the technical field of performance prediction of computing power cluster systems, in particular to a system performance prediction model training method, a system performance prediction method and a system performance prediction device.
Background
With the development of artificial intelligence, and especially the rapid development of large models, systems place ever higher demands on computing power. Building a computing center usually requires thousands of processors and accelerator cards together with various high-speed buses, and the overall performance of the system is affected by many factors, such as the processors, accelerator cards, communication modes, model algorithms and scheduling algorithms. If system performance and facility investment cost cannot be adequately estimated in the early stage of project construction, computing power is easily wasted or falls short, which is clearly undesirable.
How to combine analysis of historical test data with actual application demands to recommend a reasonable combination of software and hardware facilities, so that the cost and performance of the final system are optimal, has become a topic of interest in the industry.
Disclosure of Invention
The embodiment of the application provides a system performance prediction model training method, a system performance prediction method and a system performance prediction device, which are used for solving the problem of how to accurately predict the system performance of a computing power cluster in the prior art.
The embodiment of the application provides a system performance prediction model training method, which comprises the following steps:
acquiring sample feature data for training a system performance prediction model, wherein the sample feature data comprises cluster characteristic data of a known computing power cluster and system performance data obtained by performing a benchmark test on the known computing power cluster;
inputting the cluster characteristic data into the system performance prediction model to obtain output system performance prediction data, wherein a model structure of the system performance prediction model is provided with a residual block stacking layer, and the residual block stacking layer comprises a residual block;
determining, based on the system performance prediction data and the corresponding system performance data, whether the current round of model training meets a convergence condition;
and if the convergence condition is met, determining that training of the system performance prediction model is completed; if the convergence condition is not met, adjusting model parameters of the system performance prediction model and executing the next round of model training.
Further, the cluster characteristic data comprises cluster quantitative characteristic data and cluster qualitative characteristic data;
the residual block stacking layer comprises a plurality of residual blocks connected in series;
each of the residual blocks has two inputs and one output;
one input of each of the plurality of residual blocks is a qualitative feature vector representing the cluster qualitative characteristic data;
the other input of the first residual block is a quantitative feature vector representing the cluster quantitative characteristic data, the other input of each residual block other than the first is the output of the preceding residual block connected to it, and the output of the last residual block is taken as the output of the residual block stacking layer.
Further, the operations performed in the residual block include:
multiplying the qualitative feature vector by an association matrix to obtain an association vector;
adding the association vector to the quantitative feature vector, or to the output of the previous residual block, to obtain a combined feature vector;
multiplying the combined feature vector by a residual matrix to obtain a residual vector;
adding the residual vector to the quantitative feature vector, or to the output of the previous residual block, to obtain a jump connection feature vector;
normalizing the jump connection feature vector to obtain the output of the residual block;
wherein the association matrix and the residual matrix are model parameters of the system performance prediction model.
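Written compactly, the residual block operations listed above can be sketched as the following equations for the k-th block. The symbols are introduced here for illustration only and do not appear in the source: q is the qualitative feature vector, x_0 is the quantitative feature vector, x_{k-1} is the block's other input, and Norm stands for the normalization, whose exact form the source does not specify. Giving each block its own matrices A_k and R_k is also an assumption.

```latex
\begin{aligned}
a_k &= q\,A_k              && \text{(association vector)} \\
c_k &= a_k + x_{k-1}       && \text{(combined feature vector)} \\
r_k &= c_k\,R_k            && \text{(residual vector)} \\
s_k &= r_k + x_{k-1}       && \text{(jump connection feature vector)} \\
x_k &= \operatorname{Norm}(s_k) && \text{(output of block } k\text{)}
\end{aligned}
```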
Further, the model structure of the system performance prediction model is provided with an input layer and a feature preprocessing layer;
the input layer is used for receiving the cluster quantitative characteristic data and the cluster qualitative characteristic data;
the feature preprocessing layer is used for preprocessing the cluster quantitative characteristic data through a multi-layer perceptron (MLP) network to obtain the quantitative feature vector, and for generating the qualitative feature vector corresponding to the cluster qualitative characteristic data by label lookup encoding.
Further, the model structure of the system performance prediction model is provided with a summarizing layer and an output layer;
the summarizing layer is used for processing the output of the residual block stacking layer through an MLP network to obtain the system performance prediction data;
the output layer is used for outputting the system performance prediction data.
The embodiment of the application also provides a method for predicting the performance of the computing power cluster system, which comprises the following steps:
acquiring cluster characteristic data of a computing power cluster to be predicted;
predicting, based on the cluster characteristic data, the system performance of the computing power cluster to be predicted by using a system performance prediction model trained with any one of the above system performance prediction model training methods, to obtain system performance prediction data.
The embodiment of the application also provides a system performance prediction model training device, which comprises:
a sample data acquisition module, used for acquiring sample feature data for training a system performance prediction model, wherein the sample feature data comprises cluster characteristic data of a known computing power cluster and system performance data obtained by performing a benchmark test on the known computing power cluster;
a system performance prediction module, used for inputting the cluster characteristic data into the system performance prediction model to obtain output system performance prediction data, wherein a model structure of the system performance prediction model is provided with a residual block stacking layer, and the residual block stacking layer comprises a residual block;
a convergence judging module, used for determining, based on the system performance prediction data and the corresponding system performance data, whether the current round of model training meets a convergence condition;
and a model training module, used for determining that training of the system performance prediction model is completed if the convergence condition is met, and for adjusting the model parameters of the system performance prediction model and executing the next round of model training if the convergence condition is not met.
Further, the cluster characteristic data comprises cluster quantitative characteristic data and cluster qualitative characteristic data;
the residual block stacking layer comprises a plurality of residual blocks connected in series;
each of the residual blocks has two inputs and one output;
one input of each of the plurality of residual blocks is a qualitative feature vector representing the cluster qualitative characteristic data;
the other input of the first residual block is a quantitative feature vector representing the cluster quantitative characteristic data, the other input of each residual block other than the first is the output of the preceding residual block connected to it, and the output of the last residual block is taken as the output of the residual block stacking layer.
Further, the operations performed in the residual block include:
multiplying the qualitative feature vector by an association matrix to obtain an association vector;
adding the association vector to the quantitative feature vector, or to the output of the previous residual block, to obtain a combined feature vector;
multiplying the combined feature vector by a residual matrix to obtain a residual vector;
adding the residual vector to the quantitative feature vector, or to the output of the previous residual block, to obtain a jump connection feature vector;
normalizing the jump connection feature vector to obtain the output of the residual block;
wherein the association matrix and the residual matrix are model parameters of the system performance prediction model.
Further, the model structure of the system performance prediction model is provided with an input layer and a feature preprocessing layer;
the input layer is used for receiving the cluster quantitative characteristic data and the cluster qualitative characteristic data;
the feature preprocessing layer is used for preprocessing the cluster quantitative characteristic data through a multi-layer perceptron (MLP) network to obtain the quantitative feature vector, and for generating the qualitative feature vector corresponding to the cluster qualitative characteristic data by label lookup encoding.
Further, the model structure of the system performance prediction model is provided with a summarizing layer and an output layer;
the summarizing layer is used for processing the output of the residual block stacking layer through an MLP network to obtain the system performance prediction data;
the output layer is used for outputting the system performance prediction data.
The embodiment of the application also provides a device for predicting the performance of the computing power cluster system, which comprises:
a cluster data acquisition module, used for acquiring cluster characteristic data of the computing power cluster to be predicted;
and a system performance prediction module, used for predicting, based on the cluster characteristic data, the system performance of the computing power cluster to be predicted by using the system performance prediction model trained by the above system performance prediction model training device, to obtain system performance prediction data.
The embodiment of the application also provides electronic equipment, which comprises a processor and a machine-readable storage medium, wherein the machine-readable storage medium stores machine-executable instructions that can be executed by the processor, and the machine-executable instructions cause the processor to implement any one of the above system performance prediction model training methods or the above computing power cluster system performance prediction method.
The embodiment of the application also provides a computer-readable storage medium in which a computer program is stored, wherein the computer program, when executed by a processor, implements any one of the above system performance prediction model training methods or the above computing power cluster system performance prediction method.
The embodiment of the application also provides a computer program product containing instructions, which when run on a computer, cause the computer to execute any one of the system performance prediction model training methods or the computing power cluster system performance prediction method.
The beneficial effects of the application include:
In the method provided by the embodiment of the application, cluster characteristic data of a known computing power cluster is used as the input of the system performance prediction model, and the system performance prediction data output by the model is compared with the system performance data obtained by performing a benchmark test on the known computing power cluster to determine whether the current round of model training meets the convergence condition. Training is iterated until the convergence condition is met, which yields the trained system performance prediction model. Because the model structure of the system performance prediction model has a residual block stacking layer comprising a residual block, the trained model can accurately predict the system performance of a computing power cluster to be predicted from that cluster's characteristic data.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate the application and together with the embodiments of the application, serve to explain the application. In the drawings:
FIG. 1 is a flowchart of a system performance prediction model training method provided by an embodiment of the present application;
FIG. 2 is a flowchart of a method for predicting performance of a computing power cluster system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a system performance prediction model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a residual block stack layer of a system performance prediction model according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a residual block stacking layer according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a training device for a system performance prediction model according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a performance prediction apparatus for a computing power cluster system according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to provide an implementation scheme for accurately predicting the system performance of a computing power cluster, the embodiments of the application provide a system performance prediction model training method, a system performance prediction method and a corresponding device, which are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein serve only to illustrate and explain the application and do not limit it. The embodiments of the application and the features of the embodiments may be combined with each other where no conflict arises.
The embodiment of the application provides a system performance prediction model training method, as shown in FIG. 1, comprising the following steps:
Step 11, acquiring sample feature data for training a system performance prediction model, wherein the sample feature data comprises cluster characteristic data of a known computing power cluster and system performance data obtained by performing a benchmark test on the known computing power cluster;
Step 12, inputting the cluster characteristic data into the system performance prediction model to obtain output system performance prediction data, wherein a model structure of the system performance prediction model is provided with a residual block stacking layer, and the residual block stacking layer comprises a residual block;
Step 13, determining, based on the system performance prediction data and the corresponding system performance data, whether the current round of model training meets a convergence condition;
Step 14, if the convergence condition is met, determining that training of the system performance prediction model is completed; if the convergence condition is not met, adjusting model parameters of the system performance prediction model and executing the next round of model training.
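The control flow of steps 11 to 14 can be sketched as a plain loop. This is a minimal illustration, not the patented implementation: the loss threshold used as the convergence condition, the round cap, the function names and the one-weight `ScaleModel` are all assumptions introduced here, since the source does not fix a concrete convergence condition at this point.

```python
from typing import Callable, List, Sequence, Tuple

# One training sample: (cluster characteristic data, benchmark-measured performance data).
Sample = Tuple[Sequence[float], Sequence[float]]


def train_until_converged(
    samples: List[Sample],
    predict: Callable[[Sequence[float]], Sequence[float]],
    adjust_parameters: Callable[[List[Sample]], None],
    loss_threshold: float = 1e-3,  # assumed convergence condition, not from the source
    max_rounds: int = 1000,        # assumed safety cap, not from the source
) -> int:
    """Run steps 12 to 14 repeatedly and return the number of rounds executed."""
    for round_index in range(1, max_rounds + 1):
        # Step 12: feed each known cluster's data to the model.
        squared_errors = []
        for features, measured in samples:
            predicted = predict(features)
            squared_errors.extend((p - m) ** 2 for p, m in zip(predicted, measured))
        loss = sum(squared_errors) / len(squared_errors)
        # Step 13: compare predictions with the benchmark results.
        if loss < loss_threshold:
            return round_index  # Step 14, converged: training is complete.
        # Step 14, not converged: adjust parameters, then run the next round.
        adjust_parameters(samples)
    return max_rounds


class ScaleModel:
    """Toy stand-in for the real model: predicts performance as weight * feature."""

    def __init__(self) -> None:
        self.weight = 0.0

    def predict(self, features: Sequence[float]) -> List[float]:
        return [self.weight * features[0]]

    def adjust(self, samples: List[Sample], learning_rate: float = 0.1) -> None:
        # Gradient of the mean squared error with respect to the single weight.
        gradient = sum(
            2.0 * (self.weight * f[0] - m[0]) * f[0] for f, m in samples
        ) / len(samples)
        self.weight -= learning_rate * gradient


# Step 11: sample data (made-up numbers in which performance is twice the feature).
toy_samples: List[Sample] = [([1.0], [2.0]), ([2.0], [4.0])]
toy_model = ScaleModel()
rounds_used = train_until_converged(toy_samples, toy_model.predict, toy_model.adjust)
```

Passing `predict` and `adjust_parameters` as callables keeps the loop independent of the model structure, so the same skeleton would apply to a model with a residual block stacking layer.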
Correspondingly, the embodiment of the application also provides a method for predicting the performance of the computing power cluster system, as shown in FIG. 2, comprising the following steps:
Step 21, acquiring cluster characteristic data of a computing power cluster to be predicted;
Step 22, predicting, based on the cluster characteristic data, the system performance of the computing power cluster to be predicted by using a system performance prediction model trained with the above system performance prediction model training method, to obtain system performance prediction data.
With the method provided by the embodiment of the application, cluster characteristic data of a known computing power cluster is used as the input of the system performance prediction model, and the system performance prediction data output by the model is compared with the system performance data obtained by performing a benchmark test on the known computing power cluster to determine whether the current round of model training meets the convergence condition. Training is iterated until the convergence condition is met, which yields the trained system performance prediction model. Because the model structure of the system performance prediction model has a residual block stacking layer comprising a residual block, the trained model can accurately predict the system performance of a computing power cluster to be predicted from that cluster's characteristic data.
The method and apparatus provided by the present application will now be described in detail with particular embodiments thereof, with reference to the accompanying drawings.
In the embodiment of the application, the model structure of the system performance prediction model has a residual block stacking layer and, as shown in FIG. 3, may also have an input layer, a feature preprocessing layer, a summarizing layer and an output layer. As FIG. 3 shows, the input layer, the feature preprocessing layer, the residual block stacking layer, the summarizing layer and the output layer are connected in sequence.
In the embodiment of the application, the cluster characteristic data of the computing power cluster that serves as the model input may comprise cluster quantitative characteristic data and cluster qualitative characteristic data;
the cluster quantitative characteristic data may specifically comprise the number of processors, the number of accelerator cards, the memory size, the main frequency and the like, and is generally numerical data;
the cluster qualitative characteristic data is characteristic data that cannot be quantified numerically, may specifically comprise the processor model, the accelerator card model, the chip architecture, the manufacturing process, the instruction set type and the like, and is generally of an enumeration type.
In one embodiment of the application, the input layer is used for receiving the cluster quantitative characteristic data and the cluster qualitative characteristic data;
the feature preprocessing layer is used for preprocessing the cluster quantitative characteristic data output by the input layer through an MLP network to obtain the quantitative feature vector. Specifically, all the cluster quantitative characteristic data may be combined into an initial numerical vector, whose length is determined by the number of quantitative indexes actually selected, and the numerical vector is then preprocessed through the MLP network to obtain a 1×n quantitative feature vector;
the feature preprocessing layer is further used for generating the qualitative feature vector corresponding to the cluster qualitative characteristic data by label lookup encoding. Specifically, a numerical value is directly assigned to each type represented by each kind of cluster qualitative characteristic data, giving a 1×n label vector; the initial assignment may be random, provided that the same type is always assigned the same value afterwards. This label vector is the qualitative feature vector.
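The label lookup encoding described above can be sketched as a per-feature dictionary that hands out a numeric code the first time a label is seen and reuses it afterwards. The class and function names, the field names and the sample labels below are hypothetical. Sequential codes are used here in place of a random initial assignment, and the number of qualitative fields is assumed to equal n.

```python
from typing import Dict, List


class LabelLookupEncoder:
    """Assigns each distinct label of one qualitative feature a stable numeric code."""

    def __init__(self) -> None:
        self._codes: Dict[str, int] = {}

    def encode(self, label: str) -> int:
        # First sighting: allocate the next unused code. Later sightings reuse it,
        # which keeps the value assigned to a given type consistent.
        if label not in self._codes:
            self._codes[label] = len(self._codes)
        return self._codes[label]


def encode_qualitative(
    record: Dict[str, str],
    encoders: Dict[str, LabelLookupEncoder],
    field_order: List[str],
) -> List[float]:
    """Build the label vector (one entry per qualitative field, in a fixed order)."""
    return [float(encoders[name].encode(record[name])) for name in field_order]


# Hypothetical qualitative fields and labels, for illustration only.
FIELDS = ["processor_model", "accelerator_model", "chip_architecture"]
encoders = {name: LabelLookupEncoder() for name in FIELDS}

cluster_a = {"processor_model": "cpu-a", "accelerator_model": "card-x", "chip_architecture": "arch-1"}
cluster_b = {"processor_model": "cpu-b", "accelerator_model": "card-x", "chip_architecture": "arch-1"}

vector_a = encode_qualitative(cluster_a, encoders, FIELDS)
vector_b = encode_qualitative(cluster_b, encoders, FIELDS)
vector_a_again = encode_qualitative(cluster_a, encoders, FIELDS)
```

Encoding the same cluster twice yields the same vector, and two clusters that share an accelerator card model share that entry.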
Through the input layer and the feature preprocessing layer, the inputs of the residual block stacking layer are obtained, namely the quantitative feature vector and the qualitative feature vector.
In one embodiment of the present application, the residual block stacking layer may include one residual block; as shown in FIG. 4, it may also include a plurality of residual blocks connected in series. Each residual block has two inputs and one output. One input of every residual block is the qualitative feature vector representing the cluster qualitative characteristic data; the other input of the first residual block is the quantitative feature vector representing the cluster quantitative characteristic data, and the other input of each subsequent residual block is the output of the preceding residual block connected to it. The output of the last residual block is the output of the residual block stacking layer.
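The wiring of the stacking layer can be sketched independently of what happens inside a block. In this minimal sketch the function name and the `Block` type alias are assumptions, and the toy block simply adds its two inputs, standing in for the real residual block only to show how data flows through the stack.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# A block maps (carried input, qualitative feature vector) to its output vector.
Block = Callable[[Vector, Vector], Vector]


def run_residual_stack(
    blocks: Sequence[Block], quantitative: Vector, qualitative: Vector
) -> Vector:
    """Chain the blocks in series and return the output of the last one."""
    carried = quantitative  # the first block receives the quantitative feature vector
    for block in blocks:
        # Every block receives the same qualitative feature vector;
        # its other input is whatever the previous block produced.
        carried = block(carried, qualitative)
    return carried


def toy_block(carried: Vector, qualitative: Vector) -> Vector:
    # Placeholder for a real residual block: element-wise sum of both inputs.
    return [c + q for c, q in zip(carried, qualitative)]


stack_output = run_residual_stack([toy_block] * 3, [1.0, 2.0], [0.5, 0.5])
```

With three toy blocks, the qualitative vector is added three times, so `stack_output` is `[2.5, 3.5]`.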
Further, as shown in FIG. 5, the operations performed in the residual block include:
multiplying the qualitative feature vector by an association matrix to obtain an association vector, wherein the association matrix is an n×n matrix and the resulting association vector is a 1×n vector;
adding the association vector to the quantitative feature vector or to the output of the previous residual block to obtain a combined feature vector, wherein, if the residual block is the first residual block, the association vector is added to the quantitative feature vector, and if it is any other residual block, the association vector is added to the output of the previous residual block; the resulting combined feature vector is a 1×n vector, which combines the quantitative and qualitative features;
multiplying the combined feature vector by a residual matrix to obtain a residual vector, wherein the residual matrix is an n×n matrix and the resulting residual vector is a 1×n vector;
adding the residual vector to the quantitative feature vector or to the output of the previous residual block to obtain a jump connection feature vector, wherein, if the residual block is the first residual block, the residual vector is added to the quantitative feature vector, and if it is any other residual block, the residual vector is added to the output of the previous residual block; the resulting jump connection feature vector is a 1×n vector, which realizes the jump connection between the residual vector and the quantitative feature vector or the output of the previous residual block;
normalizing the jump connection feature vector to obtain the output of the residual block;
the association matrix and the residual matrix are model parameters of the system performance prediction model and are adjusted iteratively during model training.
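The five operations of a single residual block can be sketched in plain Python as follows. The function names are introduced here for illustration. The source says only that the jump connection feature vector is normalized, so the zero-mean, unit-variance normalization below is an assumption, as is the small `epsilon` added for numerical safety.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def vec_mat(vector: Vector, matrix: Matrix) -> Vector:
    """Multiply a 1 x n row vector by an n x n matrix."""
    return [
        sum(vector[i] * matrix[i][j] for i in range(len(vector)))
        for j in range(len(matrix[0]))
    ]


def add(left: Vector, right: Vector) -> Vector:
    return [a + b for a, b in zip(left, right)]


def normalize(vector: Vector, epsilon: float = 1e-8) -> Vector:
    # Assumed normalization: shift to zero mean and scale to unit variance.
    mean = sum(vector) / len(vector)
    variance = sum((x - mean) ** 2 for x in vector) / len(vector)
    return [(x - mean) / math.sqrt(variance + epsilon) for x in vector]


def residual_block(
    carried: Vector,      # quantitative feature vector, or previous block's output
    qualitative: Vector,  # qualitative feature vector
    association: Matrix,  # trainable n x n association matrix
    residual: Matrix,     # trainable n x n residual matrix
) -> Vector:
    association_vector = vec_mat(qualitative, association)
    combined = add(association_vector, carried)
    residual_vector = vec_mat(combined, residual)
    jump_connection = add(residual_vector, carried)
    return normalize(jump_connection)


identity = [[1.0, 0.0], [0.0, 1.0]]
block_output = residual_block([1.0, 2.0], [0.0, 1.0], identity, identity)
```

With identity matrices the intermediate vectors are easy to follow by hand: the combined vector is `[1.0, 3.0]`, the jump connection vector is `[2.0, 5.0]`, and normalization maps it to approximately `[-1.0, 1.0]`.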
In one embodiment of the present application, as shown in FIG. 3, the model structure of the system performance prediction model has a summarizing layer and an output layer;
the summarizing layer is used for processing the output of the residual block stacking layer through an MLP network to obtain the system performance prediction data. Specifically, the n-dimensional feature vector may be processed into an m-dimensional performance index vector, where m is the number of system performance indexes; the performance indexes may be temperature, power consumption, throughput, inference latency, inference accuracy and the like;
the output layer is used for outputting the system performance prediction data, and may consist of one or more neurons, each corresponding to one system performance index to be output.
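The mapping from an n-dimensional feature vector to m performance indexes can be sketched as a small MLP. The source does not specify the depth, hidden width or activation of the summarizing layer's MLP, so the single hidden layer with ReLU below is an assumption, and the weights are made-up numbers chosen so the result can be checked by hand.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense(vector: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer: weights has one row per input and one column per output."""
    return [
        sum(vector[i] * weights[i][j] for i in range(len(vector))) + bias[j]
        for j in range(len(bias))
    ]


def relu(vector: Vector) -> Vector:
    return [max(0.0, x) for x in vector]


def summarize(
    features: Vector,
    hidden_weights: Matrix,
    hidden_bias: Vector,
    output_weights: Matrix,
    output_bias: Vector,
) -> Vector:
    """Map the n-dimensional stack output to an m-dimensional performance index vector."""
    hidden = relu(dense(features, hidden_weights, hidden_bias))
    return dense(hidden, output_weights, output_bias)


# n = 2 input features, 2 hidden units, m = 3 performance indexes (illustrative values).
predicted_indexes = summarize(
    [1.0, -1.0],
    hidden_weights=[[1.0, 0.0], [0.0, 1.0]],
    hidden_bias=[0.0, 0.0],
    output_weights=[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    output_bias=[0.5, 0.5, 0.5],
)
```

Each of the m outputs corresponds to one output-layer neuron, for example one for power consumption and one for throughput.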
In the embodiment of the application, in the model training process, after the output system performance prediction data is obtained, whether the model training meets the convergence condition or not is determined based on the system performance prediction data and the corresponding system performance data obtained by the reference test;
In one embodiment of the application, the model may be trained using the Mean Square Error (MSE) as the loss function, which is also used to determine whether the model training has converged, as follows:
$L = \frac{1}{N}\sum_{i=1}^{N}\left(y_{\mathrm{pred},i} - y_{\mathrm{true},i}\right)^{2}$, where $y_{\mathrm{pred},i}$ is the system performance prediction data output by the model for the i-th sample, $y_{\mathrm{true},i}$ is the actual system performance data obtained by the benchmark test for that sample, and N is the number of samples;
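The MSE loss can be computed as in the short sketch below. The function name and the sample values are illustrative only.

```python
import numpy as np


def mse_loss(y_pred, y_true):
    """Mean of the squared differences between predictions and measurements."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))


# Three samples: only the last prediction is off, by 2, so the loss is 4 / 3.
loss = mse_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```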
Correspondingly, if the convergence condition is met, determining that the training of the system performance prediction model is completed, and if the convergence condition is not met, adjusting model parameters of the system performance prediction model and executing the next model training;
In particular, gradient descent or a variant thereof (e.g., the Adam optimizer) may be used to minimize the loss function and update the model parameters, which mainly include the parameters of the MLP network of the feature preprocessing layer, the residual matrices and association matrices of the residual block stacking layer, and the parameters of the MLP network of the summarizing layer.
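The training loop described above (predict, compute the loss, check convergence, otherwise adjust the parameters) can be sketched as follows. To stay short, a single linear layer stands in for the full system performance prediction model, plain gradient descent is used instead of Adam, and convergence is assumed to mean that the MSE falls below a threshold `tol`; all three are simplifying assumptions, and the data are synthetic.

```python
import numpy as np


def train(features, targets, lr=0.1, tol=1e-6, max_iters=5000):
    """Gradient descent on MSE with a threshold-based convergence check."""
    rng = np.random.default_rng(0)
    weights = rng.normal(scale=0.1, size=(features.shape[1], targets.shape[1]))
    for step in range(max_iters):
        error = features @ weights - targets      # prediction minus benchmark data
        loss = float(np.mean(error ** 2))         # MSE over all samples and indexes
        if loss < tol:                            # convergence condition met
            return weights, loss, step
        grad = 2.0 * features.T @ error / error.size
        weights -= lr * grad                      # adjust parameters, train again
    return weights, loss, max_iters


# Synthetic stand-in for cluster feature data and benchmark results.
data_rng = np.random.default_rng(1)
cluster_features = data_rng.normal(size=(50, 3))
true_weights = data_rng.normal(size=(3, 2))
benchmark_data = cluster_features @ true_weights

trained_weights, final_loss, steps_used = train(cluster_features, benchmark_data)
```

In a full implementation the same loop would update the MLP parameters, association matrices, and residual matrices together, typically through an automatic differentiation framework.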
Based on the same inventive concept, according to the system performance prediction model training method provided by the above embodiment of the present application, correspondingly, another embodiment of the present application further provides a system performance prediction model training device, a structural schematic diagram of which is shown in fig. 6, which specifically includes:
A sample data obtaining module 61, configured to obtain sample feature data for training a system performance prediction model, where the sample feature data includes cluster feature data of a known computing power cluster, and system performance data obtained by performing a benchmark test on the known computing power cluster;
a system performance prediction module 62, configured to input the cluster feature data into the system performance prediction model to obtain output system performance prediction data, where a model structure of the system performance prediction model has a residual block stack layer, and the residual block stack layer includes a residual block;
a convergence judging module 63, configured to determine whether the current model training meets a convergence condition based on the system performance prediction data and the corresponding system performance data;
the model training module 64 is configured to determine that the training of the system performance prediction model is completed if the convergence condition is satisfied, adjust model parameters of the system performance prediction model if the convergence condition is not satisfied, and perform the next model training.
Further, the cluster characteristic data comprises cluster quantitative characteristic data and cluster qualitative characteristic data;
the residual block stacking layer comprises a plurality of residual blocks connected in series;
Each of the residual blocks has two inputs and one output;
the inputs of the plurality of residual blocks each comprise a qualitative feature vector representing the cluster qualitative feature data;
The other input of the first residual block is a quantitative feature vector representing the cluster quantitative feature data, the other input of the residual blocks except the first one is the output of the connected previous residual block, and the output of the last residual block is taken as the output of the residual block stacking layer.
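The wiring of the residual block stacking layer can be sketched as below: every block receives the same qualitative feature vector, the first block also receives the quantitative feature vector, and each later block receives the output of the block before it. The toy blocks are placeholders chosen so the data flow is easy to follow; they are not the residual block operations of the application.

```python
from typing import Callable, Sequence

import numpy as np

# A block takes (previous output, qualitative vector) and returns a new vector.
Block = Callable[[np.ndarray, np.ndarray], np.ndarray]


def residual_block_stack(quant_vec: np.ndarray, qual_vec: np.ndarray,
                         blocks: Sequence[Block]) -> np.ndarray:
    """Chain blocks in series and return the output of the last block."""
    hidden = quant_vec                     # second input of the first block
    for block in blocks:
        hidden = block(hidden, qual_vec)   # qualitative vector feeds every block
    return hidden


def make_toy_block(scale: float) -> Block:
    """Placeholder block used only to demonstrate the wiring."""
    def block(prev: np.ndarray, qual: np.ndarray) -> np.ndarray:
        return scale * prev + qual.sum()
    return block


quant = np.array([1.0, 2.0])
qual = np.array([0.5, 0.5])
stack_out = residual_block_stack(quant, qual, [make_toy_block(2.0), make_toy_block(3.0)])
# First block: 2 * [1, 2] + 1 = [3, 5]; second block: 3 * [3, 5] + 1 = [10, 16].
```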
Further, the operations performed in the residual block include the following operations:
Multiplying the qualitative feature vector by an association matrix to obtain an association vector;
adding the association vector to the quantitative feature vector or the output of the previous residual block to obtain a combined feature vector;
multiplying the combined feature vector by a residual matrix to obtain a residual vector;
Adding the residual vector and the quantitative feature vector or the output of the previous residual block to obtain a jump connection feature vector;
normalizing the jump connection feature vector to obtain the output of the residual block;
The association matrix and the residual matrix are used as model parameters of the system performance prediction model.
Further, the model structure of the system performance prediction model is provided with an input layer and a characteristic preprocessing layer;
The input layer is used for receiving the cluster quantitative characteristic data and the cluster qualitative characteristic data;
The feature preprocessing layer is used for preprocessing the cluster quantitative feature data through a multi-layer perceptron (MLP) network to obtain quantitative feature vectors, and for generating qualitative feature vectors corresponding to the cluster qualitative feature data by means of label lookup encoding.
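One possible reading of the feature preprocessing layer is sketched below: qualitative labels are converted to numeric codes through lookup tables, and quantitative values pass through a small MLP. The feature names, label sets, layer sizes, and ReLU activation are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical lookup tables: one table per qualitative feature, label -> code.
LABEL_CODES = {
    "accelerator_type": {"type_a": 0, "type_b": 1, "type_c": 2},
    "interconnect": {"ethernet": 0, "infiniband": 1},
}


def encode_qualitative(sample):
    """Look up the code of each qualitative label, in table order."""
    return np.array([LABEL_CODES[name][sample[name]] for name in LABEL_CODES],
                    dtype=float)


def mlp(values, layers):
    """Apply dense layers with ReLU to the quantitative feature values."""
    hidden = np.asarray(values, dtype=float)
    for weight, bias in layers:
        hidden = np.maximum(0.0, hidden @ weight + bias)
    return hidden


rng = np.random.default_rng(0)
layers = [(rng.normal(size=(3, 8)), np.zeros(8)),
          (rng.normal(size=(8, 4)), np.zeros(4))]

qual_vec = encode_qualitative({"accelerator_type": "type_c",
                               "interconnect": "infiniband"})
quant_vec = mlp([8.0, 64.0, 1.5], layers)   # three assumed quantitative features
```

The two resulting vectors are the inputs that the residual block stacking layer expects.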
Further, the model structure of the system performance prediction model is provided with a summarizing layer and an output layer;
The summarizing layer is used for processing the output of the residual block stacking layer through an MLP network to obtain the system performance prediction data;
the output layer is used for outputting the system performance prediction data.
Based on the same inventive concept, according to the method for predicting performance of a computing power cluster system provided by the above embodiment of the present application, correspondingly, another embodiment of the present application further provides a device for predicting performance of a computing power cluster system, where a schematic structural diagram of the device is shown in fig. 7, and the device specifically includes:
A cluster data acquisition module 71, configured to acquire cluster feature data of a computing power cluster to be predicted;
The system performance prediction module 72 is configured to predict the system performance of the to-be-predicted computing power cluster by using the system performance prediction model obtained by training the system performance prediction model training device based on the cluster feature data, so as to obtain system performance prediction data.
The functions of the above modules may correspond to the corresponding processing steps in the flows shown in fig. 1 and fig. 2, and are not described again here.
The system performance prediction model training device and the computing power cluster system performance prediction device provided by the embodiments of the application can be implemented by a computer program. It should be understood by those skilled in the art that the above module division is only one of many possible module divisions; whether the devices are divided into other modules or are not divided into modules at all, they fall within the scope of the present application as long as they have the above functions.
An embodiment of the present application further provides an electronic device, as shown in fig. 8, including a processor 81 and a machine-readable storage medium 82, where the machine-readable storage medium 82 stores machine-executable instructions capable of being executed by the processor 81, and the processor 81 is caused by the machine-executable instructions to implement any one of the above-mentioned system performance prediction model training methods, or to implement the computing power cluster system performance prediction method.
The embodiment of the application also provides a computer readable storage medium in which a computer program is stored; when executed by a processor, the computer program implements any one of the above system performance prediction model training methods or implements the computing power cluster system performance prediction method.
The embodiment of the application also provides a computer program product containing instructions, which when run on a computer, cause the computer to execute any one of the system performance prediction model training methods or the computing power cluster system performance prediction method.
The machine-readable storage medium in the electronic device may include Random Access Memory (RAM) or may include Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In this specification, the embodiments are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus, electronic device, computer readable storage medium, and computer program product embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A system performance prediction model training method, comprising:
acquiring sample feature data for training a system performance prediction model, wherein the sample feature data comprises cluster feature data of a known computing power cluster and system performance data obtained by performing a benchmark test on the known computing power cluster;
Inputting the cluster characteristic data into the system performance prediction model to obtain output system performance prediction data, wherein a model structure of the system performance prediction model is provided with a residual block stacking layer, and the residual block stacking layer comprises residual blocks;
Determining whether the model training meets a convergence condition or not based on the system performance prediction data and the corresponding system performance data;
And if the convergence condition is met, determining that the training of the system performance prediction model is completed, and if the convergence condition is not met, adjusting model parameters of the system performance prediction model and executing the next model training.
2. The method of claim 1, wherein the cluster characterization data comprises cluster quantitative characterization data and cluster qualitative characterization data;
the residual block stacking layer comprises a plurality of residual blocks connected in series;
Each of the residual blocks has two inputs and one output;
the inputs of the plurality of residual blocks each comprise a qualitative feature vector representing the cluster qualitative feature data;
The other input of the first residual block is a quantitative feature vector representing the cluster quantitative feature data, the other input of the residual blocks except the first one is the output of the connected previous residual block, and the output of the last residual block is taken as the output of the residual block stacking layer.
3. The method of claim 2, wherein the operations performed in the residual block comprise the operations of:
Multiplying the qualitative feature vector by an association matrix to obtain an association vector;
adding the association vector to the quantitative feature vector or the output of the previous residual block to obtain a combined feature vector;
multiplying the combined feature vector by a residual matrix to obtain a residual vector;
Adding the residual vector and the quantitative feature vector or the output of the previous residual block to obtain a jump connection feature vector;
normalizing the jump connection feature vector to obtain the output of the residual block;
The association matrix and the residual matrix are used as model parameters of the system performance prediction model.
4. The method of claim 2, wherein the model structure of the system performance prediction model has an input layer and a feature pre-processing layer;
The input layer is used for receiving the cluster quantitative characteristic data and the cluster qualitative characteristic data;
The feature preprocessing layer is used for preprocessing the cluster quantitative feature data through a multi-layer perceptron (MLP) network to obtain quantitative feature vectors, and for generating qualitative feature vectors corresponding to the cluster qualitative feature data by means of label lookup encoding.
5. The method of claim 2, wherein the model structure of the system performance prediction model has a summarizing layer and an output layer;
The summarizing layer is used for processing the output of the residual block stacking layer through an MLP network to obtain the system performance prediction data;
the output layer is used for outputting the system performance prediction data.
6. A method for predicting performance of a computing power cluster system, comprising:
acquiring cluster characteristic data of a computing power cluster to be predicted;
Based on the cluster characteristic data, the system performance of the computing power cluster to be predicted is predicted by adopting the system performance prediction model obtained by training the method according to any one of claims 1-5, so as to obtain system performance prediction data.
7. A system performance prediction model training apparatus, comprising:
a sample data acquisition module, used for acquiring sample characteristic data for training a system performance prediction model, wherein the sample characteristic data comprises cluster characteristic data of a known computing power cluster and system performance data obtained by performing a benchmark test on the known computing power cluster;
The system performance prediction module is used for inputting the cluster characteristic data into the system performance prediction model to obtain output system performance prediction data, wherein a model structure of the system performance prediction model is provided with a residual block stacking layer, and the residual block stacking layer comprises a residual block;
the convergence judging module is used for determining whether the model training meets a convergence condition or not based on the system performance prediction data and the corresponding system performance data;
And the model training module is used for determining that the training of the system performance prediction model is completed if the convergence condition is met, adjusting the model parameters of the system performance prediction model if the convergence condition is not met, and executing the next model training.
8. A computing power cluster system performance prediction apparatus, comprising:
the cluster data acquisition module is used for acquiring cluster characteristic data of the computing power cluster to be predicted;
And the system performance prediction module is used for predicting the system performance of the computing power cluster to be predicted by adopting the system performance prediction model obtained by training the device of claim 7 based on the cluster characteristic data to obtain system performance prediction data.
9. An electronic device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor to cause the processor to perform the method of any one of claims 1-5 or to perform the method of claim 6.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the method of any of claims 1-5 or implements the method of claim 6.
CN202411312676.4A 2024-09-20 2024-09-20 System performance prediction model training method, system performance prediction method and device Active CN119151067B (en)

Publications (2)

Publication Number Publication Date
CN119151067A true CN119151067A (en) 2024-12-17
CN119151067B CN119151067B (en) 2025-09-30

Family

ID=93809314

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant