CN109783224B

CN109783224B - Task allocation method and device based on load allocation and terminal equipment

Info

Publication number: CN109783224B
Application number: CN201811502296.1A
Authority: CN
Inventors: 王路生; 陆进; 陈斌; 宋晨
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-12-10
Filing date: 2018-12-10
Publication date: 2022-10-14
Anticipated expiration: 2038-12-10
Also published as: CN109783224A

Abstract

The invention belongs to the technical field of data processing and provides a task allocation method based on load allocation, together with a corresponding device, terminal equipment and computer-readable storage medium. The method comprises the following steps: binding at least two computing nodes, and carrying out computing power statistics on each bound computing node to obtain an average computing power value for each computing node; if the task type of a computing task is a timing task, acquiring the task computing power value of the computing task, identifying, according to the average computing power values, the computing nodes that satisfy the task computing power value, determining the identified computing nodes as target nodes, and distributing the computing task to all target nodes; and if the task type is a maximum computing power task, determining each bound computing node as a target node and distributing the computing task to all target nodes. Because tasks are allocated according to both the average computing power value of each computing node and the task type of the computing task, the flexibility of task allocation and the processing efficiency of tasks are both improved.

Description

Task allocation method and device based on load allocation and terminal equipment

Technical Field

The present invention belongs to the technical field of data processing, and in particular, to a method and an apparatus for task allocation based on load scheduling, a terminal device, and a computer-readable storage medium.

Background

With the development of mathematics and computer technology, deep learning has become a current focus of research. Deep learning is a branch of machine learning: a neural network that imitates the way the human brain analyzes and learns is established, and deep learning tasks are sent to this network for processing, so that data such as images, sounds or text are interpreted by simulating the mechanisms of the human brain.

In existing deep learning frameworks, deep learning tasks are generally distributed evenly across the computing nodes of a processing unit (such as a central processing unit). Because the processing capacities of those computing nodes may be unequal, and some computing nodes may be at risk of crashing, even distribution leads to low task processing efficiency. The task allocation of the prior art is therefore rigid and inefficient.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and an apparatus for task allocation based on load allocation, a terminal device, and a computer-readable storage medium, so as to solve the problems in the prior art that task allocation is not flexible and task processing efficiency is low.

A first aspect of an embodiment of the present invention provides a task allocation method based on load allocation, including:

binding at least two computing nodes, and carrying out computing power statistics on each bound computing node to obtain an average computing power value of each computing node, wherein the computing nodes are central processing unit cores, graphics processing unit cores or neural network processing unit cores;

if the task type of a computing task is a timing task, acquiring a task computing power value of the computing task, identifying the computing nodes that satisfy the task computing power value according to the average computing power values, determining the identified computing nodes as target nodes, and distributing the computing task to all the target nodes, wherein the computation amount distributed to each target node corresponds to the average computing power value of that target node;

and if the task type is a maximum computing power task, determining each bound computing node as a target node, and distributing the computing task to all the target nodes.

A second aspect of the embodiments of the present invention provides a task allocation apparatus based on load scheduling, including:

a computing power statistics unit, used for binding at least two computing nodes and carrying out computing power statistics on each bound computing node to obtain an average computing power value of each computing node, wherein the computing nodes are central processing unit cores, graphics processing unit cores or neural network processing unit cores;

a first allocation unit, used for, if the task type of a computing task is a timing task, acquiring a task computing power value of the computing task, identifying the computing nodes that satisfy the task computing power value according to the average computing power values, determining the identified computing nodes as target nodes, and allocating the computing task to all the target nodes, wherein the computation amount allocated to each target node corresponds to the average computing power value of that target node;

and a second allocation unit, used for, if the task type is a maximum computing power task, determining each bound computing node as a target node and allocating the computing task to all the target nodes.

A third aspect of the embodiments of the present invention provides a terminal device, where the terminal device includes a memory, a processor, and a computer program that is stored in the memory and is executable on the processor, and the processor implements the following steps when executing the computer program:

if the task type of a computing task is a timing task, acquiring a task computing power value of the computing task, identifying the computing nodes that satisfy the task computing power value according to the average computing power values, determining the identified computing nodes as target nodes, and distributing the computing task to all the target nodes, wherein the computation amount distributed to each target node corresponds to the average computing power value of that target node;

A fourth aspect of embodiments of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the task allocation method described in the first aspect.

Compared with the prior art, the embodiment of the invention has the following beneficial effects:

Computing nodes that support task processing are bound, and the average computing power value of each computing node is calculated. If the task type of an obtained computing task is a timing task, the computing nodes that satisfy the task computing power value of the computing task are identified according to the average computing power values, and the computing task is distributed to the identified computing nodes; if the task type is a maximum computing power task, the computing task is distributed to all bound computing nodes for processing. Because the embodiments calculate the average computing power value of each computing node and allocate tasks according to the task type of the computing task, both the flexibility of task allocation and the processing efficiency of tasks are improved.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.

Fig. 1 is a flowchart of an implementation of a task allocation method based on load scheduling according to a first embodiment of the present invention;

Fig. 2 is a flowchart of an implementation of a task allocation method based on load scheduling according to a second embodiment of the present invention;

Fig. 3 is a flowchart of an implementation of a task allocation method based on load scheduling according to a third embodiment of the present invention;

Fig. 4 is a flowchart of an implementation of a task allocation method based on load scheduling according to a fourth embodiment of the present invention;

Fig. 5 is a flowchart of an implementation of a task allocation method based on load scheduling according to a fifth embodiment of the present invention;

Fig. 6 is a block diagram of a task allocation apparatus based on load scheduling according to a sixth embodiment of the present invention;

Fig. 7 is a schematic diagram of a terminal device according to a seventh embodiment of the present invention.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

Fig. 1 shows an implementation flow of the task allocation method based on load scheduling according to the embodiment of the present invention, which is detailed as follows:

In S101, at least two computing nodes are bound, and computing power statistics are carried out on each bound computing node to obtain an average computing power value of each computing node, where the computing nodes are central processing unit cores, graphics processing unit cores or neural network processing unit cores.

In the embodiment of the present invention, all computing nodes in the terminal device are first determined according to the actual configuration of the terminal device, and at least two of them are bound. A computing node is the smallest computing unit that has task processing capability. For example, a computing node may be a Central Processing Unit (CPU) core, a Graphics Processing Unit (GPU) core, or a Neural Network Processing Unit (NPU) core; it may also be a Digital Signal Processing (DSP) unit, a hardware acceleration unit, or the like. Binding computing nodes means using them as candidate nodes for processing computing tasks, and different binding modes exist depending on the actual configuration of the terminal device and on the computing task. For example, once all computing nodes on the terminal device have been determined, all of them can be bound directly; if the terminal device has a heterogeneous CPU + GPU configuration and the computing task is specified to be processed only by the CPU, only the computing nodes that are CPU cores are bound; and if the terminal device has a heterogeneous CPU + GPU configuration and the computing task is specified to be processed only by the GPU, only the computing nodes that are GPU cores are bound. The embodiment of the present invention does not limit the specific manner of binding. For example, a preset binding parameter can be set in a configuration file for each computing node to be bound, and processing of the computing task is then restricted to the computing nodes that carry the binding parameter.

Computing power statistics are then carried out on each bound computing node to obtain its average computing power value, which indicates the node's capacity to process computing tasks. For convenience of explanation, the embodiment of the present invention uses Operations Per Second (OPS) as the unit of the computing power value (and of the average computing power value), but this does not limit the embodiment. When computing power statistics are carried out, the processor to which a computing node physically belongs is determined first; for example, the processor to which a central processing unit core physically belongs is the central processing unit. The average computing power value of the computing nodes under that processor is then derived from the computing power value of the processor. One way is to read the configuration parameters of the processor directly, obtain the computing power value of the processor from them, and divide it by the number of computing nodes the processor contains. Another way is to hand a task with a known computation amount to the processor, calculate the computing power value of the processor from the time the processor takes to process the task, and then divide that value by the number of computing nodes the processor contains.

For ease of understanding, the embodiment of the present invention uses the number of operations as the unit of computation amount; in an actual application scenario the computation amount may be expressed in other forms. For example, suppose the computation amount of a task is 100G operations (G = giga, i.e. billion) and the processor to which computing node Node_A belongs completes the task in 2 seconds. The computing power value of the processor is then 100/2 = 50 GOPS, and if the processor contains 2 computing nodes, the average computing power value of Node_A is 50/2 = 25 GOPS.
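The benchmark-based way of obtaining an average computing power value can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names and the `run_workload` hook are hypothetical, and GOPS is assumed as the unit throughout.

```python
import time
from typing import Callable


def processor_power_gops(known_giga_ops: float, elapsed_s: float) -> float:
    """Computing power of a processor, in GOPS, from a workload of known size."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return known_giga_ops / elapsed_s


def average_node_power_gops(processor_gops: float, node_count: int) -> float:
    """Average computing power per node: processor power divided by node count."""
    if node_count <= 0:
        raise ValueError("a processor must contain at least one computing node")
    return processor_gops / node_count


def benchmark_processor(run_workload: Callable[[], None], known_giga_ops: float) -> float:
    """Time a caller-supplied workload (hypothetical hook) and convert it to GOPS."""
    start = time.perf_counter()
    run_workload()
    return processor_power_gops(known_giga_ops, time.perf_counter() - start)


# Worked example from the text: 100G operations finished in 2 s on a 2-node processor.
processor = processor_power_gops(100, 2)        # 50.0 GOPS
node_a = average_node_power_gops(processor, 2)  # 25.0 GOPS
```

The division by node count reflects the assumption in the text that all nodes of one processor share its computing power equally.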

Optionally, a computing power table is established based on all the average computing power values, and the mapping between average computing power values and computing nodes is stored in it. That is, after the average computing power value of each bound computing node has been calculated, the values can be recorded in a computing power table that maps each computing node to its average computing power value. The table may be a database table or a table in some other form, and it makes subsequent task allocation more convenient.
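As one possible form of such a table, the sketch below stores the node-to-value mapping in an in-memory SQLite table using Python's standard `sqlite3` module. The table name `power_table`, the column names and the helper functions are assumptions made for illustration; the text only says that a database table or any other table form may be used.

```python
import sqlite3
from typing import Dict, List, Optional


def build_power_table(avg_gops_by_node: Dict[str, float]) -> sqlite3.Connection:
    """Create an in-memory table mapping each bound node to its average GOPS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE power_table (node TEXT PRIMARY KEY, avg_gops REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO power_table (node, avg_gops) VALUES (?, ?)",
        list(avg_gops_by_node.items()),
    )
    conn.commit()
    return conn


def lookup_power(conn: sqlite3.Connection, node: str) -> Optional[float]:
    """Return the average GOPS recorded for a node, or None if it is not bound."""
    row = conn.execute(
        "SELECT avg_gops FROM power_table WHERE node = ?", (node,)
    ).fetchone()
    return row[0] if row else None


def nodes_by_power(conn: sqlite3.Connection) -> List[str]:
    """List node names from the largest average GOPS to the smallest."""
    rows = conn.execute("SELECT node FROM power_table ORDER BY avg_gops DESC")
    return [name for (name,) in rows]
```

Keeping the values in a queryable table means later allocation steps can sort or filter nodes without recomputing the statistics.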

In S102, if the task type of a computing task is a timing task, a task computing power value of the computing task is obtained, the computing nodes satisfying the task computing power value are identified according to the average computing power values, the identified computing nodes are determined as target nodes, and the computing task is allocated to all the target nodes, where the computation amount allocated to each target node corresponds to the average computing power value of that target node.

In the embodiment of the present invention, a computing task to be processed is allocated according to its task type. The computing task may implement functions such as deep learning, monitoring or retrieval, which the embodiment does not limit. Specifically, if the task type of the computing task is a timing task, that is, the computing task has a fixed timeliness requirement, the task computing power value required by the computing task is acquired, and the computing nodes that satisfy the task computing power value are identified according to the average computing power values. For ease of distinction, the identified computing nodes are called target nodes. The computing task is distributed to all target nodes, and the sum of the average computing power values of the target nodes is greater than or equal to the task computing power value. The task type and the task computing power value of a computing task can be specified in advance by the user. For example, a real-time video processing task must finish processing each frame by the time that frame ends, so its timeliness requirement is fixed and its task type can be designated as a timing task.

When the computing task is distributed to the target nodes, the computation amount allocated to each target node is also determined. First, a target execution duration is determined from the timeliness requirement of the computing task, and the total computation amount of the task is then determined from the target execution duration and the task computing power value. For example, if the task computing power value is 100 GOPS and the target execution duration is 2 seconds, the total computation amount is 100 × 2 = 200G operations. If there is only one target node, the total computation amount is allocated to that node; if there are at least two target nodes, the total computation amount is split into at least two computation amounts according to the average computing power value of each target node, and these are allocated to the respective target nodes.

When there are at least two target nodes, the average computing power value of each target node is multiplied by the target execution duration to obtain the maximum support amount of that node, and computation amounts are allocated according to the maximum support amounts and the total computation amount. Two allocation modes exist.

The first is the priority allocation mode: computation amounts are allocated preferentially to the target nodes that come first in the numerical order of average computing power values, each node receiving a computation amount equal to its maximum support amount, until the total computation amount has been allocated. The numerical order may be from largest to smallest or from smallest to largest. For example, suppose the total computation amount is 200G operations, the target execution duration is 2 seconds, and the target nodes are Node_B with an average computing power value of 70 GOPS and Node_C with an average computing power value of 60 GOPS. The maximum support amounts of Node_B and Node_C are then 140G operations and 120G operations respectively. If the order is from largest to smallest, Node_B is first allocated its maximum support amount of 140G operations, and Node_C is then allocated the remaining 60G operations.

The second is the balanced allocation mode: computation amounts are allocated to the target nodes in proportion to their average computing power values. Continuing the example of Node_B and Node_C, the ratio of their average computing power values is 7:6, so Node_B is allocated 200 × (7/(7+6)) ≈ 108G operations and Node_C is allocated 200 × (6/(7+6)) ≈ 92G operations. If the computation amount allocated to a target node exceeds its maximum support amount, the excess can be reallocated to other target nodes for further balancing, which prevents the computing task from overrunning its deadline.

The priority mode is more efficient at allocation time, while the balanced mode achieves load balancing and reduces the loss caused by the failure of any single target node; either can be chosen according to the actual application scenario.
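The two allocation modes can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, amounts are in G operations, power values are in GOPS, and the sketch raises an error when the target nodes cannot cover the total within the target execution duration, a case the text does not describe.

```python
from typing import Dict


def total_workload(task_power_gops: float, duration_s: float) -> float:
    """Total computation amount (G operations) = task power x target duration."""
    return task_power_gops * duration_s


def priority_allocation(
    avg_gops: Dict[str, float],
    total_giga_ops: float,
    duration_s: float,
    descending: bool = True,
) -> Dict[str, float]:
    """Fill nodes up to their maximum support amount in order of average power."""
    order = sorted(avg_gops, key=avg_gops.get, reverse=descending)
    remaining = total_giga_ops
    shares: Dict[str, float] = {}
    for node in order:
        max_support = avg_gops[node] * duration_s
        shares[node] = min(max_support, remaining)
        remaining -= shares[node]
    if remaining > 1e-9:
        # Assumption: treat an uncoverable workload as an error.
        raise ValueError("target nodes cannot finish the task in time")
    return shares


def balanced_allocation(
    avg_gops: Dict[str, float], total_giga_ops: float
) -> Dict[str, float]:
    """Split the total in proportion to each node's average computing power."""
    power_sum = sum(avg_gops.values())
    return {node: total_giga_ops * p / power_sum for node, p in avg_gops.items()}


nodes = {"Node_B": 70, "Node_C": 60}
total = total_workload(100, 2)                   # 200G operations
by_priority = priority_allocation(nodes, total, duration_s=2)
by_balance = balanced_allocation(nodes, total)
```

With the values from the text, the priority mode yields 140G and 60G operations, and the balanced mode yields roughly 108G and 92G operations for Node_B and Node_C respectively.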

Optionally, all bound computing nodes are sorted in the numerical order of their average computing power values to generate a task allocation sequence; the fewest computing nodes that satisfy the task computing power value are identified according to the task allocation sequence, the identified computing nodes are determined as target nodes, and the computing task is allocated to the target nodes. Because the task type of the computing task is a timing task, the number of target nodes taking part in the computation should be kept as small as possible to reduce resource consumption. To this end, all bound computing nodes can be sorted in the numerical order of their average computing power values to generate a task allocation sequence, where the order may be from largest to smallest or from smallest to largest. The fewest computing nodes that satisfy the task computing power value are then identified according to the generated task allocation sequence, determined as target nodes, and allocated the computing task.

For example, suppose the bound computing nodes are Node_D, Node_E and Node_F, with average computing power values of 50 GOPS, 60 GOPS and 70 GOPS respectively. If the order is from largest to smallest, the generated task allocation sequence is Node_F - Node_E - Node_D. If the task computing power value of the computing task is 100 GOPS, Node_F at the head of the sequence is first determined as a target node; because 100 - 70 = 30 GOPS of the task computing power value remains uncovered, Node_E is also determined as a target node, at which point the task computing power value is fully covered and the target nodes have been determined. In this way the number of target nodes is kept as small as possible while the task computing power value is still met, which reduces the consumption of computing resources.
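The selection of target nodes from the task allocation sequence can be sketched as a simple greedy loop. The function name is hypothetical, and the sketch assumes the largest-to-smallest order; taking the most powerful nodes first is what keeps the number of selected nodes as small as possible.

```python
from typing import Dict, List


def select_target_nodes(avg_gops: Dict[str, float], task_gops: float) -> List[str]:
    """Pick nodes from the largest average power down until the task is covered."""
    sequence = sorted(avg_gops, key=avg_gops.get, reverse=True)
    selected: List[str] = []
    covered = 0.0
    for node in sequence:
        if covered >= task_gops:
            break
        selected.append(node)
        covered += avg_gops[node]
    if covered < task_gops:
        # Assumption: the text does not say what happens when the bound nodes
        # cannot meet the task computing power value; this sketch raises.
        raise ValueError("bound nodes cannot satisfy the task computing power value")
    return selected


targets = select_target_nodes({"Node_D": 50, "Node_E": 60, "Node_F": 70}, 100)
```

For the example in the text, `targets` is `["Node_F", "Node_E"]`: Node_F covers 70 GOPS and Node_E covers the remaining 30 GOPS.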

In S103, if the task type is a maximum computing power task, each bound computing node is determined as a target node, and the computing task is allocated to all the target nodes.

If the task type of the computing task is a maximum computing power task (that is, the computing task should be processed as fast as possible and has no fixed timeliness requirement), such as a search task or a picture retrieval task, each bound computing node is determined as a target node and the computing task is distributed to all target nodes to speed up processing. In this case, an average computation amount can be calculated from the total computation amount of the computing task and the number of target nodes, and this average computation amount is allocated to each target node. For example, if the total computation amount is 200G operations and there are 10 target nodes, the average computation amount is 200/10 = 20G operations, and allocating 20G operations to each target node achieves load balancing.

When the computing task is obtained from an external system, the task type can be made easy to identify by mapping the timing task type to a first preset identifier and the maximum computing power task type to a second preset identifier. The external system adds the first or the second preset identifier to the task name according to the specific content of the computing task, and after obtaining the computing task the terminal device reads the identifier to determine the task type. For example, if the task name contains the first preset identifier, the task type is determined to be a timing task.
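The equal split for a maximum computing power task and the identifier-based recognition of the task type can be sketched as follows. The identifier strings `#timing` and `#maxpower`, the type labels and the function names are all hypothetical; the text does not specify what the preset identifiers look like.

```python
from typing import Dict, List

# Hypothetical preset identifiers embedded in the task name by the external system.
FIRST_PRESET_ID = "#timing"     # marks a timing task
SECOND_PRESET_ID = "#maxpower"  # marks a maximum computing power task


def task_type_from_name(task_name: str) -> str:
    """Identify the task type from the preset identifier in the task name."""
    if FIRST_PRESET_ID in task_name:
        return "timing"
    if SECOND_PRESET_ID in task_name:
        return "max_power"
    raise ValueError("task name carries no known preset identifier")


def equal_split(total_giga_ops: float, target_nodes: List[str]) -> Dict[str, float]:
    """Give every target node the same average computation amount."""
    share = total_giga_ops / len(target_nodes)
    return {node: share for node in target_nodes}


all_nodes = [f"node_{i}" for i in range(10)]
shares = equal_split(200, all_nodes)  # 20.0G operations for each of the 10 nodes
```

A caller would first run `task_type_from_name` on the incoming task and use `equal_split` only when the result is the maximum computing power type.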

Preferably, bound computing nodes that are idle are determined as the target nodes. To prevent computing nodes from being overloaded, the utilization rate of each bound computing node can be measured when the target nodes are determined, and only idle computing nodes, that is, computing nodes whose utilization rate is below a preset utilization threshold (e.g., 10%), are determined as target nodes. The utilization rate of a computing node can be obtained by executing a query instruction in the operating system of the terminal device; for example, if the operating system is Linux, the utilization rate can be obtained by executing a query instruction such as top.
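The idle-node filter can be sketched as below. The sketch assumes the utilization rates have already been collected by some means (for example by parsing the output of a system query tool) and are given as fractions between 0 and 1; the function name and the input format are hypothetical.

```python
from typing import Dict, List


def idle_nodes(utilization: Dict[str, float], threshold: float = 0.10) -> List[str]:
    """Return the bound nodes whose utilization is strictly below the threshold."""
    return [node for node, used in utilization.items() if used < threshold]


# Hypothetical snapshot of per-node utilization (fraction of capacity in use).
snapshot = {"cpu_core_0": 0.05, "cpu_core_1": 0.50, "gpu_core_0": 0.10}
candidates = idle_nodes(snapshot)
```

Here only `cpu_core_0` qualifies, because a node sitting exactly at the 10% threshold is not "lower than" it.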

As can be seen from the embodiment shown in fig. 1, in the embodiment of the present invention, at least two computing nodes are bound, and an average computation force value of each computing node is computed, if the task type of the computing task is a timing task, a task computation force value of the computing task is obtained, the computing nodes satisfying the task computation force value are determined as target nodes, and the computing task is distributed to the target nodes; and if the task type is the maximum computing power task, distributing the computing task to all the bound computing nodes. According to the embodiment of the invention, each computable unit in the terminal equipment is used as a computing node, and the computing task is distributed to the corresponding computing node according to the task type of the computing task, so that the flexibility of task distribution and the processing efficiency of the computing task are improved.

Fig. 2 shows a method that extends the first embodiment of the present invention. This embodiment provides an implementation flowchart of a task allocation method based on load allocation; as shown in Fig. 2, the task allocation method may include the following steps:

in S201, the calculation force monitoring is performed on the target node to which the calculation task has been allocated, and a real-time calculation force value of the target node is obtained.

After the computing task has been allocated, computing power monitoring is performed on each target node to which the task was allocated, so as to obtain the current real-time computing power value of the target node. The monitoring method depends on the processor to which the target node belongs. For example, if the target node is a central processing unit core, that is, the processor to which it belongs is a central processing unit, computing power monitoring can be based on monitoring software such as CPU-Z.

In S202, if the real-time computation force value is smaller than the average computation force value of the target node, and an absolute value of a difference between the real-time computation force value and the average computation force value exceeds a preset fluctuation value, determining the target node as a node to be evaluated, reducing the computation amount allocated to the node to be evaluated according to the real-time computation force value, and allocating redundant computation amount to other idle target nodes.

If the monitored real-time computing power value is smaller than the average computing power value of the target node, and the absolute value of the difference between them exceeds a preset fluctuation value, the target node is determined as a node to be evaluated. The ratio of the real-time computing power value to the average computing power value is calculated and multiplied by the computation amount initially allocated to the node to be evaluated, and the product is taken as the computation amount reallocated to that node. The fluctuation value can be set according to the fluctuation range of target nodes in the actual application scenario; for example, it can be set to 30 GOPS. The surplus computation amount can be allocated to other idle target nodes, where, as before, an idle target node is one whose utilization rate is below the utilization threshold. In all other cases, for example when the real-time computing power value is not smaller than the average computing power value, or when the absolute value of the difference does not exceed the fluctuation value, the computation amount is not updated, and computing power monitoring of the target node continues until the computing task on the target node has finished.

For example, suppose the fluctuation value is 30 GOPS, the computation amount initially allocated to the target node is 300G operations, the average computing power value of the target node is 60 GOPS, and the monitored real-time computing power value is 20 GOPS. Because the real-time computing power value is smaller than the average computing power value, and the absolute value of the difference is 40 GOPS, which exceeds the fluctuation value of 30 GOPS, the target node is determined as a node to be evaluated. The ratio of the real-time computing power value to the average computing power value is 1:3, so the computation amount reallocated to the node to be evaluated is 300 × (1/3) = 100G operations, and the surplus 200G operations are allocated to other idle target nodes.
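The reallocation rule can be sketched in Python as follows. The function name and the return convention (new amount for the node, surplus to hand to idle nodes) are assumptions made for illustration; units are GOPS for power values and G operations for amounts.

```python
from typing import Tuple


def reallocate(
    initial_giga_ops: float,
    avg_gops: float,
    realtime_gops: float,
    fluctuation_gops: float,
) -> Tuple[float, float]:
    """Return (amount kept by the node, surplus to move to idle target nodes)."""
    drop = avg_gops - realtime_gops
    # Only a drop below the average that exceeds the fluctuation value triggers
    # reallocation; otherwise the initial allocation is left unchanged.
    if realtime_gops >= avg_gops or drop <= fluctuation_gops:
        return initial_giga_ops, 0.0
    kept = initial_giga_ops * realtime_gops / avg_gops
    return kept, initial_giga_ops - kept


kept, surplus = reallocate(300, avg_gops=60, realtime_gops=20, fluctuation_gops=30)
```

With the values from the text, the node keeps 100G operations and 200G operations become surplus; a real-time value of 40 GOPS would fall within the 30 GOPS fluctuation range and leave the allocation untouched.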

Optionally, after the computation amount to be reallocated to the node to be evaluated has been calculated from the ratio of the real-time computing power value to the average computing power value, that computation amount is further adjusted by a preset margin amount. Once the computing power value of the node to be evaluated has fallen outside its normal fluctuation range, it may continue to fall. To avoid delaying the computing task, a margin amount can therefore be preset: after the reallocated computation amount has been calculated from the ratio, the margin amount is subtracted from it, and the result is used as the final computation amount reallocated to the node to be evaluated. For example, assume the margin amount is 10G operations and the computation amount calculated from the ratio is 30G operations; subtracting the margin amount gives a final computation amount of 30 - 10 = 20G operations, which is reallocated to the node to be evaluated.
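The margin adjustment is a single subtraction, sketched below. The function name is hypothetical, and clamping the result at zero is an assumption: the text does not say what happens when the margin amount exceeds the reallocated amount.

```python
def apply_margin(reallocated_giga_ops: float, margin_giga_ops: float) -> float:
    """Subtract the preset margin from the reallocated amount (G operations)."""
    # Assumption: never return a negative amount when the margin is larger.
    return max(reallocated_giga_ops - margin_giga_ops, 0.0)


final_amount = apply_margin(30, 10)  # 20G operations, as in the example above
```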

As can be seen from the embodiment shown in fig. 2, in the embodiment of the present invention, the real-time computation value of the target node is obtained by monitoring the computation force of the target node to which the computation task has been allocated, and if the obtained real-time computation value is smaller than the average computation value of the target node and the absolute value of the difference between the real-time computation value and the average computation value exceeds the preset fluctuation value, the target node is determined as the node to be evaluated, the computation amount allocated to the node to be evaluated is reduced according to the real-time computation value, and the redundant computation amount is allocated to other idle target nodes.

Fig. 3 shows a method obtained, on the basis of the second embodiment of the present invention, by refining the process of determining the target node as the node to be evaluated, reducing the calculation amount allocated to the node to be evaluated according to the real-time calculation force value, and allocating the redundant calculation amount to other idle target nodes. This embodiment provides an implementation flowchart of the task allocation method based on load allocation; as shown in fig. 3, the task allocation method may include the following steps:

In S301, the processor where the node to be evaluated is located is determined as the processor to be evaluated.

Because the decrease in the calculation force value of the node to be evaluated may be caused by a processor failure or by processor frequency reduction, in the scenario where the determined target nodes are not limited to a single processor, the processor where the node to be evaluated is located is determined as the processor to be evaluated.

In S302, the calculation amounts allocated to all the target nodes in the processor to be evaluated are uniformly reduced according to the real-time calculation force value, and the redundant calculation amounts are allocated to other idle target nodes that do not belong to the processor to be evaluated.

Once the processor to be evaluated is determined, the ratio between the real-time calculation force value and the average calculation force value of the node to be evaluated is calculated. For each target node under the processor to be evaluated, the calculation amount initially allocated to that target node is multiplied by the ratio, and the product is used as the calculation amount reallocated to that target node. The redundant calculation amount is allocated to other idle target nodes that do not belong to the processor to be evaluated. It is worth mentioning that if at least two nodes to be evaluated exist under a processor to be evaluated, the ratio between the real-time calculation force value and the average calculation force value is calculated for each node to be evaluated, and the calculation amounts reallocated to the target nodes are calculated using the lowest ratio.

For example, the target nodes under the processor to be evaluated include Node_G, Node_H and Node_I, where Node_G is confirmed as the node to be evaluated. The calculation amounts initially allocated to Node_G, Node_H and Node_I are 50G operations, 100G operations and 150G operations respectively, and the ratio between the real-time calculation force value and the average calculation force value of Node_G is 1:5. The calculation amounts reallocated to Node_G, Node_H and Node_I are then 10G operations, 20G operations and 30G operations respectively, and the redundant 240G operations are allocated to other idle target nodes that do not belong to the processor to be evaluated.
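
The uniform reduction across one processor can be sketched as below. The function name `reduce_processor_allocations` and the representation of each ratio as a `(real_time, average)` pair are assumptions made for illustration only.

```python
def reduce_processor_allocations(initial_ops, ratios):
    """Scale every target node on the processor by the lowest ratio.

    initial_ops: {node name: G operations initially allocated}
    ratios: list of (real_time, average) pairs, one per node to be evaluated
    """
    if not ratios:
        raise ValueError("at least one node to be evaluated is required")
    # With several nodes to be evaluated, the lowest ratio is used.
    realtime, average = min(ratios, key=lambda pair: pair[0] / pair[1])
    reduced = {node: ops * realtime / average for node, ops in initial_ops.items()}
    surplus = sum(initial_ops.values()) - sum(reduced.values())
    return reduced, surplus


reduced, surplus = reduce_processor_allocations(
    {"Node_G": 50, "Node_H": 100, "Node_I": 150},
    [(1, 5)],  # Node_G runs at 1:5 of its average
)
print(reduced)  # {'Node_G': 10.0, 'Node_H': 20.0, 'Node_I': 30.0}
print(surplus)  # 240.0
```

The surplus returned here is the amount that would go to idle target nodes on other processors; choosing those nodes is outside this sketch.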

As can be seen from the embodiment shown in fig. 3, in the embodiment of the present invention, the processor in which the node to be evaluated is located is determined as the processor to be evaluated, the computation amounts allocated to all target nodes in the processor to be evaluated are uniformly reduced according to the real-time computation value, and the redundant computation amounts are allocated to other idle target nodes that do not belong to the processor to be evaluated.

Fig. 4 shows a method obtained, on the basis of the second embodiment of the present invention, by extending the process that follows reducing the calculation amount allocated to the node to be evaluated according to the real-time calculation force value and allocating the redundant calculation amount to other idle target nodes. This embodiment provides an implementation flowchart of the task allocation method based on load allocation; as shown in fig. 4, the task allocation method may include the following steps:

In S401, at least two real-time calculation force values of the node to be evaluated are obtained in a preset monitoring time period, and a calculation force variation trend value of the node to be evaluated is analyzed according to all the obtained real-time calculation force values.

In the embodiment of the present invention, a target node determined as a node to be evaluated may also be monitored continuously. Specifically, at least two real-time calculation force values of the node to be evaluated are obtained according to a preset monitoring time period and a preset monitoring frequency, both of which may be determined according to the target execution duration of the calculation task in the actual application scenario. Preferably, the monitoring time period is shorter than the target execution duration; for example, when the target execution duration is 10 minutes, the monitoring time period is set to the 5 minutes after the node to be evaluated is confirmed, and the monitoring frequency is set to once every 5 seconds. A calculation force value coordinate system is established with the monitoring time as the horizontal axis and the real-time calculation force value as the vertical axis. In order of monitoring time from earliest to latest, the slope of the straight line through every two adjacent real-time calculation force values is calculated, and the sum of all the slopes is taken as the calculation force variation trend value. For example, three real-time calculation force values Value_A, Value_B and Value_C are obtained at a monitoring frequency of once every 5 seconds; their values are 50GOPS, 25GOPS and 75GOPS respectively, and their monitoring times are the 5th second, the 10th second and the 15th second respectively. The slope of the straight line through Value_A and Value_B is (25-50)/(10-5) = -5, the slope of the straight line through Value_B and Value_C is (75-25)/(15-10) = 10, and the resulting calculation force variation trend value is -5 + 10 = 5.
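
The trend value described above is a sum of slopes between consecutive samples. A minimal sketch follows; the function name `trend_value` and the `(time, value)` tuple layout are hypothetical choices made here.

```python
def trend_value(samples):
    """Sum the slopes between consecutive (time_s, gops) samples.

    samples must be ordered by monitoring time, earliest first.
    """
    if len(samples) < 2:
        raise ValueError("at least two real-time values are required")
    return sum(
        (v1 - v0) / (t1 - t0)
        for (t0, v0), (t1, v1) in zip(samples, samples[1:])
    )


# Value_A, Value_B and Value_C from the example above.
trend = trend_value([(5, 50), (10, 25), (15, 75)])
print(trend)  # 5.0
```

Note that when the monitoring interval is constant, as it is here, the sum of slopes equals the difference between the last and first values divided by the interval: (75-50)/5 = 5.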

In S402, if the calculation force variation trend value is less than zero and the newly obtained real-time calculation force value is less than a preset valley value, the node to be evaluated is unbound, and the calculation amount currently allocated to the node to be evaluated is reallocated to other idle target nodes.

If the calculation force variation trend value is smaller than zero and the newly acquired real-time calculation force value is smaller than the preset valley value, the node to be evaluated is considered unable to support normal processing of the calculation task; the node to be evaluated is unbound, and the calculation amount currently allocated to it is reallocated to other idle target nodes. The valley value can be set according to the actual application scenario, for example to 10GOPS. In addition, if the calculation force variation trend value is smaller than zero and the newly acquired real-time calculation force value is greater than or equal to the valley value, no operation is performed and calculation force monitoring continues, because the calculation force value of the node to be evaluated may subsequently rise. If, when the monitoring time period ends, all the acquired real-time calculation force values are greater than or equal to the valley value and the calculation force variation trend value has remained smaller than zero throughout, the calculation force value of the node to be evaluated is considered to have been abnormal for a long time; the node to be evaluated is likewise unbound when the monitoring time period ends, and the calculation amount currently allocated to it is reallocated to other idle target nodes.

In S403, if the calculation force variation trend value is greater than or equal to zero and the newly obtained real-time calculation force value is greater than or equal to the average calculation force value, the calculation amount initially allocated to the node to be evaluated is allocated to it again.

If the calculation force variation trend value is greater than or equal to zero and the newly acquired real-time calculation force value is greater than or equal to the average calculation force value, the calculation force value of the node to be evaluated is determined to have recovered to the level of the average calculation force value, and the calculation amount initially allocated to the node to be evaluated is allocated to it again.

As can be seen from the embodiment shown in fig. 4, in the embodiment of the present invention, calculation force monitoring is performed in the preset monitoring time period and the calculation force variation trend value of the node to be evaluated is analyzed. If the calculation force variation trend value is smaller than zero and the newly obtained real-time calculation force value is smaller than the preset valley value, the node to be evaluated is unbound, and the calculation amount currently allocated to it is reallocated to other idle target nodes; if the calculation force variation trend value is greater than or equal to zero and the newly acquired real-time calculation force value is greater than or equal to the average calculation force value, the calculation amount initially allocated to the node to be evaluated is allocated to it again. By monitoring the calculation force of the node to be evaluated in the monitoring time period and performing different operations, such as unbinding or restoring the calculation amount, according to the monitoring result, the embodiment of the present invention makes the allocation of the calculation amount follow the changes in the calculation force value of the node to be evaluated more closely, and improves the applicability of the allocation.
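
The branches of S402 and S403 can be summarized as a small decision function. This is a sketch under assumptions: the names `Action` and `decide` are hypothetical, the flag `period_over` stands in for "the monitoring time period has ended with the trend value below zero throughout", and the case where the trend value is non-negative but the latest value is still below the average is not specified by the embodiment, so the sketch simply keeps monitoring.

```python
from enum import Enum


class Action(Enum):
    UNBIND = "unbind"                    # unbind node, move its work elsewhere
    RESTORE = "restore"                  # give back the initial calculation amount
    KEEP_MONITORING = "keep_monitoring"  # no operation, continue monitoring


def decide(trend, latest_gops, average_gops, valley_gops, period_over=False):
    if trend < 0:
        if latest_gops < valley_gops:
            return Action.UNBIND
        # Above the valley but still falling: unbind only once the
        # monitoring period has ended, otherwise wait for a possible rise.
        return Action.UNBIND if period_over else Action.KEEP_MONITORING
    if latest_gops >= average_gops:
        return Action.RESTORE
    # Assumption: this case is not described in the embodiment.
    return Action.KEEP_MONITORING


print(decide(trend=-3, latest_gops=8, average_gops=60, valley_gops=10))
print(decide(trend=5, latest_gops=75, average_gops=60, valley_gops=10))
```

The two calls print `Action.UNBIND` and `Action.RESTORE` respectively.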

Fig. 5 shows a method obtained, on the basis of the first embodiment of the present invention, by refining the process of performing calculation force statistics on each bound computing node to obtain the average calculation force value of each computing node. This embodiment provides an implementation flowchart of the task allocation method based on load allocation; as shown in fig. 5, the task allocation method may include the following steps:

In S501, the bound computing nodes located in the same processor are classified into a node group, where the node group includes at least one computing node.

The read configuration parameters may be inaccurate because of processor aging or differences in the specific configuration of the processor, and the calculation force values of the computing nodes within one processor are not necessarily equal. Therefore, in the embodiment of the present invention, calculation force statistics are performed on the computing nodes according to their actual task execution. First, the bound computing nodes located in the same processor are classified into one node group, where a node group includes at least one computing node. It should be understood that a node group only indicates that the computing nodes located in the same processor are classified together, and does not refer to a specific storage format. For example, if the bound computing nodes Node_J and Node_K are central processor cores and the bound computing nodes Node_L and Node_M are graphics processor cores, Node_J and Node_K are classified into one node group, and Node_L and Node_M are classified into another node group.
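
The grouping in S501 amounts to bucketing nodes by the processor they belong to. A minimal sketch is given below; the function name `group_by_processor` and the processor labels `"CPU"` and `"GPU"` are hypothetical identifiers, and a dictionary is used only for illustration since the embodiment does not prescribe a storage format.

```python
from collections import defaultdict


def group_by_processor(node_to_processor):
    """Return {processor: [nodes located in that processor]}."""
    groups = defaultdict(list)
    for node, processor in node_to_processor.items():
        groups[processor].append(node)
    return dict(groups)


groups = group_by_processor({
    "Node_J": "CPU", "Node_K": "CPU",  # central processor cores
    "Node_L": "GPU", "Node_M": "GPU",  # graphics processor cores
})
print(groups)  # {'CPU': ['Node_J', 'Node_K'], 'GPU': ['Node_L', 'Node_M']}
```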

In S502, a preset statistical task is fragmented according to the number of the computing nodes in the node group, and the fragmented statistical task is distributed to each of the computing nodes in the node group for processing.

After at least one node group is obtained, for each node group, a preset statistical task is fragmented according to the number of computing nodes in the node group, and the fragments are distributed to the computing nodes in the node group for processing. For example, if the computing nodes in a node group are Node_N and Node_O and the total calculation amount of the statistical task is 100G operations, the total calculation amount is divided into two equal parts: 50G operations are delivered to Node_N for processing, and the other 50G operations are delivered to Node_O for processing. As another example, if the computing nodes in a node group are Node_P, Node_Q and Node_R and the total calculation amount of the statistical task is 15 gigabytes (GB) of data, the total calculation amount is divided into three equal parts: 5GB of data is delivered to Node_P for processing, another 5GB to Node_Q, and the remaining 5GB to Node_R. Preferably, the preset statistical task has the same function as the calculation task, that is, it is a task of the same kind, so that the statistics fit the scenario in which the calculation task is subsequently executed.

In S503, the processing duration of each computing node is obtained, and the average calculation force value is calculated according to the processing duration and the calculation amount of the fragmented statistical task, where the processing duration is the time the computing node takes to complete its fragment of the statistical task.

After the calculation amount of the statistical task is distributed to the computing nodes in the node group, the processing duration of each computing node is obtained, and the average calculation force value of the computing node is obtained by dividing the calculation amount by the processing duration, where the processing duration is the time the computing node takes to complete its fragment of the statistical task. For example, if the processing duration of a computing node is 2 seconds and the calculation amount it processed is 100G operations, the average calculation force value of the computing node is 100/2 = 50GOPS.
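
The equal fragmentation in S502 can be sketched as follows. The function name `shard_evenly` is hypothetical, and the sketch assumes the total divides into equal shares, as in both examples above.

```python
def shard_evenly(total, nodes):
    """Split a statistical task of size `total` equally across `nodes`."""
    if not nodes:
        raise ValueError("a node group contains at least one computing node")
    share = total / len(nodes)
    return {node: share for node in nodes}


ops_shards = shard_evenly(100, ["Node_N", "Node_O"])           # G operations
data_shards = shard_evenly(15, ["Node_P", "Node_Q", "Node_R"])  # GB of data
print(ops_shards)   # {'Node_N': 50.0, 'Node_O': 50.0}
print(data_shards)  # {'Node_P': 5.0, 'Node_Q': 5.0, 'Node_R': 5.0}
```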
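
The statistic in S503 is the processed calculation amount divided by the processing duration. A minimal sketch, with the hypothetical function name `average_gops`:

```python
def average_gops(processed_g_ops, duration_s):
    """Average calculation force value in GOPS: G operations per second."""
    if duration_s <= 0:
        raise ValueError("processing duration must be positive")
    return processed_g_ops / duration_s


value = average_gops(processed_g_ops=100, duration_s=2)
print(value)  # 50.0
```

In practice the duration would be measured around the node's execution of its fragment; how that measurement is taken is not specified here.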

As can be seen from the embodiment shown in fig. 5, in the embodiment of the present invention, computing nodes that are bound and located in the same processor are classified into a node group, a preset statistical task is fragmented according to the number of the computing nodes in the node group, the fragmented statistical task is respectively distributed to each computing node in the node group for processing, then the processing time of each computing node is obtained, and an average computation value is calculated according to the processing time and the computation amount of the fragmented statistical task. According to the embodiment of the invention, the average calculation force value is obtained by enabling the calculation node to process the statistical task, so that the accuracy of the average calculation force value is improved, and accurate calculation amount distribution is conveniently carried out according to the average calculation force value.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

Fig. 6 is a block diagram illustrating a structure of a task allocation apparatus based on load scheduling according to an embodiment of the present invention, and referring to fig. 6, the task allocation apparatus includes:

the calculation force counting unit 61 is used for binding at least two calculation nodes and carrying out calculation force counting on each bound calculation node to obtain an average calculation force value of each calculation node, wherein the calculation nodes are central processor cores, graphics processor cores or neural network processor cores;

a first allocating unit 62, configured to obtain a task force value of the computing task if the task type of the computing task is a timing task, analyze the computing node that meets the task force value according to the average force value, determine the analyzed computing node as a target node, and allocate the computing task to all the target nodes, where a computation amount allocated to the target node corresponds to the average force value of the target node;

a second allocating unit 63, configured to determine each bound computing node as the target node if the task type is the maximum computation power task, and allocate the computing task to all the target nodes.

Optionally, the task allocation device further includes:

the monitoring unit is used for carrying out calculation force monitoring on the target node which is distributed with the calculation task to obtain a real-time calculation force value of the target node;

and the evaluation unit is used for determining the target node as the node to be evaluated if the real-time computing force value is smaller than the average computing force value of the target node and the absolute value of the difference value between the real-time computing force value and the average computing force value exceeds a preset fluctuation value, reducing the computing amount distributed to the node to be evaluated according to the real-time computing force value, and distributing redundant computing amount to other idle target nodes.

Optionally, the evaluation unit comprises:

the processor evaluation unit is used for determining the processor where the node to be evaluated is positioned as the processor to be evaluated;

and the reducing unit is used for uniformly reducing the calculated amount distributed to all the target nodes in the processor to be evaluated according to the real-time calculated force value, and distributing redundant calculated amount to other idle target nodes which do not belong to the processor to be evaluated.

Optionally, the evaluation unit further comprises:

the analysis unit is used for acquiring at least two real-time calculation force values of the node to be evaluated in a preset monitoring time period and analyzing a calculation force change trend value of the node to be evaluated according to all the acquired real-time calculation force values;

the unbinding unit is used for unbinding the node to be evaluated and reallocating the calculated amount currently allocated to the node to be evaluated to other idle target nodes if the calculated force variation trend value is smaller than zero and the newly acquired real-time calculated force value is smaller than a preset valley value;

and the redistribution unit is used for redistributing the calculated amount which is originally distributed to the node to be evaluated if the calculated force variation trend value is greater than or equal to zero and the newly obtained real-time calculated force value is greater than or equal to the average calculated force value.

Optionally, the calculation force statistic unit 61 includes:

the classification unit is used for classifying the bound computing nodes which are positioned on the same processor into a node group, wherein the node group comprises at least one computing node;

the fragmentation unit is used for fragmenting a preset statistical task according to the number of the computing nodes in the node group and distributing the fragmented statistical task to each computing node in the node group for processing;

and the computing unit is used for acquiring the processing time of each computing node and computing the average computation force value according to the processing time and the computation amount of the fragmented statistical task, wherein the processing time is the time taken by the computing node to finish processing its fragment of the statistical task.

Optionally, the first distribution unit 62 comprises:

the sequencing unit is used for sequencing all the bound computing nodes according to the numerical sequence of the average computing force value to generate a task allocation sequence;

and the first allocating subunit is used for analyzing the minimum computing node meeting the task computing force value according to the task allocating sequence, determining the analyzed computing node as the target node, and allocating the computing task to the target node.
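
One possible reading of the sequencing unit and the first allocating subunit is sketched below: sort the bound computing nodes by average calculation force value, then take the first node in that sequence whose value meets the task calculation force value. The function name `pick_target`, the node names and the GOPS figures are hypothetical, and returning `None` when no single node suffices is an assumption made here.

```python
def pick_target(average_by_node, task_gops):
    """Return the weakest node whose average value still meets the task value."""
    # Task allocation sequence: ascending average calculation force value.
    sequence = sorted(average_by_node.items(), key=lambda item: item[1])
    for node, gops in sequence:
        if gops >= task_gops:
            return node
    return None  # assumption: no single node meets the task value


target = pick_target({"Node_1": 80, "Node_2": 40, "Node_3": 60}, task_gops=50)
print(target)  # Node_3
```

Picking the smallest sufficient node leaves the stronger nodes free for other calculation tasks.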

Therefore, the task allocation device based on load allocation provided by the embodiment of the invention allocates tasks according to the average calculation force value and the task type of the calculation task, and the flexibility of task allocation and the efficiency of task processing are improved.

Fig. 7 is a schematic diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 7, the terminal device 7 of this embodiment includes: a processor 70, a memory 71 and a computer program 72 stored in said memory 71 and operable on said processor 70, such as a load-scheduling based task allocation program. The processor 70, when executing the computer program 72, implements the steps of the above-mentioned various load-scheduling-based task allocation method embodiments, such as the steps S101 to S103 shown in fig. 1. Alternatively, the processor 70, when executing the computer program 72, implements the functions of the units in the above-described embodiments of the task assigning apparatus, such as the functions of the units 61 to 63 shown in fig. 6.

Illustratively, the computer program 72 may be divided into one or more units, which are stored in the memory 71 and executed by the processor 70 to accomplish the present invention. The one or more units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 72 in the terminal device 7. For example, the computer program 72 may be divided into a computation force statistics unit, a first distribution unit and a second distribution unit, each unit having the following specific functions:

the calculation force counting unit is used for binding at least two calculation nodes and carrying out calculation force statistics on each bound calculation node to obtain an average calculation force value of each calculation node, wherein the calculation nodes are central processor cores, graphics processor cores or neural network processor cores;

the first distribution unit is used for acquiring a task force value of the computing task if the task type of the computing task is a timing task, analyzing the computing nodes meeting the task force value according to the average force value, determining the analyzed computing nodes as target nodes, and distributing the computing task to all the target nodes, wherein the calculated amount distributed to the target nodes corresponds to the average force value of the target nodes;

and the second distributing unit is used for determining each bound computing node as the target node and distributing the computing task to all the target nodes if the task type is the maximum computing power task.

The terminal device 7 may be a computing device such as a desktop computer, a notebook, a palm computer, and a cloud server. The terminal device may include, but is not limited to, a processor 70, a memory 71. It will be appreciated by those skilled in the art that fig. 7 is merely an example of a terminal device 7 and does not constitute a limitation of the terminal device 7 and may comprise more or less components than shown, or some components may be combined, or different components, for example the terminal device may further comprise input output devices, network access devices, buses, etc.

The processor 70 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or a memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used for storing the computer program and other programs and data required by the terminal device. The memory 71 may also be used to temporarily store data that has been output or is to be output.

It is obvious to those skilled in the art that, for convenience and simplicity of description, the above division of each functional unit is only used for illustration, and in practical applications, the above function distribution may be performed by different functional units according to needs, that is, the internal structure of the terminal device is divided into different functional units to perform all or part of the above described functions. Each functional unit in the embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the application. The specific working process of the units in the system may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed terminal device and method may be implemented in other ways. For example, the above-described terminal device embodiments are merely illustrative, and for example, the division of the units is only one logical function division, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods in the above embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the above method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.

The above-mentioned embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

1. A task allocation method based on load allocation is characterized by comprising the following steps:

if the task type is a maximum computing power task, determining each bound computing node as the target node, and distributing the computing task to all the target nodes, wherein the maximum computing power task is a task that is to be processed as fast as possible and has no fixed timeliness requirement;

performing computing power monitoring on the target node to which the computing task has been distributed, to obtain a real-time computing power value of the target node;

if the real-time computing power value is smaller than the average computing power value of the target node, and the absolute value of the difference between the real-time computing power value and the average computing power value exceeds a preset fluctuation value, determining the target node as a node to be evaluated, reducing the computation amount allocated to the node to be evaluated according to the real-time computing power value, and distributing the surplus computation amount to other idle target nodes.
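
The monitoring and reduction step above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `rebalance`, the rule of scaling a node's allocation by the ratio of real-time to average computing power, the equal split of the surplus, and the treatment of every non-flagged node as "idle" are all assumptions made for the example.

```python
def rebalance(assigned, average, realtime, fluctuation):
    """Return a new allocation (node -> computation amount).

    assigned, average, realtime: dicts keyed by node name.
    fluctuation: preset fluctuation value (same unit as computing power).
    """
    result = dict(assigned)
    # A node is "to be evaluated" when it runs below its average computing
    # power by more than the preset fluctuation value.
    flagged = [
        n for n in assigned
        if realtime[n] < average[n]
        and abs(realtime[n] - average[n]) > fluctuation
    ]
    # Assumption: every target node that is not flagged counts as idle.
    idle = [n for n in assigned if n not in flagged]
    if not idle:
        return result  # nowhere to move the surplus, keep the allocation

    for n in flagged:
        # Assumption: shrink the allocation in proportion to the drop.
        reduced = assigned[n] * realtime[n] / average[n]
        surplus = assigned[n] - reduced
        result[n] = reduced
        # Assumption: split the surplus equally among idle nodes.
        for m in idle:
            result[m] += surplus / len(idle)
    return result


allocation = rebalance(
    assigned={"a": 100, "b": 100, "c": 100},
    average={"a": 10, "b": 10, "c": 10},
    realtime={"a": 5, "b": 10, "c": 9.5},
    fluctuation=1,
)
```

In this example only node `a` is flagged: node `c` is also below its average, but its deviation of 0.5 does not exceed the fluctuation value of 1.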

2. The task allocation method according to claim 1, wherein the determining the target node as a node to be evaluated, reducing the computation amount allocated to the node to be evaluated according to the real-time computing power value, and distributing the surplus computation amount to other idle target nodes comprises:

determining the processor on which the node to be evaluated is located as a processor to be evaluated;

and uniformly reducing, according to the real-time computing power value, the computation amount allocated to all the target nodes on the processor to be evaluated, and distributing the surplus computation amount to other idle target nodes that do not belong to the processor to be evaluated.
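
A processor-level variant of the reduction might look like the sketch below. The names `rebalance_by_processor`, `processor_of`, and `ratio` are hypothetical; `ratio` stands for the assumed scaling factor (for instance, the flagged node's real-time computing power divided by its average), and the equal split of the surplus is likewise an assumption.

```python
def rebalance_by_processor(assigned, processor_of, flagged_node, ratio):
    """Reduce every target node on the flagged node's processor by `ratio`
    and move the surplus to target nodes on other processors."""
    proc = processor_of[flagged_node]  # the processor to be evaluated
    same = [n for n in assigned if processor_of[n] == proc]
    others = [n for n in assigned if processor_of[n] != proc]
    result = dict(assigned)
    if not others:
        return result  # no node outside the processor to take the surplus

    surplus = 0.0
    for n in same:
        reduced = assigned[n] * ratio  # uniform reduction across the processor
        surplus += assigned[n] - reduced
        result[n] = reduced
    for n in others:
        # Assumption: nodes on other processors are idle and share equally.
        result[n] += surplus / len(others)
    return result


by_proc = rebalance_by_processor(
    assigned={"a": 80, "b": 80, "c": 80, "d": 80},
    processor_of={"a": "p0", "b": "p0", "c": "p1", "d": "p1"},
    flagged_node="a",
    ratio=0.5,
)
```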

3. The task allocation method according to claim 1, wherein after reducing the computation amount allocated to the node to be evaluated according to the real-time computing power value and distributing the surplus computation amount to other idle target nodes, the method further comprises:

acquiring at least two real-time computing power values of the node to be evaluated within a preset monitoring time period, and determining a computing power trend value of the node to be evaluated from all the acquired real-time computing power values;

if the computing power trend value is smaller than zero and the most recently acquired real-time computing power value is smaller than a preset valley value, unbinding the node to be evaluated and reallocating the computation amount currently allocated to the node to be evaluated to other idle target nodes;

and if the computing power trend value is greater than or equal to zero and the most recently acquired real-time computing power value is greater than or equal to the average computing power value, reallocating to the node to be evaluated the computation amount initially allocated to it.
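
The two branches above can be illustrated with a small decision function. The claim does not fix how the trend value is computed; the sketch assumes the mean change between consecutive samples, and the function name `evaluate_node` and the returned labels are hypothetical.

```python
def evaluate_node(samples, average, valley):
    """Decide what to do with a node to be evaluated.

    samples: real-time computing power values in acquisition order.
    average: the node's average computing power value.
    valley:  preset valley value.
    Returns "unbind", "restore", or "keep".
    """
    if len(samples) < 2:
        raise ValueError("at least two real-time values are required")
    # Assumption: trend = mean change per sample over the monitoring period.
    trend = (samples[-1] - samples[0]) / (len(samples) - 1)
    latest = samples[-1]
    if trend < 0 and latest < valley:
        return "unbind"   # falling and below the valley: release the node
    if trend >= 0 and latest >= average:
        return "restore"  # recovered: give back the initial allocation
    return "keep"         # neither condition holds: leave the reduced load


falling = evaluate_node([6, 4, 2], average=10, valley=3)
recovered = evaluate_node([8, 9, 10], average=10, valley=3)
undecided = evaluate_node([6, 5, 4], average=10, valley=3)
```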

4. The task allocation method according to claim 1, wherein the performing of computing power statistics on each bound computing node to obtain an average computing power value of each computing node comprises:

grouping the bound computing nodes located on the same processor into a node group, wherein the node group comprises at least one computing node;

fragmenting a preset statistical task according to the number of computing nodes in the node group, and distributing the fragmented statistical task to each computing node in the node group for processing;

acquiring the processing time of each computing node, and calculating the average computing power value from the processing time and the computation amount of the fragmented statistical task, wherein the processing time is the time taken by the computing node to process the fragmented statistical task.
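
As a rough illustration of the last step, the sketch below derives a per-node computing power value from measured processing times. It assumes the statistical task is split into equal fragments and that computing power is expressed as computation amount per unit time; the name `average_power` and the input values are made up for the example.

```python
def average_power(total_amount, processing_times):
    """Compute a computing power value for each node in one node group.

    total_amount:     computation amount of the whole statistical task.
    processing_times: dict node -> time taken to process its fragment.
    """
    if not processing_times:
        raise ValueError("the node group must contain at least one node")
    # Assumption: the task is fragmented evenly across the node group.
    fragment = total_amount / len(processing_times)
    powers = {}
    for node, elapsed in processing_times.items():
        if elapsed <= 0:
            raise ValueError(f"non-positive processing time for {node}")
        # Computing power = fragment computation amount / processing time.
        powers[node] = fragment / elapsed
    return powers


powers = average_power(1000, {"n0": 2.0, "n1": 4.0})
```

Here each node receives a fragment of 500 units, so `n0` (2.0 time units) is rated at 250 units per time unit and `n1` (4.0 time units) at 125.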

5. The task allocation method according to claim 1, wherein the analyzing the computing node satisfying the task computing power value according to the average computing power value, determining the analyzed computing node as a target node, and distributing the computing task to the target node comprises:

sorting all the bound computing nodes in numerical order of their average computing power values to generate a task allocation sequence;

identifying, according to the task allocation sequence, the smallest computing node that meets the task computing power value, determining the identified computing node as the target node, and distributing the computing task to the target node.
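
The sort-and-select step can be sketched as below, assuming an ascending sort and reading "meets" as "average computing power greater than or equal to the task computing power value". The function name `pick_target` is hypothetical.

```python
def pick_target(average, task_power):
    """Return the node with the smallest average computing power that still
    meets task_power, or None if no bound node is sufficient."""
    # Task allocation sequence: nodes in ascending order of average power.
    sequence = sorted(average.items(), key=lambda item: item[1])
    for node, power in sequence:
        if power >= task_power:
            return node  # first sufficient node is the smallest sufficient one
    return None


chosen = pick_target({"a": 30, "b": 10, "c": 20}, task_power=15)
```

Choosing the smallest sufficient node leaves the more powerful nodes free for tasks with higher computing power requirements.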

6. A task allocation apparatus based on load scheduling, comprising:

a second distribution unit, configured to determine each bound computing node as the target node and distribute the computing task to all the target nodes if the task type is a maximum computing power task, wherein the maximum computing power task is a task that is to be processed as fast as possible and has no fixed timeliness requirement;

and an evaluation unit, configured to determine the target node as a node to be evaluated if the real-time computing power value is smaller than the average computing power value of the target node and the absolute value of the difference between the real-time computing power value and the average computing power value exceeds a preset fluctuation value, reduce the computation amount allocated to the node to be evaluated according to the real-time computing power value, and distribute the surplus computation amount to other idle target nodes.

7. A load-scheduling-based task allocation terminal device, wherein the terminal device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor executes the computer program to implement the following steps:

performing computing power monitoring on the target node to which the computing task has been distributed, to obtain a real-time computing power value of the target node;

8. The terminal device of claim 7, further comprising:

9. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the steps of the task allocation method according to any one of claims 1 to 5.