CN115391053A

CN115391053A - Online service method and device based on CPU and GPU hybrid calculation

Info

Publication number: CN115391053A
Application number: CN202211315013.9A
Authority: CN
Inventors: 曲昱汇; 张献涛; 支涛
Original assignee: Beijing Yunji Technology Co Ltd
Current assignee: Beijing Yunji Technology Co Ltd
Priority date: 2022-10-26
Filing date: 2022-10-26
Publication date: 2022-11-25
Anticipated expiration: 2042-10-26
Also published as: CN115391053B

Abstract

The application provides an online service method and device based on mixed calculation of a CPU and a GPU. The method comprises the following steps: adding the received online service request into a cache queue; according to the timing task or the trigger event, selecting a computing component to perform asynchronous processing on the online service request in the cache queue, wherein the computing component comprises a CPU and a GPU; in the process of calling the computing components to asynchronously process the online service requests, determining the weight corresponding to each computing component according to the number of the online service requests in the cache queue and the average computing time consumption of the computing components; and dynamically calling a CPU or a GPU to execute the online service request according to the weight corresponding to each computing component, and storing a service result corresponding to the online service request into a service result set, wherein the service result set comprises identification information of the online service request and service result information corresponding to the identification information. The method and the device improve the computing efficiency of the online computing service and improve the utilization rate of hardware resources.

Description

Online service method and device based on CPU and GPU hybrid calculation

Technical Field

The application relates to the technical field of computers, in particular to an online service method and device based on mixed calculation of a CPU and a GPU.

Background

The current GPU (graphics Processing Unit) has a good parallel Processing capability, and can perform accelerated computation in many fields and application scenarios. In particular, several computing tasks can be processed at once, in parallel or in batches. The traditional CPU is mainly used for single task calculation, and a matched online service interface and a calling mode are simple. Many online services in the current environment need to be transformed by combining the functions of the GPU, so as to achieve an acceleration effect.

In the existing on-line computing service related to a CPU and a GPU, one method is to perform identification calculation by installing an algorithm SDK (Software Development Kit) in each robot, and due to hardware limitation, the identification efficiency is slow, the effect of returning results in real time cannot be achieved, and certain resource waste is caused by multiple times of installation. The other method is to perform cloud end request and upload pictures to cloud end service to obtain a recognition result, but the current cloud end online service generally cannot perform object picture recognition of batch request, only one picture request can be sent each time, the throughput is low, and computing resources are wasted. Therefore, the existing online computing service method has the problems of low computing efficiency and serious waste of hardware resources.

Disclosure of Invention

In view of this, embodiments of the present application provide an online service method and apparatus based on mixed computation of a CPU and a GPU, so as to solve the problems of low computation efficiency and serious waste of hardware resources in the prior art.

In a first aspect of the embodiments of the present application, an online service method based on mixed computation of a CPU and a GPU is provided, including: receiving an online service request, and adding the online service request into a cache queue; according to a preset timing task or a trigger event, selecting a computing assembly to perform asynchronous processing on an online service request in a cache queue, wherein the computing assembly comprises a CPU and a GPU; in the process of calling the computing components to asynchronously process the online service requests, determining the weight corresponding to each computing component according to the number of the online service requests in the cache queue and the average computing time consumption of the computing components; and dynamically calling a CPU or a GPU to execute the online service request according to the weight corresponding to each computing component, and storing a service result corresponding to the online service request into a service result set, wherein the service result set comprises identification information of the online service request and service result information corresponding to the identification information.

In a second aspect of the embodiments of the present application, an online service device based on mixed computation of a CPU and a GPU is provided, including: the receiving module is configured to receive the online service request and add the online service request to a cache queue; the processing module is configured to select a computing component to perform asynchronous processing on the online service request in the cache queue according to a preset timing task or a trigger event, wherein the computing component comprises a CPU (central processing unit) and a GPU (graphic processing unit); the computing module is configured to determine the weight corresponding to each computing component according to the number of the online service requests in the cache queue and the average computing time consumption of the computing components in the asynchronous processing process of calling the computing components to the online service requests; and the calling module is configured to dynamically call the CPU or the GPU to execute the online service request according to the weight corresponding to each computing component, and store a service result corresponding to the online service request into a service result set, wherein the service result set comprises identification information of the online service request and service result information corresponding to the identification information.

In a third aspect of the embodiments of the present application, there is provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method.

In a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided, in which a computer program is stored, and the computer program realizes the steps of the above method when being executed by a processor.

The embodiment of the application adopts at least one technical scheme which can achieve the following beneficial effects:

adding the online service request to a cache queue by receiving the online service request; according to a preset timing task or a trigger event, selecting a computing component to perform asynchronous processing on an online service request in a cache queue, wherein the computing component comprises a CPU and a GPU; in the process of calling the computing components to asynchronously process the online service requests, determining the weight corresponding to each computing component according to the number of the online service requests in the cache queue and the average computing time consumption of the computing components; and dynamically calling a CPU or a GPU to execute the online service request according to the weight corresponding to each computing component, and storing a service result corresponding to the online service request into a service result set, wherein the service result set comprises identification information of the online service request and service result information corresponding to the identification information. The method and the device improve the computing efficiency of the online computing service and improve the utilization rate of hardware resources.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.

FIG. 1 is a schematic diagram illustrating a flow of processing a request involved in an actual scenario according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of an online service method based on a CPU and GPU hybrid computing according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of an online service device based on a CPU and GPU hybrid computing according to an embodiment of the present application;

fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

As described in the background art, the current GPU (graphics Processing Unit) has a very good parallel Processing capability, and can perform accelerated computation in many fields and application scenarios. In particular, a plurality of computing tasks can be processed in parallel or in batch at one time, and the method has obvious advantages. The traditional CPU is mainly used for single task calculation, and a matched online service interface and a calling mode are simple. Many online services in the current environment need to be transformed by combining the functions of the GPU, so as to achieve an acceleration effect. The following describes in detail the problems of the conventional computing service method by taking an example of a computing service method for identifying an image object by a robot.

The existing computing service method for image item identification mainly comprises the following two modes: the first method is to perform image recognition calculation by installing an algorithm SDK (Software Development Kit) in each robot, but due to hardware limitation, recognition efficiency is slow, an effect of returning a result in real time cannot be achieved, and certain resource waste is caused by multiple installation. In another mode, a cloud request is made, and the pictures are uploaded to a cloud service to obtain a recognition result, but the current cloud online service generally cannot perform batch-requested item picture recognition, only one picture request can be sent each time, the throughput is low, and computing resources are wasted.

In view of the problems in the prior art, an embodiment of the present application provides an online service method in a hybrid computing environment combining a CPU and a GPU, which can improve computing efficiency and hardware resource utilization, and fig. 1 is a schematic diagram of a request processing flow involved in an actual scenario of an embodiment of the present application, as shown in fig. 1, where a request message is sent to a queue cache, an intermediate layer performs asynchronous processing on an online service request in the cache queue based on a timed task or a trigger event, after the online service request is processed, a query result is cached, a locked single thread is called by using a DLL algorithm packet, a CPU/GPU batch processing calculation result is called in different situations according to a scheduling algorithm, and finally the processing result is stored in a result storage set, processing of a timed query database is waited, and a query result is returned.

Fig. 2 is a schematic flowchart of an online service method based on mixed computation of a CPU and a GPU according to an embodiment of the present application. The online service method of fig. 2 based on the CPU and GPU hybrid computation may be performed by an online server. As shown in fig. 2, the online service method based on the mixed computation of the CPU and the GPU may specifically include:

s201, receiving an online service request, and adding the online service request into a cache queue;

s202, according to a preset timing task or a trigger event, selecting a computing component to perform asynchronous processing on an online service request in a cache queue, wherein the computing component comprises a CPU and a GPU;

s203, in the process of calling the computing components to asynchronously process the online service requests, determining the weight corresponding to each computing component according to the number of the online service requests in the cache queue and the average computing time consumption of the computing components;

s204, dynamically calling a CPU or a GPU to execute the online service request according to the weight corresponding to each computing component, and storing a service result corresponding to the online service request into a service result set, wherein the service result set comprises identification information of the online service request and service result information corresponding to the identification information.

Specifically, in the embodiment of the present application, the image article identification of the robot is taken as an example of an online service scene for performing an expansion description, so that the online service request may be considered that after the robot takes an article image, the article image is uploaded to the system platform, and the system platform generates the online service request for image article identification according to the article image, and the generated online service request is sent to the online server for identification processing.

Further, the computing components in the embodiments of the present application include a CPU component and a GPU component, where a GPU (graphical Processing Unit) has a good parallel Processing capability, and can simultaneously process a plurality of computing tasks in batch at one time. In addition, the robot of the embodiment of the present application includes but is not limited to: intelligent mobile robots, delivery robots, hotel service robots, greeting robots, and the like.

In some embodiments, receiving an online service request, and adding the online service request to a buffer queue, includes: receiving a character string corresponding to original picture data sent by a cloud platform, generating an online service request according to the character string, and adding the online service request into a cache queue according to a time sequence; the character string is obtained by the robot generating original picture data according to articles received in a robot cabin and sending the original picture data to the cloud platform so that the cloud platform encodes the original picture data.

Specifically, taking a delivery robot as an example, the robot captures an image of an article received in a cabin and sends the image to a cloud platform, a cloud platform calling interface (for example, a restful type interface) performs coding processing (for example, base64 coding) on an original image to obtain a character string, an online service request is generated according to the character string corresponding to the original image, the online service requests are sequentially sent to an online server, and after the online server receives the online service requests sent by the cloud platform, the online service requests are sequentially added to a cache queue according to the sequence of receiving time, and a timing task call of image identification is waited.

Here, base64 encoding of an original image may be considered as encoding a piece of picture data into a string of character strings, and using the string of character strings instead of an image address URL, for example: the method frequently used is to implement encoding processing of original image data based on a Base64 module, then transmit the encoded image data, and decode and restore the original image data after receiving a data stream at a server side.

In some embodiments, selecting a compute component to asynchronously process the online service request in the cache queue according to the timing task includes: and according to a preset timing task execution time interval, sending the online service requests added to the cache queue in the timing task execution time interval to a GPU for batch execution, or sequentially sending the online service requests to the CPU for cyclic execution.

Specifically, for the online service request stored in the cache queue, there are the following two calling modes: the first is asynchronous execution based on a timing task, and the second is asynchronous execution based on a trigger event; the following describes in detail the implementation processes of the two calling methods with reference to specific embodiments.

Further, in the asynchronous execution process of the online service requests based on the timed task, first, the execution time of the timed task is determined according to the request amount in the actual unit time, the execution time of the timed task may be considered as a time interval, and every time the timed task reaches the preset time interval of the execution of the timed task, all the online service requests in the cache queue are sent to the GPU for batch execution, or all the online service requests in the cache queue are sequentially and circularly executed by adopting the CPU.

In some embodiments, selecting a computing component to asynchronously process the online service request in the cache queue according to the trigger event includes: when the number of the requests corresponding to the online service requests in the cache queue reaches a number threshold value, the online service requests in the cache queue are sent to a GPU for batch execution, or the online service requests are sent to a CPU in sequence for cyclic execution; or when any one of the computing components in the CPU and the GPU is in the space state, the online service request in the cache queue is sent to the computing component in the space state for execution.

Specifically, compared with an asynchronous execution mechanism based on a timed task, the asynchronous execution mechanism based on a trigger event performs asynchronous execution according to the number of requests of the online service request, and when it is determined that the number of the online service requests in the cache queue reaches the number threshold, any one of the following execution manners may be adopted to process the online service request, including:

(1) Sending the online service requests in the cache queue to a GPU for batch execution;

(2) Sending the online service requests in the cache queue to a CPU for sequential cycle execution;

(3) And when any one of the CPU and the GPU is in a space state, sending the online service request in the cache queue to the calculation component in the space state for execution.

Further, due to the difference between the CPU and GPU performance of a machine, the call rate of the CPU and GPU with different models and different performance is different when actually calling, which depends on the number of requests in the buffer queue and the difference between the two calculation speeds, and a specific calculation mode is selected when the number of requests in the buffer queue reaches a limit value. When the CPU or the GPU is occupied for calculation, the idle calculation unit is preferentially selected for calculation. When both are idle, the selection calculation can be performed according to the weight calculation method provided by the embodiment of the present application.

In some embodiments, when both the CPU and the GPU are in the spatial state, the weight corresponding to each of the calculation components is calculated by using the following formula:

wherein,

the weight corresponding to the CPU is represented,

the corresponding weight of the GPU is represented,

indicating the online service requests in the buffer queue,

representing the task capability of the current GPU for parallel computation,

indicating that the CPU has consumed the computation of performing the online service request once on average over the past time period,

represents GThe PU is time consuming to perform the computation of an online service request on average over the past time period.

In practical applications, the request in the current buffer queue may be recorded as a request, and the current buffer queue includes k requests, which are recorded as requests respectively

. The past time period may be selected to be one hour in the past to calculate the average elapsed time for the CPU and GPU.

Specifically, according to the number of online service requests in a cache queue, the task capacity of parallel calculation of the current GPU, the calculation time consumed by averagely executing one online service request by a CPU in the past time period and the calculation time consumed by averagely executing one online service request by the GPU in the past time period, the weight of each calculation component is calculated respectively, because the factors influencing the weight calculation are changed in real time, the weight corresponding to the calculation component is also a dynamic weight, and a proper calculation component is selected according to the weight calculation result to execute the online service request.

In some embodiments, dynamically invoking the CPU or the GPU to execute the online service request according to the weight corresponding to each computing component includes: when the request quantity of the online service requests is lower than a preset threshold value, selecting a calculation component with low average calculation time consumption to be called to execute the online service requests; when the request quantity of the online service request is higher than the task capacity of the current GPU in parallel computing, selectively calling the GPU to execute the online service request; when the average calculation time consumption of the CPU is lower than that of the GPU, the request number of the online service requests in the cache queue is lower than a preset threshold value, and the CPU is in an idle state currently, the CPU is selected to be called to execute the online service requests; and when the predicted time consumption of the on-line service request in the cache queue in the circulating execution by using the CPU is larger than the predicted time consumption of the GPU in the batch execution, selectively calling the GPU to execute the on-line service request.

Specifically, in an optional example, when the number of online service requests in the cache queue is low, for example, the threshold value of the number is set to be 10, when the number of online service requests is lower than 10, the computing component with low average computation time consumption is preferentially selected to execute the online service requests in the cache queue, and when the number of online service requests is higher than 10, the GPU component is preferentially selected to execute the online service requests in the cache queue in parallel, so as to achieve the purpose of high throughput. That is, for a machine with relatively high GPU performance, GPU computations are prioritized for each task execution, when the GPU is occupied, a single request is executed using the CPU, and the occupancy state of the GPU is checked at the next request.

Further, for a machine with higher CPU performance and poorer GPU performance, when there are fewer requests in the request queue (i.e. the buffer queue) and the CPU is currently in an idle state, the CPU cycle computation is preferentially considered, and when the time after all the requests in the request queue are executed using the CPU cycle is greater than the GPU execution time, the GPU is used for executing.

In some embodiments, storing the service result corresponding to the online service request in a service result set includes: after the execution of the online service request is completed, caching the identification information corresponding to the online service request and the service result information into a service result set, and after the online service request with the same identification information is received, sending the service result information corresponding to the identification information cached in advance in the service result set to an online server.

Specifically, storing an identification result corresponding to each online service request into a service result set, and waiting for query result task query to return, wherein it needs to be stated that each online service request has unique identification information so as to correspond to the service result information of each online service request; the identification information and the identification result (service result information) of each online service request are cached, so that when the online server receives the online service requests of the same object image, the corresponding service result information can be quickly acquired by inquiring the service result set and sent to the online server.

Further, the embodiment of the present application further provides an asynchronous call result storage service, so as to store the online service request information and the identification result (service result information) into a database, where the database is stored in a sub-table, and the storage information includes, but is not limited to, the following information: the online service request information, the service result information, the server information, the CPU execution information and the GPU execution information are stored, and the follow-up statistics of the execution efficiency of each server, the CPU and the GPU is facilitated.

According to the technical scheme provided by the embodiment of the application, the sustainable service flow based on the CPU and GPU mixed calling algorithm is provided, in order to provide sustainable service for algorithms such as a picture recognition algorithm and the like which need the CPU and GPU mixed calculation, the intermediate layer performs concurrent control, requests for queue storage, the CPU and the GPU are scheduled and the like, and the calculation efficiency and the hardware resource utilization rate are greatly improved. The application development relates to a background service for receiving all requests, using a queue to cache and distribute the requests, using a timing task to carry out CPU/GPU mixed calling and query results, receiving identification results, then using an asynchronous request storage service to store request information and the identification results, and returning the results.

The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.

Fig. 3 is a schematic structural diagram of an online service device based on CPU and GPU hybrid computing according to an embodiment of the present application. As shown in fig. 3, the online service device based on the mixed computation of the CPU and the GPU includes:

a receiving module 301 configured to receive an online service request, and add the online service request to a buffer queue;

the processing module 302 is configured to select a computing component to perform asynchronous processing on the online service request in the cache queue according to a preset timing task or a trigger event, where the computing component includes a CPU and a GPU;

the calculating module 303 is configured to determine a weight corresponding to each calculating component according to the number of the online service requests in the cache queue and the average calculation time consumption of the calculating components in the process of calling the calculating components to perform asynchronous processing on the online service requests;

the invoking module 304 is configured to dynamically invoke the CPU or the GPU to execute the online service request according to the weight corresponding to each computing component, and store a service result corresponding to the online service request into a service result set, where the service result set includes identification information of the online service request and service result information corresponding to the identification information.

In some embodiments, the receiving module 301 in fig. 3 receives a character string corresponding to original picture data sent by a cloud platform, generates an online service request according to the character string, and adds the online service request to a cache queue according to a time sequence; the character string is obtained by the robot generating original picture data according to articles received in a robot cabin and sending the original picture data to the cloud platform so that the cloud platform encodes the original picture data.

In some embodiments, the processing module 302 in fig. 3 sends the online service requests added to the buffer queue within the timing task execution time interval to the GPU for batch execution according to a preset timing task execution time interval, or sends the online service requests to the CPU in sequence for circular execution.

In some embodiments, when the number of requests corresponding to the online service requests in the cache queue reaches a number threshold, the processing module 302 in fig. 3 sends the online service requests in the cache queue to the GPU for batch execution, or sends the online service requests in sequence to the CPU for circular execution; or when any one of the computing components in the CPU and the GPU is in a space state, the online service request in the cache queue is sent to the computing component in the space state for execution.

In some embodiments, the invoking module 304 of fig. 3 selects a computing component with low average computation time to be invoked to execute the online service request when the number of requests of the online service request is lower than a preset threshold; when the request quantity of the online service request is higher than the task capacity of the current GPU in parallel computing, selectively calling the GPU to execute the online service request; when the average calculation time consumption of the CPU is lower than that of the GPU, the request number of the online service requests in the cache queue is lower than a preset threshold value, and the CPU is in an idle state currently, the CPU is selected to be called to execute the online service requests; and when the predicted time consumption of the on-line service requests in the cache queue in the circulating execution by using the CPU is larger than the predicted time consumption of the on-line service requests in the batch execution by using the GPU, selectively calling the GPU to execute the on-line service requests.

In some embodiments, after the execution of the online service request is completed, the invoking module 304 in fig. 3 caches identification information and service result information corresponding to the online service request into a service result set, and after an online service request with the same identification information is received, sends service result information corresponding to identification information cached in advance in the service result set to the online server.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

Fig. 4 is a schematic structural diagram of an electronic device 4 provided in an embodiment of the present application. As shown in fig. 4, the electronic apparatus 4 of this embodiment includes: a processor 401, a memory 402, and a computer program 403 stored in the memory 402 and operable on the processor 401. The steps in the various method embodiments described above are implemented when the processor 401 executes the computer program 403. Alternatively, the processor 401 implements the functions of the respective modules/units in the above-described respective apparatus embodiments when executing the computer program 403.

Illustratively, the computer program 403 may be partitioned into one or more modules/units, which are stored in the memory 402 and executed by the processor 401 to accomplish the present application. One or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 403 in the electronic device 4.

The electronic device 4 may be a desktop computer, a notebook, a palm computer, a cloud server, or other electronic devices. The electronic device 4 may include, but is not limited to, a processor 401 and a memory 402. Those skilled in the art will appreciate that fig. 4 is merely an example of the electronic device 4, and does not constitute a limitation of the electronic device 4, and may include more or less components than those shown, or combine certain components, or different components, e.g., the electronic device may also include input-output devices, network access devices, buses, etc.

The Processor 401 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The storage 402 may be an internal storage unit of the electronic device 4, for example, a hard disk or a memory of the electronic device 4. The memory 402 may also be an external storage device of the electronic device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the electronic device 4. Further, the memory 402 may also include both internal storage units of the electronic device 4 and external storage devices. The memory 402 is used for storing computer programs and other programs and data required by the electronic device. The memory 402 may also be used to temporarily store data that has been output or is to be output.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the above embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the above-described apparatus/computer device embodiments are merely illustrative, and for example, a module or a unit may be divided into only one logical function, another division may be made in actual implementation, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the method of the foregoing embodiments may be implemented by a computer program, which may be stored in a computer readable storage medium and instructs related hardware to implement the steps of the foregoing method embodiments when executed by a processor. The computer program may comprise computer program code which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, recording medium, U.S. disk, removable hard disk, magnetic diskette, optical disk, computer Memory, read-Only Memory (ROM), random Access Memory (RAM), electrical carrier wave signal, telecommunications signal, software distribution medium, etc. It should be noted that the computer readable medium may contain suitable additions or additions that may be required in accordance with legislative and patent practices within the jurisdiction, for example, in some jurisdictions, computer readable media may not include electrical carrier signals or telecommunications signals in accordance with legislative and patent practices.

The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims

1. An online service method based on mixed calculation of a CPU and a GPU is characterized by comprising the following steps:

receiving an online service request, and adding the online service request into a cache queue;

according to a preset timing task or a trigger event, selecting a computing component to perform asynchronous processing on the online service request in the cache queue, wherein the computing component comprises a CPU and a GPU;

in the process of calling the computing components to perform asynchronous processing on the online service requests, determining the weight corresponding to each computing component according to the number of the online service requests in the cache queue and the average computing time consumption of the computing components;

and dynamically calling a CPU or a GPU to execute an online service request according to the weight corresponding to each computing component, and storing a service result corresponding to the online service request into a service result set, wherein the service result set comprises identification information of the online service request and service result information corresponding to the identification information.

2. The method of claim 1, wherein receiving an online service request and adding the online service request to a buffer queue comprises:

receiving a character string corresponding to original picture data sent by a cloud platform, generating the online service request according to the character string, and adding the online service request into the cache queue according to a time sequence;

the character string is obtained by generating original picture data according to an article received in a robot cabin by the robot and sending the original picture data to the cloud platform so that the cloud platform encodes the original picture data.

3. The method of claim 1, wherein selecting a compute component to asynchronously process the online service requests in the cache queue according to the timing task comprises:

and according to a preset timing task execution time interval, sending the online service requests added to the cache queue in the timing task execution time interval to a GPU for batch execution, or sending the online service requests to a CPU in sequence for cyclic execution.

4. The method of claim 1, wherein selecting a computing component to asynchronously process the online service requests in the cache queue based on a triggering event comprises:

when the number of the requests corresponding to the online service requests in the cache queue reaches a number threshold value, the online service requests in the cache queue are sent to a GPU for batch execution, or the online service requests are sent to a CPU in sequence for cyclic execution;

or when any one computing component of the CPU and the GPU is in a space state, the online service request in the cache queue is sent to the computing component in the space state for execution.

5. The method according to claim 3 or 4, wherein when the CPU and the GPU are both in a space state, the weight corresponding to each computing component is calculated by adopting the following formula:

wherein,

the corresponding weight of the CPU is represented,

the corresponding weight of the GPU is represented,

indicating the online service requests in the buffer queue,

representing the task capability of the current GPU for parallel computation,

indicating that the CPU has been consuming computing time to perform an online service request on average over a past period of time,

which represents the calculation time of the GPU to perform one online service request on average over the past time period.

6. The method of claim 5, wherein dynamically invoking a CPU or a GPU to execute an online service request according to the weight corresponding to each computing component comprises:

when the request quantity of the online service requests is lower than a preset threshold value, selecting a calculation component with low average calculation time consumption to be called to execute the online service requests;

when the request quantity of the online service request is higher than the task capacity of the current GPU in parallel computing, selectively calling the GPU to execute the online service request;

when the average calculation time consumption of the CPU is lower than that of the GPU, the request number of the online service requests in the cache queue is lower than a preset threshold value, and the CPU is in an idle state currently, the CPU is selected to be called to execute the online service requests;

and when the predicted time consumption of the online service request in the cache queue in the circulating execution by using the CPU is larger than the predicted time consumption of the online service request in the batch execution by using the GPU, selecting and calling the GPU to execute the online service request.

7. The method of claim 1, wherein storing the service result corresponding to the online service request in a service result set comprises:

after the execution of the online service request is completed, caching identification information and service result information corresponding to the online service request into the service result set, and after the online service request with the same identification information is received, sending the service result information corresponding to the identification information cached in advance in the service result set to an online server.

8. An online service device based on mixed calculation of a CPU and a GPU is characterized by comprising:

the system comprises a receiving module, a caching module and a sending module, wherein the receiving module is configured to receive an online service request and add the online service request to a caching queue;

the processing module is configured to select a computing component to perform asynchronous processing on the online service request in the cache queue according to a preset timing task or a trigger event, wherein the computing component comprises a CPU (central processing unit) and a GPU (graphics processing unit);

the computing module is configured to determine a weight corresponding to each computing component according to the number of the online service requests in the cache queue and the average computing time consumption of the computing components in the asynchronous processing process of the online service requests by calling the computing components;

and the calling module is configured to dynamically call a CPU or a GPU to execute an online service request according to the weight corresponding to each computing component, and store a service result corresponding to the online service request into a service result set, wherein the service result set comprises identification information of the online service request and service result information corresponding to the identification information.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 7 when executing the program.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.