CN107992392A - Automatic monitoring and repair system and method for a cloud rendering system
- Publication number
- CN107992392A (application number CN201711165385.7A)
- Authority
- CN
- China
- Prior art keywords
- rendering
- server
- task
- module
- management
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0709—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3055—Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/32—Monitoring with visual or acoustical indication of the functioning of the machine
- G06F11/324—Display of status information
- G06F11/327—Alarm or error message display
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/508—Monitor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Mathematical Physics (AREA)
- Debugging And Monitoring (AREA)
- Information Transfer Between Computers (AREA)
- Computer And Data Communications (AREA)
Abstract
The present invention provides an automatic monitoring and repair system for a cloud rendering system. A user client lets a producer set task parameters and upload the required rendering task to a main transfer server. The main transfer server verifies the account login information of the upload and distributes the rendering task to a matching secondary transfer server. The secondary transfer server distributes the rendering task to a matching rendering server according to dynamic operating data, and also sends the dynamic operating data to a management server and to the main transfer server. The rendering server executes the rendering task. The management server automatically checks and repairs the rendering servers according to the dynamic operating data. A management client is used to correct the exception information in the management server. With the invention, every rendering server can be monitored automatically, so administrators can manage render farm servers in a more automated way, which improves management efficiency and optimizes use of the render farm.
Description
Technical field
The present invention relates to the field of automatic monitoring and repair technology, and in particular to an automatic monitoring and repair system and method for a cloud rendering system.
Background
Computer animation is one of the fastest-developing technical fields in the world. To obtain high-quality computer animation, the scenes must be rendered after animation modeling, motion design, and similar work are complete. Achieving the best rendering result requires a large amount of material and therefore a large amount of CPU resources; a single high-resolution image can typically take tens of hours to render. For this reason, large animation and visual-effects productions now generally use a cloud rendering system (also called a cloud rendering platform or render farm).
In response to the drawbacks of traditional rendering, a cloud rendering system offers an advanced rendering solution based on cloud computing services. Through a cloud rendering system, a user can call on thousands of cloud servers within a few seconds to render in parallel. A rendering platform can consist of hundreds or thousands of rendering servers. For this many servers, existing management software can already allocate and optimize resources across the whole network, manage the jobs submitted to the system, and carry out cross-platform, multi-engine, multi-task rendering at scale. Server maintenance, however, still requires staff to check each server's state manually at irregular intervals and maintain it by hand, or to inspect the server only after a task has failed. Consequently, the management of a cloud rendering system needs to solve two major problems:
1. The ability to monitor the dynamic operating data of the rendering servers automatically, and to repair abnormal conditions or report them to an administrator in a timely manner;
2. The ability to analyze how the servers are running by monitoring the state of each rendering server in real time, so that the cloud rendering system can be optimized promptly (for example, deciding whether servers need to be replaced or local hard disks need to be added).
At present these two problems are handled as follows. First, the operation of the rendering servers is checked manually; more often, the affected server is located and repaired by hand only after a task has run abnormally. This incurs high labor costs and means rendering servers cannot be repaired in time. Second, server problems are resolved based on experience or on the number of exceptions a server has shown; with no monitoring records as evidence, the cloud rendering system cannot be optimized in time.
Summary of the invention
To address the shortcomings of existing handling mechanisms, the present invention provides an automatic monitoring and repair system and method for a cloud rendering system.
In one aspect, an embodiment of the present invention provides an automatic monitoring and repair system for a cloud rendering system, comprising a user client, a management client, a main transfer server, a management server, secondary transfer servers, and rendering servers, wherein:
The user client is used by a producer to set task parameters and to upload the required rendering task to the main transfer server;
The main transfer server verifies the account login information of the uploaded rendering task, automatically generates a task number once verification passes, distributes the rendering task to a matching secondary transfer server, and generates a rendering task distribution log;
The secondary transfer server receives the dynamic operating data of its subordinate rendering servers, distributes the rendering task to a matching rendering server according to that data, and generates a secondary rendering task distribution log; it also sends the dynamic operating data to the management server and the main transfer server;
The rendering server executes the rendering task and, once the task is complete, sends the result to the user client via the corresponding secondary transfer server and the main transfer server;
The management server automatically checks and repairs the rendering servers according to the dynamic operating data, and sends reminder messages to the management client;
The management client is used to view the reminder messages and to correct the exception information in the management server.
In the automatic monitoring and repair system provided by the present invention, the rendering task distribution log includes the ID of the user client the task came from, the client's registered account, the task number, the first distribution time, and the number of the matching secondary transfer server; the secondary rendering task distribution log includes the task number, the second distribution time, and the number of the matching rendering server.
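The two logs described above share the task number, so they can be joined to trace where a task went. The following is a minimal sketch of that idea, not the patent's implementation; the class names, field names, and the `trace_task` helper are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PrimaryDistributionLog:
    """One entry of the rendering task distribution log (main transfer server)."""
    source_client_id: str
    account: str
    task_number: str
    first_distribution_time: datetime
    secondary_server_id: str


@dataclass(frozen=True)
class SecondaryDistributionLog:
    """One entry of the secondary rendering task distribution log."""
    task_number: str
    second_distribution_time: datetime
    rendering_server_id: str


def trace_task(task_number, primary_logs, secondary_logs):
    """Return (secondary server, rendering server) pairs the task passed through.

    Hypothetical helper: it joins both logs on the shared task number.
    """
    secondaries = [p.secondary_server_id for p in primary_logs
                   if p.task_number == task_number]
    renderers = [s.rendering_server_id for s in secondary_logs
                 if s.task_number == task_number]
    return [(sec, ren) for sec in secondaries for ren in renderers]
```

Keeping the task number in both records is what makes the two-level dispatch auditable without either server having to know the other's log format.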
In the automatic monitoring and repair system provided by the present invention, the main transfer server comprises a receiving/return module, an identification module, a monitoring module, a processing module, a storage module, and a distribution module, wherein:
The receiving/return module receives the rendering task from the user client and forwards it to the identification module;
The identification module determines, according to preset rules, whether the rendering task belongs to a verified account; if not, it reports back to the user client; if so, it creates a task form, saves it to the storage module, and sends the rendering task to the processing module;
The processing module generates the rendering task distribution log according to the dynamic operating data of the rendering servers and saves it to the storage module;
The distribution module distributes the rendering task to the matching secondary transfer server according to the rendering task distribution log;
The storage module stores the task form and the rendering task distribution log.
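The module chain above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the text does not define how a secondary transfer server is "matched", so the sketch assumes it means the lowest mean CPU usage among its rendering servers. All function names, dictionary keys, and the task number format are hypothetical.

```python
import itertools
from datetime import datetime

_task_numbers = itertools.count(1)  # hypothetical task number generator


def identify(task, verified_accounts):
    """Identification module: return a task form, or None for an unverified account."""
    if task["account"] not in verified_accounts:
        return None
    return {
        "task_number": f"T{next(_task_numbers):06d}",
        "account": task["account"],
        "source_client_id": task["client_id"],
        "created_at": datetime.now(),
    }


def choose_secondary(cpu_by_secondary):
    """Processing module: pick the least busy secondary transfer server.

    Assumption: "matching" means lowest mean CPU usage; each list must be non-empty.
    """
    def mean_cpu(server_id):
        values = cpu_by_secondary[server_id]
        return sum(values) / len(values)
    return min(cpu_by_secondary, key=mean_cpu)


def handle_upload(task, verified_accounts, cpu_by_secondary, storage):
    """Run one upload through identification, processing, storage, and distribution."""
    form = identify(task, verified_accounts)
    if form is None:
        # Unverified account: report back to the client, store nothing.
        return {"status": "rejected", "task_number": None, "secondary_server_id": None}
    log_entry = {
        "source_client_id": form["source_client_id"],
        "account": form["account"],
        "task_number": form["task_number"],
        "first_distribution_time": datetime.now(),
        "secondary_server_id": choose_secondary(cpu_by_secondary),
    }
    storage["forms"].append(form)     # storage module keeps the task form
    storage["logs"].append(log_entry)  # and the rendering task distribution log
    return {"status": "distributed", "task_number": form["task_number"],
            "secondary_server_id": log_entry["secondary_server_id"]}
```

The distribution step itself (sending the task over the LAN) is omitted; the sketch stops at the point where the log entry names the chosen secondary transfer server.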
In the automatic monitoring and repair system provided by the present invention, the management server comprises a data acquisition module, a data storage module, and a trigger module, wherein:
The data acquisition module collects the dynamic operating data of the rendering servers, compiles it into an operation list stored in the data storage module, and sends exception information to the trigger module when an abnormal condition occurs;
The trigger module contains an exception data model library. The trigger module looks up the rendering server's exception information in the exception data model library, triggers the repair or feedback action corresponding to that exception information, records the operation in a rendering server log list, and saves it to the data storage module.
In the automatic monitoring and repair system provided by the present invention, if the exception information indicates that the rendering server software is abnormal and the task has stopped, the trigger module automatically finds another matching rendering server to continue the rendering, restarts the abnormal rendering server, and records the operation in the rendering server log list;
If the exception information indicates that the rendering server is offline and no task is rendering, the trigger module restarts the rendering server and records the operation in the rendering server log list; if the restart has no effect, it sends the reminder message to the management client;
If the exception information indicates that the rendering server's memory has overflowed and the task has stopped, the trigger module automatically finds another matching rendering server to continue the rendering, restarts the abnormal rendering server, records the operation in the rendering server log list, and sends the reminder message to the management client;
If the exception information indicates that the rendering server's network is interrupted and the server cannot be reached, the trigger module automatically sends the reminder message to the management client and records the operation in the rendering server log list;
If the exception information indicates that the rendering server frequently shows the same abnormal condition, the trigger module automatically sends the reminder message to the management client and records the operation in the rendering server log list.
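The five cases above amount to a lookup table from exception type to a list of actions. The sketch below shows one way such a table might be encoded; the enum names and the `plan_actions` function are assumptions made for illustration, and the actions are only planned here, not executed.

```python
from enum import Enum, auto


class Anomaly(Enum):
    SOFTWARE_FAULT = auto()    # software abnormal, task stopped
    OFFLINE = auto()           # server offline, no task rendering
    MEMORY_OVERFLOW = auto()   # memory overflow, task stopped
    NETWORK_DOWN = auto()      # network interrupted, cannot connect
    RECURRING = auto()         # same abnormal condition occurs frequently


class Action(Enum):
    REASSIGN_TASK = auto()     # find another matching rendering server
    RESTART = auto()
    LOG = auto()               # record in the rendering server log list
    NOTIFY_ADMIN = auto()      # send reminder message to the management client


# Hypothetical encoding of the exception data model library.
RULES = {
    Anomaly.SOFTWARE_FAULT: (Action.REASSIGN_TASK, Action.RESTART, Action.LOG),
    Anomaly.OFFLINE: (Action.RESTART, Action.LOG),
    Anomaly.MEMORY_OVERFLOW: (Action.REASSIGN_TASK, Action.RESTART,
                              Action.LOG, Action.NOTIFY_ADMIN),
    Anomaly.NETWORK_DOWN: (Action.NOTIFY_ADMIN, Action.LOG),
    Anomaly.RECURRING: (Action.NOTIFY_ADMIN, Action.LOG),
}


def plan_actions(anomaly, restart_succeeded=True):
    """Return the ordered actions for an anomaly.

    For an offline server the administrator is notified only if the restart failed.
    """
    actions = list(RULES[anomaly])
    if anomaly is Anomaly.OFFLINE and not restart_succeeded:
        actions.append(Action.NOTIFY_ADMIN)
    return actions
```

Because the rules live in a plain dictionary, adding or changing an entry does not require touching the lookup logic, which fits the idea of a model library that an administrator can maintain.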
Correspondingly, the present invention also provides an automatic monitoring and repair method for a cloud rendering system, comprising the following steps:
Step S1: A producer sets task parameters through a user client and uploads the required rendering task to a main transfer server;
Step S2: The main transfer server verifies the account login information of the uploaded rendering task, automatically generates a task number once verification passes, distributes the rendering task to a matching secondary transfer server, and generates a rendering task distribution log;
Step S3: The rendering servers send dynamic operating data to the corresponding secondary transfer server, and the secondary transfer server sends the dynamic operating data to a management server and the main transfer server;
Step S4: The secondary transfer server distributes the rendering task to a matching rendering server according to the dynamic operating data and generates a secondary rendering task distribution log;
Step S5: The rendering server executes the rendering task and, once the task is complete, sends the result to the user client via the corresponding secondary transfer server and the main transfer server;
Step S6: The management server automatically checks and repairs the rendering servers according to the dynamic operating data, and sends reminder messages to a management client;
Step S7: The management client is used to view the reminder messages and to correct the exception information in the management server.
In the automatic monitoring and repair method provided by the present invention, the rendering task distribution log includes the ID of the user client the task came from, the client's registered account, the task number, the first distribution time, and the number of the matching secondary transfer server; the secondary rendering task distribution log includes the task number, the second distribution time, and the number of the matching rendering server.
In the automatic monitoring and repair method provided by the present invention, step S2 comprises:
Step S21: The receiving/return module receives the rendering task from the user client and forwards it to the identification module;
Step S22: The identification module determines, according to preset rules, whether the rendering task belongs to a verified account; if not, it reports back to the user client; if so, it creates a task form, saves it to the storage module, and sends the rendering task to the processing module;
Step S23: The processing module generates the rendering task distribution log according to the dynamic operating data of the rendering servers and saves it to the storage module;
Step S24: The distribution module distributes the rendering task to the matching secondary transfer server according to the rendering task distribution log.
In the automatic monitoring and repair method provided by the present invention, step S6 comprises:
Step S61: The data acquisition module collects the dynamic operating data of the rendering servers, compiles it into an operation list stored in the data storage module, and sends exception information to the trigger module when an abnormal condition occurs;
Step S62: The trigger module looks up the rendering server's exception information in the exception data model library, triggers the repair or feedback action corresponding to that exception information, records the operation in the rendering server log list, and saves it to the data storage module.
In the automatic monitoring and repair method provided by the present invention, if the exception information indicates that the rendering server software is abnormal and the task has stopped, the trigger module automatically finds another matching rendering server to continue the rendering, restarts the abnormal rendering server, and records the operation in the rendering server log list;
If the exception information indicates that the rendering server is offline and no task is rendering, the trigger module restarts the rendering server and records the operation in the rendering server log list; if the restart has no effect, it sends the reminder message to the management client;
If the exception information indicates that the rendering server's memory has overflowed and the task has stopped, the trigger module automatically finds another matching rendering server to continue the rendering, restarts the abnormal rendering server, records the operation in the rendering server log list, and sends the reminder message to the management client;
If the exception information indicates that the rendering server's network is interrupted and the server cannot be reached, the trigger module automatically sends the reminder message to the management client and records the operation in the rendering server log list;
If the exception information indicates that the rendering server frequently shows the same abnormal condition, the trigger module automatically sends the reminder message to the management client and records the operation in the rendering server log list.
Implementing the embodiments of the present invention has the following beneficial effects. The automatic monitoring and repair system and method provided by the present invention build on the cloud rendering system to monitor rendering servers in real time, manage thousands of rendering servers efficiently and intelligently, analyze the monitoring data, and present it in a more intuitive form. This supports operations such as reallocating rendering servers, replacing abnormal servers, and adding local hard disks to servers, so the render farm can be optimized more conveniently and in a more targeted way, improving rendering efficiency.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the automatic monitoring and repair system for a cloud rendering system provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main transfer server shown in Fig. 1;
Fig. 3 is a schematic diagram of the management server shown in Fig. 1;
Fig. 4 is a statistical chart of the number of memory overflows on the rendering servers under secondary transfer server A in August 2017;
Fig. 5 is a flow chart of the automatic monitoring and repair method for a cloud rendering system provided by an embodiment of the present invention;
Fig. 6 is a flow chart of step S2 shown in Fig. 5;
Fig. 7 is a flow chart of step S6 shown in Fig. 5.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a schematic diagram of the automatic monitoring and repair system for a cloud rendering system provided by an embodiment of the present invention. As shown in Fig. 1, the system comprises a user client 10, a management client 20, a main transfer server 30, a management server 40, a secondary transfer server 50, and a rendering server 60, wherein:
The user client is used by a producer to set task parameters and to upload the required rendering task to the main transfer server;
The main transfer server verifies the account login information of the uploaded rendering task, automatically generates a task number once verification passes, distributes the rendering task to a matching secondary transfer server, and generates a rendering task distribution log;
The secondary transfer server receives the dynamic operating data of its subordinate rendering servers, distributes the rendering task to a matching rendering server according to that data, and generates a secondary rendering task distribution log; it also sends the dynamic operating data to the management server and the main transfer server;
The rendering server executes the rendering task and, once the task is complete, sends the result to the user client via the corresponding secondary transfer server and the main transfer server;
The management server automatically checks and repairs the rendering servers according to the dynamic operating data, and sends reminder messages to the management client;
The management client is used to view the reminder messages and to correct the exception information in the management server.
In the present invention, the user client is installed on the producer's computer. After setting the task parameters in the client, the producer uploads the required rendering task. The main transfer server checks the account login information of the upload and automatically generates a task number once verification passes. Based on the rendering server load and the task distribution status monitored through the secondary transfer servers, it distributes the task to a matching secondary transfer server and generates a rendering task distribution log, which includes the ID of the user client the task came from, the client's registered account, the task number, the first distribution time, and the number of the matching secondary transfer server. The secondary transfer server receives the dynamic operating data of its subordinate rendering servers in real time, distributes the task to a matching rendering server according to the monitoring data, and generates a secondary rendering task distribution log, which includes the task number, the second distribution time, and the number of the matching rendering server. After rendering is complete, the result passes automatically through the secondary transfer server and the main transfer server and is downloaded automatically, following the configured naming rules, to the computer that submitted the task.
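The second-level dispatch described above, where a secondary transfer server picks rendering servers from real-time monitoring data, could look roughly like the following. The selection criteria are assumptions, since the text only says "matching": the sketch treats a server as eligible when it is online and has no running task, and prefers lower CPU usage, then lower memory usage. The dictionary keys are hypothetical.

```python
def pick_rendering_servers(status_by_server, needed):
    """Return up to `needed` server IDs that are online and idle, least loaded first.

    `status_by_server` maps a server ID to its latest monitoring sample.
    """
    candidates = [
        (status["cpu"], status["memory"], server_id)
        for server_id, status in status_by_server.items()
        if status["online"] and status["running_tasks"] == 0
    ]
    candidates.sort()  # lowest CPU first, then lowest memory, then ID
    return [server_id for _, _, server_id in candidates[:needed]]
```

If fewer eligible servers exist than requested, the function simply returns the ones it found, leaving it to the caller to decide whether to wait or to start with a smaller set.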
In the present invention, there may be multiple main transfer servers. Fig. 2 is a schematic diagram of the main transfer server. As shown in Fig. 2, the main transfer server comprises a receiving/return module 310, an identification module 320, a monitoring module 330, a processing module 340, a storage module 350, and a distribution module 360, wherein:
The receiving/return module receives the rendering task from the user client and forwards it to the identification module;
The identification module determines, according to preset rules, whether the rendering task belongs to a verified account; if not, it reports back to the user client; if so, it creates a task form, saves it to the storage module, and sends the rendering task to the processing module;
The processing module generates the rendering task distribution log according to the dynamic operating data of the rendering servers and saves it to the storage module;
The distribution module distributes the rendering task to the matching secondary transfer server according to the rendering task distribution log;
The storage module stores the task form and the rendering task distribution log.
In the present invention, the main transfer server can be connected to multiple producers' computers over a network, and is connected to the management server and all secondary transfer servers over a high-speed LAN. The receiving/return module provides a port for connecting to external computers; it receives the rendering task files uploaded from the producers' computers and forwards them to the identification module. The identification module determines, according to preset rules, whether an uploaded file belongs to a verified account, and if not, reports back to the source computer. If so, it creates a task form (time, source computer or source network IP, task priority, task size, task frame count, and similar information), saves it to the storage module, and passes the task on to the processing module. The processing module generates the task distribution log according to the dynamic data of each rendering server and saves it to the storage module. The distribution module distributes the task to the corresponding secondary transfer server according to the task distribution log. The secondary transfer server distributes the task according to its priority and the number of servers it requires. The storage module stores the task forms and distribution logs.
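As an illustration of distributing tasks by priority and required server count, the sketch below keeps task forms in a priority queue and hands out free rendering servers in priority order. It is an assumption-laden example rather than the patent's algorithm: a smaller priority value is taken to mean more urgent, ties are broken by arrival time, and dispatch stops at the first task that cannot get all the servers it needs. `TaskForm`, `DispatchQueue`, and their fields are hypothetical names.

```python
import heapq
import itertools
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TaskForm:
    received_at: datetime
    source: str            # source computer or source network IP
    priority: int          # assumption: smaller value = more urgent
    size_bytes: int
    frame_count: int
    servers_needed: int = 1


class DispatchQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker so forms are never compared

    def submit(self, form):
        entry = (form.priority, form.received_at, next(self._seq), form)
        heapq.heappush(self._heap, entry)

    def dispatch(self, free_servers):
        """Assign free servers to queued tasks in strict priority order."""
        free = list(free_servers)
        assignments = []
        # Stop as soon as the most urgent task cannot be fully served.
        while self._heap and self._heap[0][3].servers_needed <= len(free):
            form = heapq.heappop(self._heap)[3]
            assignments.append((form, free[:form.servers_needed]))
            free = free[form.servers_needed:]
        return assignments
```

Stopping at the first unsatisfiable task keeps a large urgent job from being starved by smaller, lower-priority ones; a different policy could skip ahead instead.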
Fig. 3 is a schematic diagram of the management server. As shown in Fig. 3, the management server comprises a data acquisition module 410, a data storage module 420, and a trigger module 430, wherein:
The data acquisition module collects the dynamic operating data of the rendering servers, compiles it into an operation list stored in the data storage module, and sends exception information to the trigger module when an abnormal condition occurs;
The trigger module contains an exception data model library. The trigger module looks up the rendering server's exception information in the exception data model library, triggers the repair or feedback action corresponding to that exception information, records the operation in a rendering server log list, and saves it to the data storage module.
To manage abnormal rendering servers efficiently, the management server must automatically check and repair the rendering servers and report to the administrator in time. The management server can be connected directly over a network to multiple administrators' computers, and is connected to the main transfer server and all secondary transfer servers over a high-speed LAN. The dynamic data from the monitoring application on each rendering server is sent to the secondary transfer server and the management server at the same time, which ensures that the management server and the main transfer server hold the same monitoring data. The data acquisition module collects the dynamic monitoring data of each rendering server, including the rendering server address, time, number of computing tasks, computing duration, CPU usage, memory usage, network state, and running state, compiles it into a list stored in the storage module, and sends exception information to the trigger module if an abnormal condition occurs. The trigger module contains an exception data model library, to which the administrator can add, modify, and delete entries. Following the model library's preset rules, the trigger module searches the library for the rendering server's exception information, triggers the corresponding repair or feedback action, records the operation in the rendering server log list, and saves it to the storage module. When new exception information appears, the exception data model library is updated promptly.
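The monitoring record and the hand-off to the trigger module might be modeled as follows. The fields mirror the list in the paragraph above; how a sample is classified as abnormal is not specified in the text, so the checks and the memory threshold in this sketch are purely hypothetical, as are the names `StatusSample`, `classify`, and `collect`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

MEMORY_LIMIT = 0.95  # hypothetical threshold for treating memory as overflowed


@dataclass
class StatusSample:
    server_address: str
    timestamp: datetime
    task_count: int
    compute_seconds: float
    cpu_usage: float      # fraction between 0.0 and 1.0
    memory_usage: float   # fraction between 0.0 and 1.0
    network_ok: bool
    running: bool


def classify(sample) -> Optional[str]:
    """Return an exception label for the sample, or None if it looks normal."""
    if not sample.network_ok:
        return "network_interrupted"
    if not sample.running:
        return "offline"
    if sample.memory_usage >= MEMORY_LIMIT:
        return "memory_overflow"
    return None


def collect(sample, operation_list, trigger):
    """Data acquisition: store every sample, forward exceptions to the trigger."""
    operation_list.append(sample)
    anomaly = classify(sample)
    if anomaly is not None:
        trigger(sample.server_address, anomaly)
    return anomaly
```

Passing the trigger in as a callable keeps the acquisition step independent of whatever repair or feedback action the model library selects.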
The exception information in the model library of the trigger module includes, but is not limited to, the following:
If the exception information is a rendering server software exception with the task stopped, the trigger module automatically detects another matched rendering server to continue rendering, restarts the abnormal rendering server, and records the operation in the rendering server log list;
If the exception information is that the rendering server is offline with no task being rendered, the trigger module restarts the rendering server and records the operation in the rendering server log list; if the restart fails, it sends the prompting message to the management client;
If the exception information is a rendering server memory overflow with the task stopped, the trigger module automatically detects another matched rendering server to continue rendering, restarts the abnormal rendering server, records the operation in the rendering server log list, and sends the prompting message to the management client;
If the exception information is a rendering server network interruption so that the server cannot be reached, the trigger module automatically sends the prompting message to the management client and records the operation in the rendering server log list;
If the exception information is that the same abnormal condition occurs frequently on a rendering server, the trigger module automatically sends the prompting message to the management client and records the operation in the rendering server log list.
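The rules listed above can be read as a lookup table from exception type to responses. The following is a minimal sketch under stated assumptions: the exception labels, action names, message strings and callback signatures are hypothetical, and reporting an unknown exception type to the administrator is an assumed fallback, not something the text prescribes.

```python
from typing import Callable, Dict, List

# Hypothetical labels; each entry lists the responses described in the text.
RULES: Dict[str, List[str]] = {
    "software_exception": ["reassign", "restart"],
    "offline": ["restart"],                       # notify only if the restart fails
    "memory_overflow": ["reassign", "restart", "notify"],
    "network_interrupted": ["notify"],
    "repeated_exception": ["notify"],
}


def handle_exception(
    kind: str,
    server: str,
    restart: Callable[[str], bool],    # returns True if the restart succeeded
    reassign: Callable[[str], None],   # moves the task to another matched server
    notify: Callable[[str], None],     # sends a prompting message
    log: List[str],                    # stands in for the rendering server log list
) -> None:
    # Assumption: an exception missing from the library is only reported, so
    # that the administrator can add a rule for it.
    for action in RULES.get(kind, ["notify"]):
        performed = [action]
        if action == "reassign":
            reassign(server)
        elif action == "restart":
            if not restart(server) and kind == "offline":
                notify(f"{server}: restart failed")
                performed.append("notify")
        else:
            notify(f"{server}: {kind}")
        # Every operation that was carried out is recorded in the log list.
        log.extend(f"{server}: {kind} -> {step}" for step in performed)
```

Keeping the rules in a dictionary mirrors the editable model library: adding or changing an entry changes the behavior without touching the dispatch code.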
In the present invention, the client has a mobile app or SMS prompting function. By checking the prompting messages, the administrator can add to or modify the abnormal data model library in the management server and so refine its content. The administrator can set parameters (for example, to check how often the rendering nodes under a given secondary transfer server went offline during a certain period), call up the abnormal data matching those parameters, and display it as an intuitive chart. Based on the displayed data, the administrator can reallocate rendering servers, replace abnormal servers, add local hard disks to servers and so on, optimizing the render farm more conveniently and in a more targeted way and improving rendering efficiency.
Fig. 4 shows a statistical chart of the number of memory overflows of the rendering servers under secondary transfer server A in August 2017. The administrator calls up the statistics in the log with the parameters "rendering servers belonging to secondary transfer server A", "August 2017" and "number of memory overflows", and obtains the chart shown in Fig. 4. Based on the data, the administrator can check the memory and task status of rendering server A003, and expand its memory or modify the task allocation information in time. The administrator can also use the displayed data to reallocate rendering servers, replace abnormal servers, add local hard disks to servers and so on, optimizing the render farm more conveniently and in a more targeted way and improving rendering efficiency.
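The query behind such a chart can be sketched as a filter followed by a count per rendering server. The record layout, the function name and the sample entries below are hypothetical and serve only to illustrate the idea; they are not the data behind Fig. 4.

```python
from collections import Counter
from datetime import date
from typing import Iterable, NamedTuple


class LogEntry(NamedTuple):
    day: date
    secondary: str   # secondary transfer server the rendering server belongs to
    server: str      # rendering server number, e.g. "A003"
    kind: str        # exception label, e.g. "memory_overflow"


def overflow_counts(log: Iterable[LogEntry], secondary: str, year: int, month: int) -> Counter:
    """Count memory overflows per rendering server for one secondary server and month."""
    return Counter(
        entry.server
        for entry in log
        if entry.secondary == secondary
        and entry.kind == "memory_overflow"
        and entry.day.year == year
        and entry.day.month == month
    )


# Made-up entries for illustration only.
SAMPLE_LOG = [
    LogEntry(date(2017, 8, 2), "A", "A003", "memory_overflow"),
    LogEntry(date(2017, 8, 9), "A", "A001", "memory_overflow"),
    LogEntry(date(2017, 8, 15), "A", "A003", "memory_overflow"),
    LogEntry(date(2017, 8, 20), "A", "A003", "offline"),          # other exception type
    LogEntry(date(2017, 7, 30), "A", "A003", "memory_overflow"),  # other month
    LogEntry(date(2017, 8, 5), "B", "B002", "memory_overflow"),   # other secondary server
]

counts = overflow_counts(SAMPLE_LOG, "A", 2017, 8)
```

With these sample entries, `counts.most_common(1)` singles out the server with the most overflows, which is the kind of server an administrator would inspect first.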
Fig. 5 shows a flowchart of the automatic monitoring and repair method for a cloud rendering system provided by one embodiment of the present invention. As shown in Fig. 5, the automatic monitoring and repair method for a cloud rendering system provided by the present invention comprises the following steps:
Step S1: The producer sets task parameters through the subscription client and uploads the required rendering task to the main transfer server;
Step S2: The main transfer server verifies the registered account information of the uploader of the rendering task, automatically generates a task number after the verification passes, distributes the rendering task to a matched secondary transfer server, and generates a rendering task distribution log;
Specifically, step S2 includes:
Step S21: The receiving/return module receives the rendering task from the subscription client and forwards the rendering task to the identification module;
Step S22: The identification module identifies, according to preset rules, whether the rendering task belongs to a verified account; if not, the result is fed back to the subscription client; if so, a task form is created and saved to the storage module, and the rendering task is sent to the processing module;
Step S23: The processing module generates the rendering task distribution log according to the operation state data of the rendering servers and saves it to the storage module;
Step S24: The distribution module distributes the rendering task to the matched secondary transfer server according to the rendering task distribution log.
Step S3: The rendering server sends operation state data to the corresponding secondary transfer server, and the secondary transfer server sends the operation state data to the management server and the main transfer server;
Step S4: The secondary transfer server distributes the rendering task to a matched rendering server for rendering according to the operation state data, and generates a secondary rendering task distribution log;
Step S5: The rendering server executes the rendering task and, after the rendering task is completed, sends the execution result to the subscription client via the corresponding secondary transfer server and the main transfer server;
Step S6: The management server automatically detects and repairs the rendering server according to the operation state data, and sends a prompting message to the management client;
Specifically, step S6 includes:
Step S61: The data acquisition module collects the operation state data of the rendering server, forms an operation list and stores it in the data storage module, and sends exception information to the trigger module when an abnormal condition occurs;
Step S62: The trigger module searches the abnormal data model library for the exception information of the rendering server, triggers the repair or feedback behavior corresponding to the exception information, records the operation in the rendering server log list, and saves it to the data storage module.
Step S7: The management client checks the prompting message and corrects the exception information in the management server.
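The two-level distribution in steps S2 and S4 can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the matching rules (first secondary transfer server with an online rendering server, then the online rendering server with the fewest tasks) are assumptions, since the text does not define "matched", and all names are hypothetical. The log fields follow those named in the claims for the two distribution logs, and a single timestamp is used for both distribution times for brevity.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterator, Optional, Set, Tuple


@dataclass
class ServerState:
    """Reduced operation state data of one rendering server."""
    online: bool
    task_count: int


def pick_rendering_server(states: Dict[str, ServerState]) -> Optional[str]:
    """S4, assumed rule: the online rendering server with the fewest tasks."""
    online = {name: state for name, state in states.items() if state.online}
    if not online:
        return None
    return min(online, key=lambda name: online[name].task_count)


def dispatch(
    client_id: str,
    account: str,
    verified: Set[str],
    farm: Dict[str, Dict[str, ServerState]],   # secondary number -> its rendering servers
    task_numbers: Iterator[int],
    now: datetime,
) -> Optional[Tuple[dict, dict]]:
    """Return the two distribution log entries, or None for an unverified account."""
    if account not in verified:
        return None                       # S22: rejected and fed back to the client
    task_number = next(task_numbers)      # S2: task number generated automatically
    for secondary, states in farm.items():
        server = pick_rendering_server(states)
        if server is not None:
            first = {
                "client_id": client_id,
                "account": account,
                "task_number": task_number,
                "first_distribution_time": now,
                "secondary_server": secondary,
            }
            second = {
                "task_number": task_number,
                "second_distribution_time": now,
                "rendering_server": server,
            }
            return first, second
    raise RuntimeError("no rendering server is currently available")
```

A call such as `dispatch("client-1", "alice", {"alice"}, farm, itertools.count(1), datetime.now())` would return both log entries sharing one task number, which is what lets the two logs be joined later when tracing a task.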
The automatic monitoring and repair system and method for a cloud rendering system provided by the present invention rely on the cloud rendering system to monitor rendering servers in real time and to manage thousands of rendering servers efficiently and intelligently. The monitoring data is analyzed and displayed in a more intuitive way, so that rendering servers can be reallocated, abnormal servers replaced, local hard disks added to servers and so on, optimizing the render farm more conveniently and in a more targeted way and improving rendering efficiency.
What is disclosed above is only a preferred embodiment of the present invention and certainly cannot be used to limit the scope of rights of the present invention. Those of ordinary skill in the art will understand that implementations of all or part of the flow of the above embodiment, and equivalent changes made in accordance with the claims of the present invention, still fall within the scope covered by the invention.
Claims (10)
1. An automatic monitoring and repair system for a cloud rendering system, characterized by comprising a subscription client, a management client, a main transfer server, a management server, a secondary transfer server and a rendering server, wherein:
The subscription client is used by the producer to set task parameters and to upload the required rendering task to the main transfer server;
The main transfer server is used to verify the registered account information of the uploader of the rendering task, to automatically generate a task number after the verification passes, to distribute the rendering task to a matched secondary transfer server, and to generate a rendering task distribution log;
The secondary transfer server is used to receive the operation state data of the rendering servers belonging to it, to distribute the rendering task to a matched rendering server for rendering according to the operation state data, and to generate a secondary rendering task distribution log; it is also used to send the operation state data to the management server and the main transfer server;
The rendering server is used to execute the rendering task and, after the rendering task is completed, to send the execution result to the subscription client via the corresponding secondary transfer server and the main transfer server;
The management server is used to automatically detect and repair the rendering server according to the operation state data, and to send a prompting message to the management client;
The management client is used to check the prompting message and to correct the exception information in the management server.
2. The automatic monitoring and repair system for a cloud rendering system according to claim 1, characterized in that the rendering task distribution log includes the ID of the subscription client from which the task originates, the registered account of the client, the task number, a first distribution time and the number of the matched secondary transfer server, and the secondary rendering task distribution log includes the task number, a second distribution time and the number of the matched rendering server.
3. The automatic monitoring and repair system for a cloud rendering system according to claim 1, characterized in that the main transfer server includes a receiving/return module, an identification module, a monitoring module, a processing module, a storage module and a distribution module, wherein:
The receiving/return module is used to receive the rendering task from the subscription client and to forward the rendering task to the identification module;
The identification module is used to identify, according to preset rules, whether the rendering task belongs to a verified account; if not, the result is fed back to the subscription client; if so, a task form is created and saved to the storage module, and the rendering task is sent to the processing module;
The processing module is used to generate the rendering task distribution log according to the operation state data of the rendering servers and to save it to the storage module;
The distribution module is used to distribute the rendering task to the matched secondary transfer server according to the rendering task distribution log;
The storage module is used to store the task form and the rendering task distribution log.
4. The automatic monitoring and repair system for a cloud rendering system according to claim 1, characterized in that the management server includes a data acquisition module, a data storage module and a trigger module, wherein:
The data acquisition module is used to collect the operation state data of the rendering server, to form an operation list and store it in the data storage module, and to send exception information to the trigger module when an abnormal condition occurs;
The trigger module includes an abnormal data model library; the trigger module searches the abnormal data model library for the exception information of the rendering server, triggers the repair or feedback behavior corresponding to the exception information, records the operation in the rendering server log list, and saves it to the data storage module.
5. The automatic monitoring and repair system for a cloud rendering system according to claim 4, characterized in that, if the exception information is a rendering server software exception with the task stopped, the trigger module automatically detects another matched rendering server to continue rendering, restarts the abnormal rendering server, and records the operation in the rendering server log list;
If the exception information is that the rendering server is offline with no task being rendered, the trigger module restarts the rendering server and records the operation in the rendering server log list; if the restart fails, it sends the prompting message to the management client;
If the exception information is a rendering server memory overflow with the task stopped, the trigger module automatically detects another matched rendering server to continue rendering, restarts the abnormal rendering server, records the operation in the rendering server log list, and sends the prompting message to the management client;
If the exception information is a rendering server network interruption so that the server cannot be reached, the trigger module automatically sends the prompting message to the management client and records the operation in the rendering server log list;
If the exception information is that the same abnormal condition occurs frequently on a rendering server, the trigger module automatically sends the prompting message to the management client and records the operation in the rendering server log list.
6. An automatic monitoring and repair method for a cloud rendering system, characterized by comprising the following steps:
Step S1: The producer sets task parameters through the subscription client and uploads the required rendering task to the main transfer server;
Step S2: The main transfer server verifies the registered account information of the uploader of the rendering task, automatically generates a task number after the verification passes, distributes the rendering task to a matched secondary transfer server, and generates a rendering task distribution log;
Step S3: The rendering server sends operation state data to the corresponding secondary transfer server, and the secondary transfer server sends the operation state data to the management server and the main transfer server;
Step S4: The secondary transfer server distributes the rendering task to a matched rendering server for rendering according to the operation state data, and generates a secondary rendering task distribution log;
Step S5: The rendering server executes the rendering task and, after the rendering task is completed, sends the execution result to the subscription client via the corresponding secondary transfer server and the main transfer server;
Step S6: The management server automatically detects and repairs the rendering server according to the operation state data, and sends a prompting message to the management client;
Step S7: The management client checks the prompting message and corrects the exception information in the management server.
7. The automatic monitoring and repair method for a cloud rendering system according to claim 6, characterized in that the rendering task distribution log includes the ID of the subscription client from which the task originates, the registered account of the client, the task number, a first distribution time and the number of the matched secondary transfer server, and the secondary rendering task distribution log includes the task number, a second distribution time and the number of the matched rendering server.
8. The automatic monitoring and repair method for a cloud rendering system according to claim 6, characterized in that step S2 includes:
Step S21: The receiving/return module receives the rendering task from the subscription client and forwards the rendering task to the identification module;
Step S22: The identification module identifies, according to preset rules, whether the rendering task belongs to a verified account; if not, the result is fed back to the subscription client; if so, a task form is created and saved to the storage module, and the rendering task is sent to the processing module;
Step S23: The processing module generates the rendering task distribution log according to the operation state data of the rendering servers and saves it to the storage module;
Step S24: The distribution module distributes the rendering task to the matched secondary transfer server according to the rendering task distribution log.
9. The automatic monitoring and repair method for a cloud rendering system according to claim 6, characterized in that step S6 includes:
Step S61: The data acquisition module collects the operation state data of the rendering server, forms an operation list and stores it in the data storage module, and sends exception information to the trigger module when an abnormal condition occurs;
Step S62: The trigger module searches the abnormal data model library for the exception information of the rendering server, triggers the repair or feedback behavior corresponding to the exception information, records the operation in the rendering server log list, and saves it to the data storage module.
10. The automatic monitoring and repair method for a cloud rendering system according to claim 9, characterized in that, if the exception information is a rendering server software exception with the task stopped, the trigger module automatically detects another matched rendering server to continue rendering, restarts the abnormal rendering server, and records the operation in the rendering server log list;
If the exception information is that the rendering server is offline with no task being rendered, the trigger module restarts the rendering server and records the operation in the rendering server log list; if the restart fails, it sends the prompting message to the management client;
If the exception information is a rendering server memory overflow with the task stopped, the trigger module automatically detects another matched rendering server to continue rendering, restarts the abnormal rendering server, records the operation in the rendering server log list, and sends the prompting message to the management client;
If the exception information is a rendering server network interruption so that the server cannot be reached, the trigger module automatically sends the prompting message to the management client and records the operation in the rendering server log list;
If the exception information is that the same abnormal condition occurs frequently on a rendering server, the trigger module automatically sends the prompting message to the management client and records the operation in the rendering server log list.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201711165385.7A CN107992392B (en) | 2017-11-21 | 2017-11-21 | Automatic monitoring and repairing system and method for cloud rendering system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN107992392A (en) | 2018-05-04 |
| CN107992392B (en) | 2021-03-23 |
Family
ID=62031870
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201711165385.7A (Active), published as CN107992392B (en) | 2017-11-21 | 2017-11-21 | Automatic monitoring and repairing system and method for cloud rendering system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN107992392B (en) |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111028124A (en) * | 2019-11-29 | 2020-04-17 | 安徽赛诚云渲网络科技有限公司 | Rendering system |
| CN111488542A (en) * | 2019-01-29 | 2020-08-04 | 上海哔哩哔哩科技有限公司 | Webpage output method, device, system and storage medium |
| CN111563027A (en) * | 2020-04-30 | 2020-08-21 | 北京视博云信息技术有限公司 | Application operation monitoring method, device and system |
| CN112118463A (en) * | 2019-06-21 | 2020-12-22 | 广州虎牙科技有限公司 | Information processing method, cloud platform and information processing system |
| CN114490097A (en) * | 2022-01-12 | 2022-05-13 | 北京易智时代数字科技有限公司 | Management system for rendering service and VR display system |
| WO2022222403A1 (en) * | 2021-04-21 | 2022-10-27 | 上海商汤科技开发有限公司 | Task distribution system, method, and apparatus, computer device, and storage medium |
| CN115865518A (en) * | 2023-01-30 | 2023-03-28 | 天云融创数据科技(北京)有限公司 | Cloud platform data processing method and system based on big data |
| CN116828215A (en) * | 2023-08-30 | 2023-09-29 | 湖南马栏山视频先进技术研究院有限公司 | Video rendering method and system for reducing local computing power load |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103268220A (en) * | 2012-02-24 | 2013-08-28 | 苏州蓝海彤翔系统科技有限公司 | Software architecture suitable for large-scale animation rendering service cloud platform |
| CN103442036A (en) * | 2013-08-09 | 2013-12-11 | 苏州蓝海彤翔系统科技有限公司 | System integrating design development, post production and data storage and based on cloud platform |
| CN103874989A (en) * | 2011-11-07 | 2014-06-18 | 史克威尔·艾尼克斯控股公司 | Rendering server, central server, encoding device, control method, encoding method, program, and recording medium |
| CN105071969A (en) * | 2015-08-19 | 2015-11-18 | 焦点科技股份有限公司 | JMX (Java Management Extensions)-based customization real-time monitoring and automatic exception handling system and method |
| CN105446810A (en) * | 2015-12-24 | 2016-03-30 | 赞奇科技发展有限公司 | Cost based multi-farm cloud rendering task distributing system and method |
| CN106127844A (en) * | 2016-06-22 | 2016-11-16 | 民政部零研究所 | Mobile phone users real-time, interactive access long-range 3D scene render exchange method |
| TWI579709B (en) * | 2015-11-05 | 2017-04-21 | Chunghwa Telecom Co Ltd | Instantly analyze the scene file and automatically fill the cloud of the cloud system and methods |
Non-Patent Citations (3)
| Title |
|---|
| 廖宏建等: "基于云计算的动漫渲染实验平台研究与实现", 《实验室研究与探索》 * |
| 董陆阳: "基于层次化调度策略的渲染作业管理系统的研究与实现", 《中国优秀硕士学位论文全文数据库(信息科技辑)》 * |
| 蔡靖: "动漫平台集群渲染系统的研究与实现", 《中国优秀硕士学位论文全文数据库(信息科技辑)》 * |
Cited By (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111488542A (en) * | 2019-01-29 | 2020-08-04 | 上海哔哩哔哩科技有限公司 | Webpage output method, device, system and storage medium |
| CN111488542B (en) * | 2019-01-29 | 2023-09-26 | 上海哔哩哔哩科技有限公司 | Webpage output method, device, system and storage medium |
| CN112118463A (en) * | 2019-06-21 | 2020-12-22 | 广州虎牙科技有限公司 | Information processing method, cloud platform and information processing system |
| CN111028124A (en) * | 2019-11-29 | 2020-04-17 | 安徽赛诚云渲网络科技有限公司 | Rendering system |
| CN111563027A (en) * | 2020-04-30 | 2020-08-21 | 北京视博云信息技术有限公司 | Application operation monitoring method, device and system |
| CN111563027B (en) * | 2020-04-30 | 2023-09-01 | 北京视博云信息技术有限公司 | Application operation monitoring method, device and system |
| WO2022222403A1 (en) * | 2021-04-21 | 2022-10-27 | 上海商汤科技开发有限公司 | Task distribution system, method, and apparatus, computer device, and storage medium |
| CN114490097A (en) * | 2022-01-12 | 2022-05-13 | 北京易智时代数字科技有限公司 | Management system for rendering service and VR display system |
| CN115865518A (en) * | 2023-01-30 | 2023-03-28 | 天云融创数据科技(北京)有限公司 | Cloud platform data processing method and system based on big data |
| CN115865518B (en) * | 2023-01-30 | 2023-05-16 | 天云融创数据科技(北京)有限公司 | Cloud platform data processing method and system based on big data |
| CN116828215A (en) * | 2023-08-30 | 2023-09-29 | 湖南马栏山视频先进技术研究院有限公司 | Video rendering method and system for reducing local computing power load |
| CN116828215B (en) * | 2023-08-30 | 2023-11-14 | 湖南马栏山视频先进技术研究院有限公司 | Video rendering method and system for reducing local computing power load |
Also Published As
| Publication number | Publication date |
|---|---|
| CN107992392B (en) | 2021-03-23 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |