
CN107992392A - Automatic monitoring and repair system and method for a cloud rendering system - Google Patents

Automatic monitoring and repair system and method for a cloud rendering system

Info

Publication number
CN107992392A
Authority
CN
China
Prior art keywords
rendering
server
task
module
management
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711165385.7A
Other languages
Chinese (zh)
Other versions
CN107992392B (en)
Inventor
都政
秦莉兰
井革新
陈远磊
陈聪梅
刘昭
靳绍巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NATIONAL SUPERCOMPUTING CENTER IN SHENZHEN (SHENZHEN CLOUD COMPUTING CENTER)
Original Assignee
NATIONAL SUPERCOMPUTING CENTER IN SHENZHEN (SHENZHEN CLOUD COMPUTING CENTER)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NATIONAL SUPERCOMPUTING CENTER IN SHENZHEN (SHENZHEN CLOUD COMPUTING CENTER)
Priority to CN201711165385.7A
Publication of CN107992392A
Application granted
Publication of CN107992392B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/508 Monitor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)
  • Information Transfer Between Computers (AREA)
  • Computer And Data Communications (AREA)

Abstract

The present invention provides an automatic monitoring and repair system for a cloud rendering system. A user client is used by a producer to set task parameters and upload the required rendering task to a main transfer server. The main transfer server verifies the account registration information of the uploaded rendering task and distributes the rendering task to a matched secondary transfer server. The secondary transfer server distributes the rendering task to a matched rendering server according to dynamic operating data, and also sends the dynamic operating data to a management server and the main transfer server. The rendering server executes the rendering task. The management server automatically detects and repairs the rendering servers according to the dynamic operating data. A management client is used to revise the anomaly information in the management server. With the present invention, each rendering server can be monitored automatically, so that administrators can manage the render farm servers in a more automated way, improving management efficiency and optimizing use of the render farm.

Description

Automatic monitoring and repair system and method for a cloud rendering system
Technical field
The present invention relates to the field of automatic monitoring and repair technology, and in particular to an automatic monitoring and repair system and method for a cloud rendering system.
Background
Computer animation is one of the fastest-developing technical fields in the world. Producing high-quality computer animation requires rendering the scene after animation modeling, motion design and similar work are complete. Achieving the best rendering result requires a large number of materials, which consumes substantial CPU resources; for example, rendering a single high-resolution frame can typically take tens of hours. For this reason, large-scale animation and visual-effects film production now generally uses a cloud rendering system (also called a cloud rendering platform or render farm).
To address the drawbacks of traditional rendering, a cloud rendering system is a state-of-the-art rendering solution based on cloud computing services. Through a cloud rendering system, a user can call up thousands of cloud servers within a few seconds to render in parallel. A rendering platform may consist of hundreds or thousands of rendering servers. For that many servers, existing management software can already allocate and optimize the resources across the whole network reasonably and manage the jobs submitted to the system, achieving cross-platform, multi-engine, multi-task rendering at scale. In terms of server maintenance, however, staff must still check each server's state manually at irregular intervals and maintain it by hand, or inspect the server only after a task has failed. Consequently, managing a cloud rendering system requires solving two major problems:
1. The ability to automatically monitor the dynamic operating data of the rendering servers, and to repair abnormal conditions or report them to the administrator in time;
2. The ability to analyze server operation by monitoring the state of each rendering server in real time, and to optimize the cloud rendering system in time (for example, deciding whether servers need to be replaced or local hard disks need to be added).
At present these two problems are handled as follows. First, the operation of the rendering servers is checked manually; more often, the affected server is located and repaired by hand only after a task has run abnormally. This incurs high labor costs and means that rendering servers cannot be repaired promptly. Second, server problems are resolved based on experience or on the number of anomalies a server has shown; without monitoring records as evidence, the cloud rendering system cannot be optimized in time.
Summary of the invention
To address the shortcomings of the existing handling mechanisms, the present invention provides an automatic monitoring and repair system and method for a cloud rendering system.
In one aspect, an embodiment of the present invention provides an automatic monitoring and repair system for a cloud rendering system, including a user client, a management client, a main transfer server, a management server, secondary transfer servers and rendering servers, wherein:
The user client is used by a producer to set task parameters and upload the required rendering task to the main transfer server;
The main transfer server is used to verify the account registration information of the uploaded rendering task, automatically generate a task number once verification passes, distribute the rendering task to the matched secondary transfer server, and generate a rendering task distribution log;
The secondary transfer server is used to receive the dynamic operating data of the rendering servers under it, distribute the rendering task to the matched rendering server for rendering according to the dynamic operating data, and generate a secondary rendering task distribution log; it is also used to send the dynamic operating data to the management server and the main transfer server;
The rendering server is used to execute the rendering task and, once the rendering task is complete, send the execution result to the user client via the corresponding secondary transfer server and the main transfer server;
The management server is used to automatically detect and repair the rendering servers according to the dynamic operating data, and to send a reminder message to the management client;
The management client is used to view the reminder message and to revise the anomaly information in the management server.
In the automatic monitoring and repair system for a cloud rendering system provided by the present invention, the rendering task distribution log includes the source user client ID, the client's registered account, the task number, the first distribution time, and the number of the matched secondary transfer server; the secondary rendering task distribution log includes the task number, the second distribution time, and the number of the matched rendering server.
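The two logs share the task number, so a task's route can in principle be reconstructed by joining them. The sketch below is only an illustration of that idea in Python; the class names, field names and the `trace_task` helper are assumptions made for the example and are not defined in the patent.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PrimaryDistributionLog:
    """Entry written by the main transfer server (field names are illustrative)."""
    source_client_id: str
    account: str
    task_number: str
    first_distribution_time: datetime
    secondary_server_id: str


@dataclass(frozen=True)
class SecondaryDistributionLog:
    """Entry written by a secondary transfer server (field names are illustrative)."""
    task_number: str
    second_distribution_time: datetime
    rendering_server_id: str


def trace_task(task_number, primary_logs, secondary_logs):
    """Join both logs on the task number to recover the routing path."""
    first = next(p for p in primary_logs if p.task_number == task_number)
    hops = [s.rendering_server_id for s in secondary_logs
            if s.task_number == task_number]
    return first.secondary_server_id, hops
```

Keeping the task number in both records is what allows a failed task to be traced back from a rendering server to the submitting client.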
In the automatic monitoring and repair system for a cloud rendering system provided by the present invention, the main transfer server includes a reception/return module, an identification module, a monitoring module, a processing module, a storage module and a distribution module, wherein:
The reception/return module is used to receive the rendering task from the user client and forward the rendering task to the identification module;
The identification module is used to identify, according to preset rules, whether the rendering task belongs to a verified account; if not, it reports back to the user client; if so, it creates a task form, saves it to the storage module, and sends the rendering task to the processing module;
The processing module is used to generate the rendering task distribution log according to the dynamic operating data of the rendering servers and save it to the storage module;
The distribution module is used to distribute the rendering task to the matched secondary transfer server according to the rendering task distribution log;
The storage module is used to store the task form and the rendering task distribution log.
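As a rough illustration of how the identification, processing, storage and distribution modules could fit together, the following Python sketch walks one upload through them. The patent does not specify the preset verification rule or the matching rule, so the account set, the task number format and the "least-loaded secondary transfer server" choice are all assumptions of this example.

```python
import itertools
from datetime import datetime

# Hypothetical set of verified accounts; the patent only says "preset rules".
VERIFIED_ACCOUNTS = {"studio_a", "studio_b"}
_task_counter = itertools.count(1)


def handle_upload(account, client_id, secondary_loads, storage):
    """Process one uploaded rendering task on the main transfer server.

    secondary_loads maps a secondary transfer server id to its aggregate
    load (0.0-1.0). Returns the chosen secondary transfer server id, or
    None when the account is not verified.
    """
    # Identification module: reject uploads from unverified accounts.
    if account not in VERIFIED_ACCOUNTS:
        return None

    # Task number format is an assumption made for this sketch.
    task_number = f"T{next(_task_counter):06d}"
    now = datetime.now()
    storage["task_forms"].append(
        {"task_number": task_number, "account": account,
         "client_id": client_id, "time": now})

    # Processing module: pick a target (assumed rule: lowest load) and log it.
    target = min(secondary_loads, key=secondary_loads.get)
    storage["distribution_logs"].append(
        {"source_client_id": client_id, "account": account,
         "task_number": task_number, "first_distribution_time": now,
         "secondary_server_id": target})

    # Distribution module: the caller forwards the task to `target`.
    return target
```

A real deployment would presumably replace the in-memory `storage` dictionary with the storage module's persistent store.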
In the automatic monitoring and repair system for a cloud rendering system provided by the present invention, the management server includes a data acquisition module, a data storage module and a trigger module, wherein:
The data acquisition module is used to collect the dynamic operating data of the rendering servers and compile it into an operating data list stored in the data storage module; it is also used to send anomaly information to the trigger module when an abnormal condition occurs;
The trigger module includes an abnormal data model library. The trigger module searches the abnormal data model library for the anomaly information of the rendering server, triggers the repair or feedback action corresponding to the anomaly information, records the operation in the rendering server log list, and saves it to the data storage module.
In the automatic monitoring and repair system for a cloud rendering system provided by the present invention, if the anomaly information is that the rendering server software is abnormal and the task has stopped, the trigger module automatically finds another matched rendering server to continue rendering, restarts the abnormal rendering server, and records the operation in the rendering server log list;
If the anomaly information is that the rendering server is offline and no task is rendering, the trigger module restarts the rendering server and records the operation in the rendering server log list; if the restart fails, it sends the reminder message to the management client;
If the anomaly information is that the rendering server memory has overflowed and the task has stopped, the trigger module automatically finds another matched rendering server to continue rendering, restarts the abnormal rendering server, records the operation in the rendering server log list, and sends the reminder message to the management client;
If the anomaly information is that the rendering server network is interrupted and the server cannot be reached, the trigger module automatically sends the reminder message to the management client and records the operation in the rendering server log list;
If the anomaly information is that the rendering server frequently shows the same abnormal condition, the trigger module automatically sends the reminder message to the management client and records the operation in the rendering server log list.
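The five rules above amount to a lookup from anomaly type to an ordered list of actions, which is one plausible reading of the "abnormal data model library". A minimal Python sketch follows; the enum members and the action labels (`reassign`, `restart`, `notify`, `log`) are hypothetical names chosen for this example, not identifiers from the patent.

```python
from enum import Enum, auto


class Anomaly(Enum):
    SOFTWARE_FAULT = auto()   # software abnormal, task stopped
    OFFLINE = auto()          # server offline, no task rendering
    MEMORY_OVERFLOW = auto()  # memory overflow, task stopped
    NETWORK_DOWN = auto()     # network interrupted, server unreachable
    RECURRING = auto()        # same abnormal condition occurs frequently


# Ordered actions per anomaly type, following the rules listed above.
RULES = {
    Anomaly.SOFTWARE_FAULT: ("reassign", "restart"),
    Anomaly.OFFLINE: ("restart",),
    Anomaly.MEMORY_OVERFLOW: ("reassign", "restart", "notify"),
    Anomaly.NETWORK_DOWN: ("notify",),
    Anomaly.RECURRING: ("notify",),
}


def plan_actions(anomaly, restart_succeeded=True):
    """Return the actions the trigger module would take, ending with logging."""
    actions = list(RULES[anomaly])
    # An offline server only raises a reminder when the restart did not help.
    if anomaly is Anomaly.OFFLINE and not restart_succeeded:
        actions.append("notify")
    actions.append("log")  # every case is recorded in the server log list
    return actions
```

Because the rules live in a plain dictionary, an administrator-facing tool could add or change entries without touching the dispatch logic, which matches the idea of an editable model library.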
Correspondingly, the present invention also provides an automatic monitoring and repair method for a cloud rendering system, comprising the following steps:
Step S1: a producer sets task parameters through the user client and uploads the required rendering task to the main transfer server;
Step S2: the main transfer server verifies the account registration information of the uploaded rendering task, automatically generates a task number once verification passes, distributes the rendering task to the matched secondary transfer server, and generates a rendering task distribution log;
Step S3: the rendering server sends its dynamic operating data to the corresponding secondary transfer server, and the secondary transfer server sends the dynamic operating data to the management server and the main transfer server;
Step S4: the secondary transfer server distributes the rendering task to the matched rendering server for rendering according to the dynamic operating data, and generates a secondary rendering task distribution log;
Step S5: the rendering server executes the rendering task and, once the rendering task is complete, sends the execution result to the user client via the corresponding secondary transfer server and the main transfer server;
Step S6: the management server automatically detects and repairs the rendering servers according to the dynamic operating data, and sends a reminder message to the management client;
Step S7: the management client views the reminder message and revises the anomaly information in the management server.
In the automatic monitoring and repair method for a cloud rendering system provided by the present invention, the rendering task distribution log includes the source user client ID, the client's registered account, the task number, the first distribution time, and the number of the matched secondary transfer server; the secondary rendering task distribution log includes the task number, the second distribution time, and the number of the matched rendering server.
In the automatic monitoring and repair method for a cloud rendering system provided by the present invention, step S2 includes:
Step S21: the reception/return module receives the rendering task from the user client and forwards the rendering task to the identification module;
Step S22: the identification module identifies, according to preset rules, whether the rendering task belongs to a verified account; if not, it reports back to the user client; if so, it creates a task form, saves it to the storage module, and sends the rendering task to the processing module;
Step S23: the processing module generates the rendering task distribution log according to the dynamic operating data of the rendering servers and saves it to the storage module;
Step S24: the distribution module distributes the rendering task to the matched secondary transfer server according to the rendering task distribution log.
In the automatic monitoring and repair method for a cloud rendering system provided by the present invention, step S6 includes:
Step S61: the data acquisition module collects the dynamic operating data of the rendering servers and compiles it into an operating data list stored in the data storage module, and sends anomaly information to the trigger module when an abnormal condition occurs;
Step S62: the trigger module searches the abnormal data model library for the anomaly information of the rendering server, triggers the repair or feedback action corresponding to the anomaly information, records the operation in the rendering server log list, and saves it to the data storage module.
In the automatic monitoring and repair method for a cloud rendering system provided by the present invention, if the anomaly information is that the rendering server software is abnormal and the task has stopped, the trigger module automatically finds another matched rendering server to continue rendering, restarts the abnormal rendering server, and records the operation in the rendering server log list;
If the anomaly information is that the rendering server is offline and no task is rendering, the trigger module restarts the rendering server and records the operation in the rendering server log list; if the restart fails, it sends the reminder message to the management client;
If the anomaly information is that the rendering server memory has overflowed and the task has stopped, the trigger module automatically finds another matched rendering server to continue rendering, restarts the abnormal rendering server, records the operation in the rendering server log list, and sends the reminder message to the management client;
If the anomaly information is that the rendering server network is interrupted and the server cannot be reached, the trigger module automatically sends the reminder message to the management client and records the operation in the rendering server log list;
If the anomaly information is that the rendering server frequently shows the same abnormal condition, the trigger module automatically sends the reminder message to the management client and records the operation in the rendering server log list.
Implementing the embodiments of the present invention has the following beneficial effects. The automatic monitoring and repair system and method for a cloud rendering system provided by the present invention rely on the cloud rendering system to monitor rendering servers in real time and to manage thousands of rendering servers efficiently and intelligently. The monitoring data is analyzed and displayed in a more intuitive way, so that rendering servers can be reallocated, abnormal servers replaced, and local hard disks added to servers, optimizing the render farm more conveniently and in a more targeted way and improving rendering efficiency.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the automatic monitoring and repair system for a cloud rendering system provided by one embodiment of the present invention;
Fig. 2 is a schematic diagram of the main transfer server shown in Fig. 1;
Fig. 3 is a schematic diagram of the management server shown in Fig. 1;
Fig. 4 is a statistical chart of the number of rendering server memory overflows under secondary transfer server A in August 2017;
Fig. 5 is a flow chart of the automatic monitoring and repair method for a cloud rendering system provided by one embodiment of the present invention;
Fig. 6 is a flow chart of step S2 shown in Fig. 5;
Fig. 7 is a flow chart of step S6 shown in Fig. 5.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a schematic diagram of the automatic monitoring and repair system for a cloud rendering system provided by one embodiment of the present invention. As shown in Fig. 1, the automatic monitoring and repair system for a cloud rendering system provided by the present invention includes a user client 10, a management client 20, a main transfer server 30, a management server 40, secondary transfer servers 50 and rendering servers 60, wherein:
The user client is used by a producer to set task parameters and upload the required rendering task to the main transfer server;
The main transfer server is used to verify the account registration information of the uploaded rendering task, automatically generate a task number once verification passes, distribute the rendering task to the matched secondary transfer server, and generate a rendering task distribution log;
The secondary transfer server is used to receive the dynamic operating data of the rendering servers under it, distribute the rendering task to the matched rendering server for rendering according to the dynamic operating data, and generate a secondary rendering task distribution log; it is also used to send the dynamic operating data to the management server and the main transfer server;
The rendering server is used to execute the rendering task and, once the rendering task is complete, send the execution result to the user client via the corresponding secondary transfer server and the main transfer server;
The management server is used to automatically detect and repair the rendering servers according to the dynamic operating data, and to send a reminder message to the management client;
The management client is used to view the reminder message and to revise the anomaly information in the management server.
In the present invention, the user client is installed on the producer's computer; after setting the task parameters through the client, the producer uploads the required rendering task. The main transfer server verifies the account registration information of the upload and, once verification passes, automatically generates a task number. Based on the rendering server load and task distribution dynamics monitored by the secondary transfer servers, it distributes the task to the matched secondary transfer server and generates a rendering task distribution log, which includes the source user client ID, the client's registered account, the task number, the first distribution time, and the number of the matched secondary transfer server. Each secondary transfer server receives the dynamic operating data of its rendering servers in real time, distributes the task to the matched rendering server for rendering according to the monitoring data, and generates a secondary rendering task distribution log, which includes the task number, the second distribution time, and the number of the matched rendering server. After rendering completes, the result passes automatically through the secondary transfer server and the main transfer server and is downloaded automatically to the computer that submitted the task, following the established naming rule.
The present invention may include multiple main transfer servers. Fig. 2 is a schematic diagram of the main transfer server. As shown in Fig. 2, the main transfer server includes a reception/return module 310, an identification module 320, a monitoring module 330, a processing module 340, a storage module 350 and a distribution module 360, wherein:
The reception/return module is used to receive the rendering task from the user client and forward the rendering task to the identification module;
The identification module is used to identify, according to preset rules, whether the rendering task belongs to a verified account; if not, it reports back to the user client; if so, it creates a task form, saves it to the storage module, and sends the rendering task to the processing module;
The processing module is used to generate the rendering task distribution log according to the dynamic operating data of the rendering servers and save it to the storage module;
The distribution module is used to distribute the rendering task to the matched secondary transfer server according to the rendering task distribution log;
The storage module is used to store the task form and the rendering task distribution log.
In the present invention, the main transfer server can be connected to multiple producers' computers over a network, and is connected to the management server and all secondary transfer servers over a high-speed LAN. The reception/return module provides a port for connecting to external computers; it receives the rendering task files uploaded from producers' computers and forwards them to the identification module. The identification module checks, according to preset rules, whether the uploaded file belongs to a verified account. If not, it reports back to the source computer; if so, it creates a task form (containing information such as the time, the source computer or source network IP, the task priority, the task size, and the number of frames in the task), saves it to the storage module, and passes the task on to the processing module. The processing module generates the task distribution log from the dynamic data of each rendering server and saves it to the storage module. The distribution module distributes the task to the corresponding secondary transfer server according to the task distribution log. The secondary transfer server then distributes the task according to the task priority and the number of servers required. The storage module stores the task forms and the distribution logs.
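The last step, in which the secondary transfer server distributes by task priority and required server count, could be sketched with a priority queue as below. This is only an assumed policy written in Python: the patent does not say how priorities are encoded or what happens when too few servers are free, so "lower number means higher priority" and "skip tasks that do not fit" are choices made for the example.

```python
import heapq


def distribute(tasks, idle_servers):
    """Assign idle rendering servers to tasks in priority order.

    tasks: iterable of (priority, task_number, servers_needed) tuples,
           where a lower priority number is served first (assumption).
    idle_servers: list of rendering server ids currently free.
    Returns a dict mapping task_number -> list of assigned server ids.
    """
    queue = list(tasks)
    heapq.heapify(queue)
    free = list(idle_servers)
    assignment = {}
    while queue and free:
        _priority, task_number, needed = heapq.heappop(queue)
        if needed > len(free):
            continue  # not enough free servers; this sketch simply skips it
        assignment[task_number] = free[:needed]
        free = free[needed:]
    return assignment
```

For example, with four idle servers and tasks needing one, two and three servers in descending priority, the first two tasks are placed and the third is left for a later round.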
Fig. 3 is a schematic diagram of the management server. As shown in Fig. 3, the management server includes a data acquisition module 410, a data storage module 420 and a trigger module 430, wherein:
The data acquisition module is used to collect the dynamic operating data of the rendering servers and compile it into an operating data list stored in the data storage module; it is also used to send anomaly information to the trigger module when an abnormal condition occurs;
The trigger module includes an abnormal data model library. The trigger module searches the abnormal data model library for the anomaly information of the rendering server, triggers the repair or feedback action corresponding to the anomaly information, records the operation in the rendering server log list, and saves it to the data storage module.
To manage abnormal rendering servers efficiently, the management server needs to detect and repair rendering servers automatically and report to the administrator in time. The management server can be connected directly to multiple administrators' computers over a network, and is connected to the main transfer server and all secondary transfer servers over a high-speed LAN. The dynamic data from the monitoring application on each rendering server is sent to the secondary transfer server and the management server at the same time, which ensures that the management server and the main transfer server hold identical monitoring data. The data acquisition module collects the dynamic monitoring data of each rendering server, including the rendering server address, the time, the number of computing tasks, the computing duration, the CPU usage, the memory usage, the network state and the running state, compiles it into a list stored in the storage module, and sends the anomaly information to the trigger module if an abnormal condition occurs. The trigger module includes an abnormal data model library, which the administrator can add to, modify, or delete from. Following the preset rules of the model library, the trigger module searches the library for the anomaly information of the rendering server, triggers the corresponding repair or feedback action, records the operation in the rendering server log list, and saves it to the storage module. If new anomaly information appears, the abnormal data model library is updated in time.
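To make the data acquisition step concrete, the Python sketch below models one monitoring sample with the fields listed above and classifies it. The patent does not state how an abnormal condition is recognized from the raw data, so the `Sample` class, the `detect` function, the 0.95 memory threshold and the returned labels are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    """One monitoring record from a rendering server (names are illustrative)."""
    server: str           # rendering server address
    timestamp: str
    task_count: int       # number of computing tasks
    duration_s: float     # computing duration in seconds
    cpu_usage: float      # fraction, 0.0-1.0
    memory_usage: float   # fraction, 0.0-1.0
    network_ok: bool      # network state
    running: bool         # running state


def detect(sample: Sample, memory_limit: float = 0.95) -> Optional[str]:
    """Return an anomaly label for the trigger module, or None if healthy.

    The threshold and the order of the checks are assumptions of this sketch.
    """
    if not sample.network_ok:
        return "network_interrupted"
    if not sample.running:
        return "offline"
    if sample.memory_usage >= memory_limit:
        return "memory_overflow"
    return None
```

Every sample would be appended to the stored list regardless of the result; only a non-`None` label would be forwarded to the trigger module.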
The exception information in the model library of the trigger module includes, but is not limited to, the following:
If the exception information is a rendering server software anomaly with the task stopped, the trigger module automatically finds another matching rendering server to continue rendering, restarts the abnormal rendering server, and records the operation in the rendering server log list;
If the exception information is that the rendering server is offline with no task rendering, the trigger module restarts the rendering server and records the operation in the rendering server log list; if the restart fails, it sends the prompting message to the management client;
If the exception information is a rendering server memory overflow with the task stopped, the trigger module automatically finds another matching rendering server to continue rendering, restarts the abnormal rendering server, records the operation in the rendering server log list, and sends the prompting message to the management client;
If the exception information is a rendering server network interruption so that the server cannot be reached, the trigger module automatically sends the prompting message to the management client and records the operation in the rendering server log list;
If the exception information is that the same abnormal condition occurs frequently on a rendering server, the trigger module automatically sends the prompting message to the management client and records the operation in the rendering server log list.
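The five cases above amount to a lookup table from exception type to a sequence of actions. A minimal Python sketch of such a table is given below. The action labels, the `MODEL_LIBRARY` name, and the `handle_exception` function are illustrative assumptions, not identifiers from the patent. The fallback of notifying the administrator for an exception that is not yet in the library is also an assumption; the description only says that the library is updated when new exception information appears.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative action labels (not taken from the patent).
REASSIGN = "reassign_task"
RESTART = "restart_server"
NOTIFY = "notify_admin"
NOTIFY_IF_RESTART_FAILS = "notify_if_restart_fails"

# Abnormal data model library: exception type -> ordered actions.
MODEL_LIBRARY: Dict[str, Tuple[str, ...]] = {
    "software_anomaly": (REASSIGN, RESTART),
    "offline": (RESTART, NOTIFY_IF_RESTART_FAILS),
    "memory_overflow": (REASSIGN, RESTART, NOTIFY),
    "network_interruption": (NOTIFY,),
    "repeated_anomaly": (NOTIFY,),
}


def handle_exception(server: str, exception: str, restart_ok: bool = True,
                     log: Optional[List[tuple]] = None) -> List[str]:
    """Look up the exception, decide which actions apply, and log them.

    `restart_ok` stands in for the outcome of a real restart attempt.
    Returns the list of actions that would be performed.
    """
    performed = []
    # Assumption: an exception missing from the library is reported to the administrator.
    for action in MODEL_LIBRARY.get(exception, (NOTIFY,)):
        if action == NOTIFY_IF_RESTART_FAILS:
            if not restart_ok:
                performed.append(NOTIFY)
        else:
            performed.append(action)
    if log is not None:
        # Every handled exception is recorded in the rendering server log list.
        log.append((server, exception, tuple(performed)))
    return performed
```

Keeping the rules in a plain dictionary reflects the statement that the administrator can add, modify, or delete entries in the model library without changing the trigger logic itself.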
In the present invention, the client provides a mobile phone app or SMS reminder function. By checking the prompting message, the administrator can add to or modify the abnormal data model library in the management server and thereby refine its content. The administrator can also set parameters (for example, to check which rendering nodes under a secondary transfer server went offline during a certain period), retrieve the abnormal data matching those parameters, and display it as an intuitive chart. Based on the displayed data, the administrator can reallocate rendering servers, replace abnormal servers, add local hard disks to servers, and so on, so that the render farm is optimized more conveniently and in a more targeted way, and rendering efficiency is improved.
Fig. 4 shows a statistical chart of the number of rendering server memory overflows under secondary transfer server A in August 2017. The administrator calls up the statistics in the log with the following parameters: rendering servers belonging to secondary transfer server A, August 2017, number of memory overflows. This yields the chart shown in Fig. 4. Based on the data, the administrator can check the memory and task status of rendering server A003, and expand its memory or modify its task allocation in time. The administrator can also use the displayed data to reallocate rendering servers, replace abnormal servers, add local hard disks to servers, and so on, so that the render farm is optimized more conveniently and in a more targeted way, and rendering efficiency is improved.
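A query of this kind can be sketched as a filter and count over the rendering server log list. The Python sketch below assumes that each log entry is a (server ID, date, exception type) tuple and that a server ID starts with the letter of its secondary transfer server (as "A003" suggests); both are assumptions, and the sample entries are invented for illustration and are not the data behind Fig. 4.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple

LogEntry = Tuple[str, date, str]  # (server id, day, exception type)


def count_exceptions(log: Iterable[LogEntry], server_prefix: str,
                     year: int, month: int, exception: str) -> Counter:
    """Count occurrences of one exception type per rendering server
    for the servers under one secondary transfer server in one month."""
    counts: Counter = Counter()
    for server_id, day, kind in log:
        if (server_id.startswith(server_prefix)
                and day.year == year and day.month == month
                and kind == exception):
            counts[server_id] += 1
    return counts


# Invented sample entries, for illustration only.
sample_log = [
    ("A003", date(2017, 8, 2), "memory_overflow"),
    ("A003", date(2017, 8, 9), "memory_overflow"),
    ("A001", date(2017, 8, 9), "memory_overflow"),
    ("B002", date(2017, 8, 3), "memory_overflow"),
]
per_server = count_exceptions(sample_log, "A", 2017, 8, "memory_overflow")
```

The resulting per-server counts are the values that a bar chart like the one described would plot, one bar per rendering server.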
Fig. 5 shows a flowchart of the automatic monitoring and repair method for a cloud rendering system provided by one embodiment of the present invention. As shown in Fig. 5, the automatic monitoring and repair method for a cloud rendering system provided by the present invention comprises the following steps:
Step S1: The producer sets task parameters through the subscription client and uploads the required rendering task to the main transfer server;
Step S2: The main transfer server verifies the registration information of the account that uploaded the rendering task, automatically generates a task number after verification succeeds, distributes the rendering task to a matching secondary transfer server, and generates a rendering task distribution log;
Specifically, step S2 includes:
Step S21: The reception/passback module receives the rendering task from the subscription client and forwards the rendering task to the identification module;
Step S22: The identification module identifies, according to preset rules, whether the rendering task belongs to a verified account. If not, it feeds this back to the subscription client; if so, it creates a task form and saves it to the memory module, and at the same time sends the rendering task to the processing module;
Step S23: The processing module generates the rendering task distribution log according to the operation state data of the rendering servers and saves it to the memory module;
Step S24: The distribution module distributes the rendering task to the matching secondary transfer server according to the rendering task distribution log.
Step S3: The rendering server sends operation state data to the corresponding secondary transfer server, and the secondary transfer server sends the operation state data to the management server and the main transfer server;
Step S4: The secondary transfer server distributes the rendering task to a matching rendering server for rendering according to the operation state data, and generates a secondary rendering task distribution log;
Step S5: The rendering server executes the rendering task and, after the rendering task is completed, sends the execution result to the subscription client via the corresponding secondary transfer server and the main transfer server;
Step S6: The management server automatically detects and repairs the rendering server according to the operation state data, and sends a prompting message to the management client;
Specifically, step S6 includes:
Step S61: The data acquisition module collects the operation state data of the rendering server, forms an operation list and saves it to the data memory module, and sends the exception information to the trigger module when an abnormal condition occurs;
Step S62: The trigger module looks up the exception information of the rendering server in the abnormal data model library, triggers the repair or feedback action corresponding to the exception information, records the operation in the rendering server log list, and saves it to the data memory module.
Step S7: The management client checks the prompting message and corrects the exception information in the management server.
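The two-level distribution in steps S2 and S4 can be sketched in Python as follows. The patent does not say how a "matching" server is chosen, so the least-loaded rule used here is purely an assumption, as are the function names, the `VERIFIED_ACCOUNTS` set, and the dictionary layout of the state data. The log fields follow the distribution logs described in the claims (task number, distribution time, matched server).

```python
import itertools
from datetime import datetime
from typing import Dict, List, Optional, Tuple

VERIFIED_ACCOUNTS = {"studio01"}  # hypothetical verified account
_task_numbers = itertools.count(1)


def distribute_from_main(client_id: str, account: str,
                         load_by_secondary: Dict[str, int],
                         log: List[dict]) -> Optional[Tuple[int, str]]:
    """Step S2: verify the account, number the task, pick a secondary
    transfer server, and write the rendering task distribution log."""
    if account not in VERIFIED_ACCOUNTS:
        return None  # would be fed back to the subscription client
    task_number = next(_task_numbers)
    # Assumed matching rule: the secondary transfer server with the lowest load.
    secondary = min(load_by_secondary, key=load_by_secondary.get)
    log.append({
        "client_id": client_id,
        "account": account,
        "task_number": task_number,
        "distribution_time": datetime.now(),
        "secondary_server": secondary,
    })
    return task_number, secondary


def distribute_from_secondary(task_number: int, states: Dict[str, dict],
                              log: List[dict]) -> Optional[str]:
    """Step S4: pick an online rendering server from the operation state
    data and write the secondary rendering task distribution log."""
    online = {addr: s for addr, s in states.items() if s["online"]}
    if not online:
        return None
    # Assumed matching rule: the online server with the fewest running tasks.
    server = min(online, key=lambda addr: online[addr]["task_count"])
    log.append({
        "task_number": task_number,
        "distribution_time": datetime.now(),
        "rendering_server": server,
    })
    return server
```

Because both logs carry the same task number, a task can be traced from the subscription client through the secondary transfer server to the rendering server that executed it.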
The automatic monitoring and repair system and method for a cloud rendering system provided by the present invention rely on the cloud rendering system to monitor rendering servers in real time, manage thousands of rendering servers efficiently and intelligently, analyze the monitoring data, and display the data in a more intuitive way. On this basis, rendering servers can be reallocated, abnormal servers replaced, local hard disks added to servers, and so on, so that the render farm is optimized more conveniently and in a more targeted way, and rendering efficiency is improved.
What is disclosed above is only a preferred embodiment of the present invention and certainly cannot be used to limit the scope of the rights of the present invention. Those of ordinary skill in the art will understand that implementations of all or part of the flow of the above embodiment, and equivalent variations made according to the claims of the present invention, still fall within the scope covered by the invention.

Claims (10)

1. An automatic monitoring and repair system for a cloud rendering system, characterized in that it comprises a subscription client, a management client, a main transfer server, a management server, a secondary transfer server, and a rendering server, wherein,
The subscription client is used by the producer to set task parameters and to upload the required rendering task to the main transfer server;
The main transfer server is used to verify the registration information of the account that uploaded the rendering task, automatically generate a task number after verification succeeds, distribute the rendering task to a matching secondary transfer server, and generate a rendering task distribution log;
The secondary transfer server is used to receive the operation state data of the rendering servers belonging to it, distribute the rendering task to a matching rendering server for rendering according to the operation state data, and generate a secondary rendering task distribution log; it is also used to send the operation state data to the management server and the main transfer server;
The rendering server is used to execute the rendering task and, after the rendering task is completed, send the execution result to the subscription client via the corresponding secondary transfer server and the main transfer server;
The management server is used to automatically detect and repair the rendering server according to the operation state data, and to send a prompting message to the management client;
The management client is used to correct the exception information in the management server by checking the prompting message.
2. The automatic monitoring and repair system for a cloud rendering system according to claim 1, characterized in that the rendering task distribution log includes the ID of the subscription client from which the task originates, the client registration account, the task number, the first distribution time, and the number of the matching secondary transfer server, and the secondary rendering task distribution log includes the task number, the second distribution time, and the number of the matching rendering server.
3. The automatic monitoring and repair system for a cloud rendering system according to claim 1, characterized in that the main transfer server includes a reception/passback module, an identification module, a monitoring module, a processing module, a memory module, and a distribution module, wherein,
The reception/passback module is used to receive the rendering task from the subscription client and forward the rendering task to the identification module;
The identification module is used to identify, according to preset rules, whether the rendering task belongs to a verified account; if not, it feeds this back to the subscription client; if so, it creates a task form and saves it to the memory module, and at the same time sends the rendering task to the processing module;
The processing module is used to generate the rendering task distribution log according to the operation state data of the rendering server and save it to the memory module;
The distribution module is used to distribute the rendering task to the matching secondary transfer server according to the rendering task distribution log;
The memory module is used to store the task form and the rendering task distribution log.
4. The automatic monitoring and repair system for a cloud rendering system according to claim 1, characterized in that the management server includes a data acquisition module, a data memory module, and a trigger module, wherein,
The data acquisition module is used to collect the operation state data of the rendering server, form an operation list and save it to the data memory module, and, when an abnormal condition occurs, send the exception information to the trigger module;
The trigger module includes an abnormal data model library; the trigger module looks up the exception information of the rendering server in the abnormal data model library, triggers the repair or feedback action corresponding to the exception information, records the operation in the rendering server log list, and saves it to the data memory module.
5. The automatic monitoring and repair system for a cloud rendering system according to claim 4, characterized in that, if the exception information is a rendering server software anomaly with the task stopped, the trigger module automatically finds another matching rendering server to continue rendering, restarts the abnormal rendering server, and records the operation in the rendering server log list;
If the exception information is that the rendering server is offline with no task rendering, the trigger module restarts the rendering server and records the operation in the rendering server log list; if the restart fails, it sends the prompting message to the management client;
If the exception information is a rendering server memory overflow with the task stopped, the trigger module automatically finds another matching rendering server to continue rendering, restarts the abnormal rendering server, records the operation in the rendering server log list, and sends the prompting message to the management client;
If the exception information is a rendering server network interruption so that the server cannot be reached, the trigger module automatically sends the prompting message to the management client and records the operation in the rendering server log list;
If the exception information is that the same abnormal condition occurs frequently on a rendering server, the trigger module automatically sends the prompting message to the management client and records the operation in the rendering server log list.
6. An automatic monitoring and repair method for a cloud rendering system, characterized in that it comprises the following steps:
Step S1: The producer sets task parameters through the subscription client and uploads the required rendering task to the main transfer server;
Step S2: The main transfer server verifies the registration information of the account that uploaded the rendering task, automatically generates a task number after verification succeeds, distributes the rendering task to a matching secondary transfer server, and generates a rendering task distribution log;
Step S3: The rendering server sends operation state data to the corresponding secondary transfer server, and the secondary transfer server sends the operation state data to the management server and the main transfer server;
Step S4: The secondary transfer server distributes the rendering task to a matching rendering server for rendering according to the operation state data, and generates a secondary rendering task distribution log;
Step S5: The rendering server executes the rendering task and, after the rendering task is completed, sends the execution result to the subscription client via the corresponding secondary transfer server and the main transfer server;
Step S6: The management server automatically detects and repairs the rendering server according to the operation state data, and sends a prompting message to the management client;
Step S7: The management client checks the prompting message and corrects the exception information in the management server.
7. The automatic monitoring and repair method for a cloud rendering system according to claim 6, characterized in that the rendering task distribution log includes the ID of the subscription client from which the task originates, the client registration account, the task number, the first distribution time, and the number of the matching secondary transfer server, and the secondary rendering task distribution log includes the task number, the second distribution time, and the number of the matching rendering server.
8. The automatic monitoring and repair method for a cloud rendering system according to claim 6, characterized in that step S2 includes:
Step S21: The reception/passback module receives the rendering task from the subscription client and forwards the rendering task to the identification module;
Step S22: The identification module identifies, according to preset rules, whether the rendering task belongs to a verified account; if not, it feeds this back to the subscription client; if so, it creates a task form and saves it to the memory module, and at the same time sends the rendering task to the processing module;
Step S23: The processing module generates the rendering task distribution log according to the operation state data of the rendering server and saves it to the memory module;
Step S24: The distribution module distributes the rendering task to the matching secondary transfer server according to the rendering task distribution log.
9. The automatic monitoring and repair method for a cloud rendering system according to claim 6, characterized in that step S6 includes:
Step S61: The data acquisition module collects the operation state data of the rendering server, forms an operation list and saves it to the data memory module, and sends the exception information to the trigger module when an abnormal condition occurs;
Step S62: The trigger module looks up the exception information of the rendering server in the abnormal data model library, triggers the repair or feedback action corresponding to the exception information, records the operation in the rendering server log list, and saves it to the data memory module.
10. The automatic monitoring and repair method for a cloud rendering system according to claim 9, characterized in that, if the exception information is a rendering server software anomaly with the task stopped, the trigger module automatically finds another matching rendering server to continue rendering, restarts the abnormal rendering server, and records the operation in the rendering server log list;
If the exception information is that the rendering server is offline with no task rendering, the trigger module restarts the rendering server and records the operation in the rendering server log list; if the restart fails, it sends the prompting message to the management client;
If the exception information is a rendering server memory overflow with the task stopped, the trigger module automatically finds another matching rendering server to continue rendering, restarts the abnormal rendering server, records the operation in the rendering server log list, and sends the prompting message to the management client;
If the exception information is a rendering server network interruption so that the server cannot be reached, the trigger module automatically sends the prompting message to the management client and records the operation in the rendering server log list;
If the exception information is that the same abnormal condition occurs frequently on a rendering server, the trigger module automatically sends the prompting message to the management client and records the operation in the rendering server log list.
CN201711165385.7A 2017-11-21 2017-11-21 Automatic monitoring and repairing system and method for cloud rendering system Active CN107992392B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711165385.7A CN107992392B (en) 2017-11-21 2017-11-21 Automatic monitoring and repairing system and method for cloud rendering system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711165385.7A CN107992392B (en) 2017-11-21 2017-11-21 Automatic monitoring and repairing system and method for cloud rendering system

Publications (2)

Publication Number Publication Date
CN107992392A true CN107992392A (en) 2018-05-04
CN107992392B CN107992392B (en) 2021-03-23

Family

ID=62031870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711165385.7A Active CN107992392B (en) 2017-11-21 2017-11-21 Automatic monitoring and repairing system and method for cloud rendering system

Country Status (1)

Country Link
CN (1) CN107992392B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111028124A (en) * 2019-11-29 2020-04-17 安徽赛诚云渲网络科技有限公司 Rendering system
CN111488542A (en) * 2019-01-29 2020-08-04 上海哔哩哔哩科技有限公司 Webpage output method, device, system and storage medium
CN111563027A (en) * 2020-04-30 2020-08-21 北京视博云信息技术有限公司 Application operation monitoring method, device and system
CN112118463A (en) * 2019-06-21 2020-12-22 广州虎牙科技有限公司 Information processing method, cloud platform and information processing system
CN114490097A (en) * 2022-01-12 2022-05-13 北京易智时代数字科技有限公司 Management system for rendering service and VR display system
WO2022222403A1 (en) * 2021-04-21 2022-10-27 上海商汤科技开发有限公司 Task distribution system, method, and apparatus, computer device, and storage medium
CN115865518A (en) * 2023-01-30 2023-03-28 天云融创数据科技(北京)有限公司 Cloud platform data processing method and system based on big data
CN116828215A (en) * 2023-08-30 2023-09-29 湖南马栏山视频先进技术研究院有限公司 Video rendering method and system for reducing local computing power load

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268220A (en) * 2012-02-24 2013-08-28 苏州蓝海彤翔系统科技有限公司 Software architecture suitable for large-scale animation rendering service cloud platform
CN103442036A (en) * 2013-08-09 2013-12-11 苏州蓝海彤翔系统科技有限公司 System integrating design development, post production and data storage and based on cloud platform
CN103874989A (en) * 2011-11-07 2014-06-18 史克威尔·艾尼克斯控股公司 Rendering server, central server, encoding device, control method, encoding method, program, and recording medium
CN105071969A (en) * 2015-08-19 2015-11-18 焦点科技股份有限公司 JMX (Java Management Extensions)-based customization real-time monitoring and automatic exception handling system and method
CN105446810A (en) * 2015-12-24 2016-03-30 赞奇科技发展有限公司 Cost based multi-farm cloud rendering task distributing system and method
CN106127844A (en) * 2016-06-22 2016-11-16 民政部零研究所 Mobile phone users real-time, interactive access long-range 3D scene render exchange method
TWI579709B (en) * 2015-11-05 2017-04-21 Chunghwa Telecom Co Ltd Instantly analyze the scene file and automatically fill the cloud of the cloud system and methods

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103874989A (en) * 2011-11-07 2014-06-18 史克威尔·艾尼克斯控股公司 Rendering server, central server, encoding device, control method, encoding method, program, and recording medium
CN103268220A (en) * 2012-02-24 2013-08-28 苏州蓝海彤翔系统科技有限公司 Software architecture suitable for large-scale animation rendering service cloud platform
CN103442036A (en) * 2013-08-09 2013-12-11 苏州蓝海彤翔系统科技有限公司 System integrating design development, post production and data storage and based on cloud platform
CN105071969A (en) * 2015-08-19 2015-11-18 焦点科技股份有限公司 JMX (Java Management Extensions)-based customization real-time monitoring and automatic exception handling system and method
TWI579709B (en) * 2015-11-05 2017-04-21 Chunghwa Telecom Co Ltd Instantly analyze the scene file and automatically fill the cloud of the cloud system and methods
CN105446810A (en) * 2015-12-24 2016-03-30 赞奇科技发展有限公司 Cost based multi-farm cloud rendering task distributing system and method
CN106127844A (en) * 2016-06-22 2016-11-16 民政部零研究所 Mobile phone users real-time, interactive access long-range 3D scene render exchange method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
廖宏建等: "基于云计算的动漫渲染实验平台研究与实现", 《实验室研究与探索》 *
董陆阳: "基于层次化调度策略的渲染作业管理系统的研究与实现", 《中国优秀硕士学位论文全文数据库(信息科技辑)》 *
蔡靖: "动漫平台集群渲染系统的研究与实现", 《中国优秀硕士学位论文全文数据库(信息科技辑)》 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488542A (en) * 2019-01-29 2020-08-04 上海哔哩哔哩科技有限公司 Webpage output method, device, system and storage medium
CN111488542B (en) * 2019-01-29 2023-09-26 上海哔哩哔哩科技有限公司 Webpage output method, device, system and storage medium
CN112118463A (en) * 2019-06-21 2020-12-22 广州虎牙科技有限公司 Information processing method, cloud platform and information processing system
CN111028124A (en) * 2019-11-29 2020-04-17 安徽赛诚云渲网络科技有限公司 Rendering system
CN111563027A (en) * 2020-04-30 2020-08-21 北京视博云信息技术有限公司 Application operation monitoring method, device and system
CN111563027B (en) * 2020-04-30 2023-09-01 北京视博云信息技术有限公司 Application operation monitoring method, device and system
WO2022222403A1 (en) * 2021-04-21 2022-10-27 上海商汤科技开发有限公司 Task distribution system, method, and apparatus, computer device, and storage medium
CN114490097A (en) * 2022-01-12 2022-05-13 北京易智时代数字科技有限公司 Management system for rendering service and VR display system
CN115865518A (en) * 2023-01-30 2023-03-28 天云融创数据科技(北京)有限公司 Cloud platform data processing method and system based on big data
CN115865518B (en) * 2023-01-30 2023-05-16 天云融创数据科技(北京)有限公司 Cloud platform data processing method and system based on big data
CN116828215A (en) * 2023-08-30 2023-09-29 湖南马栏山视频先进技术研究院有限公司 Video rendering method and system for reducing local computing power load
CN116828215B (en) * 2023-08-30 2023-11-14 湖南马栏山视频先进技术研究院有限公司 Video rendering method and system for reducing local computing power load

Also Published As

Publication number Publication date
CN107992392B (en) 2021-03-23

Similar Documents

Publication Publication Date Title
CN107992392A (en) A kind of automatic monitoring repair system and method for cloud rendering system
CN106844198B (en) Distributed dispatching automation test platform and method
CN107291565B (en) Operation and maintenance visual automatic operation platform and implementation method
CN102571396B (en) Communication network system and routing inspection subsystem and routing inspection method of communication equipment
CN102508709B (en) Distributed-cache-based acquisition task scheduling method in purchase, supply and selling integrated electric energy acquiring and monitoring system
CN108322345A (en) A kind of dissemination method and server of fault restoration data packet
CN107508722B (en) Service monitoring method and device
US20220052923A1 (en) Data processing method and device, storage medium and electronic device
CN112600891A (en) Edge cloud cooperation system based on information physical fusion and working method
CN105872068A (en) Cloud platform and automatic operation check method based on same
CN108845798A (en) A kind of visualization big data task cradle and processing method
CN111158708A (en) Task arrangement engine system
US20070226231A1 (en) Systems and methods for managing business issues
CN105447681A (en) Physicochemical detection control and information management system
CN110968479B (en) Service level full-link monitoring method and server for application program
CN114996006A (en) Server arrangement configuration execution method, device, equipment and medium
CN101860564A (en) Protocol-based service composition system and method
CN110011827A (en) Towards doctor conjuncted multi-user's big data analysis service system and method
US12436812B1 (en) Systems and methods to facilitate adaptive resource capacity prediction and control using cloud infrastructures with a capacity prediction interface
US12277447B1 (en) Systems and methods to facilitate adaptive resource capacity prediction and control using cloud infrastructures
CN111324460A (en) Power monitoring control system and method based on cloud computing platform
CN109639490A (en) A kind of delay machine notification method and device
CN113824801A (en) A unified access management component system for intelligent fusion terminals
CN113312174A (en) Information query method and device, electronic equipment and container management system
CN112990744A (en) Automatic operation and maintenance method and device for massive million-level cloud equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant