CN106817408B - Distributed server cluster scheduling method and device - Google Patents
Distributed server cluster scheduling method and device
- Publication number: CN106817408B
- Application number: CN201611229087.5A
- Authority: CN (China)
- Prior art keywords: server, slave, client, servers, called
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
- H04L67/1044—Group management mechanisms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/0654—Management of faults, events, alarms or notifications using network fault recovery
- H04L41/0668—Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/10—Active monitoring, e.g. heartbeat, ping or trace-route
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/16—Threshold monitoring
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/60—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Cardiology (AREA)
- General Health & Medical Sciences (AREA)
- Hardware Redundancy (AREA)
- Multi Processors (AREA)
Abstract
The invention discloses a distributed server cluster scheduling method and device, and relates to the technical field of operation platform management. The method includes: a master server receives an execution message sent by a network server via a message queue server, where the execution message includes the pending tasks corresponding to each client server; the master server obtains the state of each slave server in a slave server cluster, where the slave server cluster includes at least two slave servers; the master server determines the slave servers to be called according to the number of client servers with pending tasks, the maximum threshold of client servers that a slave server can call, and the state of each slave server; and the master server distributes the pending tasks corresponding to each client server in the execution message to the slave servers to be called. By calling slave servers in the server cluster, embodiments of the invention quickly deliver each task to its corresponding client for execution, which maximizes client utilization and achieves high availability.
Description
Technical Field
The invention relates to the technical field of operation platform management, and in particular to a distributed server cluster scheduling method and device.
Background
Distributed database systems have become an important and rapidly developing field of information processing. They have the following advantages:
1. They solve the problem of organizations that are dispersed but whose data must be interconnected. In a banking system, for example, the head office and the branches are located in different cities or in different districts of a city. Each must process its own data and also exchange and process data with the others, which requires a distributed system.
2. If an organization needs to expand by adding new, relatively autonomous units, a distributed database system can be extended with minimal impact on the existing organization.
3. They meet the need for load balancing. Data is partitioned so as to maximize local application access, which minimizes interference between processors. The load is shared among the processors, which avoids critical bottlenecks.
4. When several database systems already exist in an organization and the need for global applications grows, these databases can be composed bottom-up into a distributed database system.
5. A distributed database system is no less likely to experience a failure than a centralized database system of the same scale, but because the impact of a failure is confined to local data applications, the reliability of the system as a whole is comparatively high.
With the rapid growth of enterprise business, the number of servers and the demand for server maintenance increase day by day. However, the prior art uses a single-node server as the task dispatch platform, which lacks a high-availability mechanism, executes tasks slowly, and cannot manage the computer cluster in a timely and efficient manner.
In summary, the prior art does not provide an effective and efficient method for dispatching tasks.
Summary of the Invention
The invention provides a distributed server cluster scheduling method and device to solve the problem that the prior art does not provide an effective and efficient method for dispatching tasks.
An embodiment of the invention provides a distributed server cluster scheduling method, the method including:
a master server receives an execution message sent by a network server via a message queue server, where the execution message includes the pending tasks corresponding to each client server;
the master server obtains the state of each slave server in a slave server cluster, where the slave server cluster includes at least two slave servers;
the master server determines the slave servers to be called according to the number of client servers with pending tasks, the maximum threshold of client servers that a slave server can call, and the state of each slave server;
the master server distributes the pending tasks corresponding to each client server in the execution message to the slave servers to be called.
In this embodiment, after receiving the pending tasks, the master server determines which slave servers to call based on the number of tasks, the maximum number of clients that each slave server can call, and the current running state of each slave server in the slave server cluster, and dispatches the tasks to those slave servers. By calling slave servers in the server cluster, tasks are quickly delivered to their corresponding clients for execution, which maximizes client utilization. Tasks can still be delivered to the clients when a slave server in the cluster fails, which achieves high availability.
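The selection step described above can be sketched in Python as follows. This is only an illustrative sketch: the `Slave` record, the `select_slaves` function, and the first-fit order in which slave servers are filled are assumptions made for the example, since the text does not prescribe a particular selection order.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    slave_id: str
    healthy: bool   # state as reported by the management server
    in_use: int     # client servers this slave is already calling


def select_slaves(n_clients, slaves, max_threshold):
    """Return {slave_id: number of client servers to assign}.

    Returns None when the healthy slaves do not have enough
    remaining capacity for all n_clients client servers.
    """
    plan = {}
    remaining = n_clients
    for slave in slaves:
        if remaining == 0:
            break
        if not slave.healthy:
            continue  # abnormal slaves are never called
        free = max_threshold - slave.in_use
        if free <= 0:
            continue  # slave already at its maximum threshold
        take = min(free, remaining)
        plan[slave.slave_id] = take
        remaining -= take
    return plan if remaining == 0 else None
```

A `None` result corresponds to the case in which the existing cluster cannot absorb the pending tasks and more slave servers would be needed.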
Further, the master server distributing the pending tasks corresponding to each client server in the execution message to the slave servers to be called includes:
the master server splits the pending tasks corresponding to each client server in the execution message into a task list for each slave server to be called, and sends each task list to its corresponding slave server.
After the master server distributes the pending tasks corresponding to each client server in the execution message to the slave servers to be called, the method further includes:
each slave server to be called calls the corresponding client servers according to the task list it received, receives the task processing results from those client servers, and sends the task processing results to the network server through the message queue server.
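The slave-side half of this flow can be illustrated with a short Python sketch. The function name `run_task_list`, the `invoke_client` callback, and the dictionary layout of a result are all hypothetical; an in-process `queue.Queue` stands in for the message queue server.

```python
from queue import Queue


def run_task_list(task_list, invoke_client, result_queue):
    """Slave-side handling of one task list.

    task_list:     iterable of (client_id, task) pairs from the master
    invoke_client: callable that runs a task on a client server and
                   returns its processing result (assumed interface)
    result_queue:  stand-in for the message queue server
    """
    for client_id, task in task_list:
        result = invoke_client(client_id, task)
        # Publish each result so the network server can consume it.
        result_queue.put({"client": client_id, "task": task, "result": result})


# Example usage with a fake client invocation.
results = Queue()
run_task_list(
    [("client1", "task1"), ("client2", "task2")],
    lambda client_id, task: f"{task} done on {client_id}",
    results,
)
```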
In this embodiment, the master server dispatches the tasks it obtained to the slave servers that call the clients corresponding to those tasks, so that each slave server can determine which clients to call to execute the tasks.
Further, the master server obtaining the state of each slave server in the slave server cluster includes:
a management server obtains the state of each slave server in the slave server cluster;
the management server synchronizes the state of each slave server to at least one standby management server;
the master server determines whether the running state of the management server is normal;
if the master server determines that the state of the management server is abnormal, it calls any one of the at least one standby management server to obtain the state of each slave server in the slave server cluster.
In this embodiment, the management server synchronizes the state of each slave server to the standby management servers, so that the state of each slave server can still be obtained promptly when the management server fails, which achieves high availability.
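A minimal Python sketch of this failover lookup is shown below. The `ManagementServer` class, its `alive` flag, and the `get_slave_states` helper are hypothetical names for illustration; how the master server actually checks the management server's health is not specified in the text.

```python
class ManagementServer:
    def __init__(self, name):
        self.name = name
        self.alive = True        # running state as seen by the master
        self.slave_states = {}   # slave_id -> "normal" / "abnormal"

    def sync_to(self, standbys):
        """Copy the current slave states to every standby server."""
        for standby in standbys:
            standby.slave_states = dict(self.slave_states)


def get_slave_states(primary, standbys):
    """Query the primary, falling back to any working standby."""
    if primary.alive:
        return primary.slave_states
    for standby in standbys:
        if standby.alive:
            return standby.slave_states
    raise RuntimeError("no management server available")
```

Because the standby only holds what was last synchronized, the states it returns may lag behind the primary by one synchronization interval.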
Further, the management server obtaining the state of each slave server in the slave server cluster includes:
for any slave server in the slave server cluster, if the management server cannot obtain the heartbeat detection information of that slave server within a set time, it determines that the state of that slave server is abnormal.
In this embodiment, each slave server stays connected to the management server through heartbeat detection messages. If the management server does not receive a heartbeat detection message from a slave server within the set time, that slave server is considered unusable, and its state information is no longer provided when the master server queries the states.
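The heartbeat rule above can be sketched as a small Python class. The class name `HeartbeatMonitor`, the `timeout_seconds` parameter, and the optional `now` argument (added so the logic can be exercised without waiting) are assumptions for the example.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat time of each registered slave server."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}  # slave_id -> timestamp of last heartbeat

    def record_heartbeat(self, slave_id, now=None):
        self.last_seen[slave_id] = time.monotonic() if now is None else now

    def states(self, now=None):
        """Return slave_id -> "normal" or "abnormal"."""
        now = time.monotonic() if now is None else now
        return {
            slave_id: "normal" if now - seen <= self.timeout else "abnormal"
            for slave_id, seen in self.last_seen.items()
        }
```

`time.monotonic()` is used rather than wall-clock time so that system clock adjustments cannot make a healthy slave server look stale.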
Further, the method also includes:
if the master server determines that the usage of every slave server in the slave server cluster equals the maximum threshold, a new slave server is added, and the new slave server is registered with the management server.
In this embodiment, when it is determined that the existing slave server cluster can no longer meet the task processing requirements, slave servers are added dynamically, which achieves scalability.
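As a rough Python illustration of this scale-out check, consider the sketch below. The function names, the use of a plain `set` as the management server's registry, and the `>=` comparison (the text says "equals") are assumptions for the example; how a new slave server is actually provisioned is outside its scope.

```python
def needs_new_slave(in_use_by_slave, max_threshold):
    """True when every slave server has reached the maximum threshold.

    Note: an empty cluster also returns True, because all() over an
    empty collection is True.
    """
    return all(used >= max_threshold for used in in_use_by_slave.values())


def scale_out(in_use_by_slave, max_threshold, registry, new_slave_id):
    """Add and register a new slave server if the cluster is saturated."""
    if not needs_new_slave(in_use_by_slave, max_threshold):
        return False
    in_use_by_slave[new_slave_id] = 0  # new slave starts with no clients
    registry.add(new_slave_id)         # register with the management server
    return True
```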
The invention further provides a distributed server cluster scheduling device, including:
a receiving unit, configured to receive an execution message sent by a network server via a message queue server, where the execution message includes the pending tasks corresponding to each client server;
an obtaining unit, configured to obtain the state of each slave server in a slave server cluster, where the slave server cluster includes at least two slave servers;
a determining unit, configured to determine the slave servers to be called according to the number of client servers with pending tasks, the maximum threshold of client servers that a slave server can call, and the state of each slave server;
a sending unit, configured to distribute the pending tasks corresponding to each client server in the execution message to the slave servers to be called.
In this embodiment, by calling slave servers in the server cluster, tasks are quickly delivered to their corresponding clients for execution, which maximizes client utilization. Tasks can still be delivered to the clients when a slave server in the cluster fails, which achieves high availability.
Further, the determining unit is specifically configured to:
split the pending tasks corresponding to each client server in the execution message into a task list for each slave server to be called, and send each task list to its corresponding slave server;
call the corresponding client servers according to the received task list, receive the task processing results from those client servers, and send the task processing results to the network server through the message queue server.
Further, the obtaining unit is specifically configured to:
obtain the state of each slave server in the slave server cluster;
synchronize the state of each slave server to at least one standby management server;
determine whether the running state of the management server is normal;
if the state of the management server is determined to be abnormal, call any one of the at least one standby management server to obtain the state of each slave server in the slave server cluster.
Further, the obtaining unit is specifically configured to:
for any slave server in the slave server cluster, if the heartbeat detection information of that slave server cannot be obtained within a set time, determine that the state of that slave server is abnormal.
Further, the device also includes:
a dynamic adding unit, configured to add a new slave server if the usage of every slave server in the slave server cluster is determined to equal the maximum threshold, the new slave server being registered with the management server.
Brief Description of the Drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a distributed server cluster scheduling system provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of a message queue in a message queue server provided by an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a distributed server cluster scheduling device provided by an embodiment of the invention;
FIG. 4 is a schematic flowchart of a distributed server cluster scheduling method provided by an embodiment of the invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the scope of protection of the invention.
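The invention provides a distributed server cluster scheduling system which, as shown in FIG. 1, includes:
a master server 101, a network server 102, a message queue server 103, slave servers 104, client servers 105, and a management server 106.
The network server 102 manages the system: it collects information and feeds it back to the upper layer, or obtains commands from the upper layer and executes them in the system. In this embodiment, the network server 102 can define tasks, compile them into a task list, and send the list for execution by the client servers 105, where each task in the task list corresponds to a client server 105.
For example, in this embodiment the task list defined by the network server 102 is as shown in Table 1: task 1 is dispatched to client 1, task 2 to client 2, and task 3 to client 3.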
Table 1: Task list

| Task | Client server |
|---|---|
| Task 1 | Client 1 |
| Task 2 | Client 2 |
| Task 3 | Client 3 |
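In this embodiment, the network server 102 packages the task list into an execution message and sends it to the message queue server 103. Optionally, as shown in FIG. 2, the message queue server 103 holds multiple messages ordered by arrival time. If the execution message is the third message in that order, the message queue server 103 processes it after it has processed the first two messages.
Optionally, in this embodiment the message queue server 103 subscribes to the messages of the network server 102; that is, when a new message is added at the network server 102, the message queue server 103 can obtain it.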
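The packaging of a task list into an execution message and its first-in, first-out handling can be sketched in Python as follows. The JSON layout, the `"type": "execute"` field, and the function names are hypothetical, and a `collections.deque` stands in for the message queue server; no particular message queue product is implied by the text.

```python
import json
from collections import deque


def publish_execution_message(queue, task_list):
    """Package a task list as an execution message and enqueue it."""
    message = json.dumps({"type": "execute", "tasks": task_list})
    queue.append(message)  # new messages go to the back of the queue


def consume_next(queue):
    """Take the oldest message from the queue and decode it."""
    return json.loads(queue.popleft())


# Example usage: the execution message is third in line.
mq = deque(['{"type": "other", "tasks": {}}', '{"type": "other", "tasks": {}}'])
publish_execution_message(mq, {"client1": "task1", "client2": "task2", "client3": "task3"})
```

With this ordering, the execution message is only returned by the third call to `consume_next`, matching the behavior described for FIG. 2.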
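After receiving the execution message sent by the message queue server 103, the master server 101 determines the task list contained in it. For example, if the master server 101 receives the task list in Table 1 above, it determines from that task list which slave servers 104 in the slave server cluster need to be called.
In this embodiment, the slave server cluster is a server cluster containing a large number of slave servers 104, and the slave servers 104 in the cluster can call client servers 105.
Optionally, in this embodiment there is a maximum on the number of client servers 105 that each slave server 104 in the cluster can call. This maximum can be set to m; that is, one slave server 104 can call at most m client servers 105.
In this embodiment, before calling the slave servers 104, the master server 101 must first determine the state of each slave server 104 in the slave server cluster. Optionally, the state of a slave server 104 is either abnormal or normal.
In this embodiment, the master server 101 can obtain the state of the slave servers 104 through the management server 106. When the slave server cluster is established, each slave server 104 in the cluster sends a connection request to the management server 106, and the management server 106 determines from the connection request that the slave server 104 has joined the cluster.
After a slave server 104 establishes a connection with the management server 106, heartbeat detection messages are used to determine whether the slave server 104 is abnormal. Optionally, if the management server 106 does not receive a heartbeat detection message from the slave server 104 within a set time, the state of that slave server 104 is considered abnormal.
Optionally, in this embodiment the management server 106 stores a state information table for all slave servers 104 registered with the management server 106, and the table is updated periodically. For example, as shown in Table 2, N slave servers 104 are registered with the management server 106; if the state information table has been updated once, it holds two states for each slave server.
Table 2: State information table in the management server
Optionally, in this embodiment the management server 106 also stores two further state information tables: one listing the slave servers 104 whose state is normal, as shown in Table 3, and one listing the slave servers 104 whose state is abnormal, as shown in Table 4.
Table 3: State information table in the management server
Table 4: State information table in the management server
Table 3 lists all slave servers 104 whose latest state is normal, for example slave server 1 and slave server 3. Optionally, what is stored is the identification information of each slave server 104, which can be its hardware address or any other identifier that uniquely identifies the slave server 104.
Table 4 lists all slave servers 104 whose latest state is abnormal, for example slave server 2 and slave server 4. Optionally, what is stored is the identification information of each slave server 104, which can be its hardware address or any other identifier that uniquely identifies the slave server 104.
Optionally, in this embodiment the management server 106 can also obtain the working state of each slave server 104 in the list of slave servers 104 whose state is normal. Here, the working state indicates whether a slave server 104 has already called client servers 105 and, if so, how many. As shown in Table 5, the management server 106 also stores, for each slave server 104 in the normal list, the number of client servers 105 it has called.
Table 5: Working state table of the slave servers stored in the management server
Based on the working state table in Table 5, the master server 101 can determine whether a slave server 104 can be called and how many more client servers 105 that slave server 104 can call. For example, in this embodiment the maximum number of client servers 105 that a slave server 104 can call is m. Slave server 4 in Table 5 has already called m client servers 105, so it cannot call any more. Slave server 1 has already called 2 client servers 105, so it can call m-2 more.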
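Optionally, in this embodiment the management server 106 replicates all of the slave server 104 information tables it has obtained, at the underlying data level, to at least one standby management server. When the master server 101 requests slave server 104 information from the management server 106, it first checks whether the management server 106 is abnormal. If the management server 106 is abnormal, the master server 101 calls any one of the standby management servers that is working normally to query the state of the slave servers 104.
The master server 101 determines which slave servers 104 in the slave server cluster to call according to the number of client servers 105 in the received task list that need to execute tasks, the current state of the slave servers 104, and the maximum threshold of client servers 105 that a slave server 104 can call.
In this embodiment, assume that the number of client servers 105 that need to execute tasks is n, and that the query shows 20 slave servers 104 in the slave server cluster, of which 4 have called 0 client servers, 10 have called m, and 6 have called 2. The master server 101 first determines whether n satisfies Formula 1:
n ≤ (m-2)×6 + 4m (Formula 1)
If Formula 1 is satisfied, for example n = (m-2)×6 + 4m, then all slave servers 104 in the cluster are called except the 10 that have already called m client servers 105. Each of the 4 slave servers 104 that have called 0 client servers 105 calls m client servers 105, and each of the 6 slave servers 104 that have called 2 client servers 105 calls m-2 client servers 105.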
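As a worked instance of Formula 1, the right-hand side is the combined remaining capacity C of the 10 slave servers that are not yet full. The value m = 5 below is an assumed number chosen only to make the arithmetic concrete; it does not appear in the original text.

```latex
\begin{aligned}
C &= 6(m-2) + 4m = 10m - 12 \\
m = 5:\quad C &= 6 \cdot 3 + 4 \cdot 5 = 18 + 20 = 38 \\
n &\le 38
\end{aligned}
```

Under that assumption, the cluster in this example could accept up to 38 client servers with pending tasks before additional slave servers would be needed.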
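The master server 101 distributes the n client servers 105 in the task list that need to execute tasks among the slave servers 104 selected above; that is, the n client servers 105 are randomly assigned to these 10 slave servers 104, whose combined remaining capacity is 10m-12. After receiving the client servers 105 it is to call, each slave server 104 calls them to execute the corresponding tasks in the task list.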
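The random assignment described above can be sketched in Python as follows. The `distribute` function, its argument layout, and the injectable `rng` parameter are assumptions for illustration; the text only states that the assignment is random and must respect each slave server's remaining capacity.

```python
import random


def distribute(client_tasks, capacity_by_slave, rng=random):
    """Randomly split client tasks into one task list per slave server.

    client_tasks:      {client_id: task}
    capacity_by_slave: {slave_id: number of client servers it can still call}
    Returns {slave_id: {client_id: task}}.
    """
    if len(client_tasks) > sum(capacity_by_slave.values()):
        raise ValueError("not enough remaining capacity")
    clients = list(client_tasks)
    rng.shuffle(clients)  # random order gives a random assignment
    pending = iter(clients)
    task_lists = {}
    for slave_id, free in capacity_by_slave.items():
        for _ in range(free):
            client = next(pending, None)
            if client is None:
                return task_lists  # every client has been assigned
            task_lists.setdefault(slave_id, {})[client] = client_tasks[client]
    return task_lists
```

Passing a seeded `random.Random` instance as `rng` makes the assignment reproducible, which can be convenient when testing.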
In this embodiment, the slave server 104 receives the task processing results from the client servers 105 it called after they have processed their tasks, and sends the results to the message queue server 103, which forwards them to the network server 102.
Optionally, in this embodiment, when Formula 1 is not satisfied, the number of slave servers 104 in the slave server cluster can be increased dynamically. Optionally, a new slave server 104 is called up from a database and registered with the management server 106.
Based on the same concept, an embodiment of the invention further provides a distributed server cluster scheduling device which, as shown in FIG. 3, includes:
a receiving unit 301, configured to receive an execution message sent by a network server via a message queue server, where the execution message includes the pending tasks corresponding to each client server;
an obtaining unit 302, configured to obtain the state of each slave server in a slave server cluster, where the slave server cluster includes at least two slave servers;
a determining unit 303, configured to determine the slave servers to be called according to the number of client servers with pending tasks, the maximum threshold of client servers that a slave server can call, and the state of each slave server;
a sending unit 304, configured to distribute the pending tasks corresponding to each client server in the execution message to the slave servers to be called.
Further, the determining unit 303 is specifically configured to:
split the pending tasks corresponding to each client server in the execution message into a task list for each slave server to be called, and send each task list to its corresponding slave server;
call the corresponding client servers according to the received task list, receive the task processing results from those client servers, and send the task processing results to the network server through the message queue server.
Further, the obtaining unit 302 is specifically configured to:
obtain the state of each slave server in the slave server cluster;
synchronize the state of each slave server to at least one standby management server;
determine whether the running state of the management server is normal;
if the state of the management server is determined to be abnormal, call any one of the at least one standby management server to obtain the state of each slave server in the slave server cluster.
Further, the obtaining unit 302 is specifically configured to:
for any slave server in the slave server cluster, if the heartbeat detection information of that slave server cannot be obtained within a set time, determine that the state of that slave server is abnormal.
Further, the device also includes:
a dynamic adding unit 305, configured to add a new slave server if the usage of every slave server in the slave server cluster is determined to equal the maximum threshold, the new slave server being registered with the management server.
Based on the same concept, the invention further provides a distributed server cluster scheduling method which, as shown in FIG. 4, includes:
Step 401: a master server receives an execution message sent by a network server via a message queue server, where the execution message includes the pending tasks corresponding to each client server;
Step 402: the master server obtains the state of each slave server in a slave server cluster, where the slave server cluster includes at least two slave servers;
Step 403: the master server determines the slave servers to be called according to the number of client servers with pending tasks, the maximum threshold of client servers that a slave server can call, and the state of each slave server;
Step 404: the master server distributes the pending tasks corresponding to each client server in the execution message to the slave servers to be called.
Further, the master server distributing the pending tasks corresponding to each client server in the execution message to the slave servers to be called includes:
the master server splits the pending tasks corresponding to each client server in the execution message into a task list for each slave server to be called, and sends each task list to its corresponding slave server.
After the master server distributes the pending tasks corresponding to each client server in the execution message to the slave servers to be called, the method further includes:
each slave server to be called calls the corresponding client servers according to the task list it received, receives the task processing results from those client servers, and sends the task processing results to the network server through the message queue server.
Further, the master server obtaining the state of each slave server in the slave server cluster includes:
a management server obtains the state of each slave server in the slave server cluster;
the management server synchronizes the state of each slave server to at least one standby management server;
the master server determines whether the running state of the management server is normal;
if the master server determines that the state of the management server is abnormal, it calls any one of the at least one standby management server to obtain the state of each slave server in the slave server cluster.
Further, the management server obtaining the state of each slave server in the slave server cluster includes:
for any slave server in the slave server cluster, if the management server cannot obtain the heartbeat detection information of that slave server within a set time, it determines that the state of that slave server is abnormal.
Further, the method also includes:
if the master server determines that the usage of every slave server in the slave server cluster equals the maximum threshold, a new slave server is added, and the new slave server is registered with the management server.
本发明是参照根据本发明实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block in the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general purpose computer, special purpose computer, embedded processor or other programmable data processing device to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing device produce Means for implementing the functions specified in a flow or flow of a flowchart and/or a block or blocks of a block diagram.
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture comprising instruction means, the instructions The apparatus implements the functions specified in the flow or flow of the flowcharts and/or the block or blocks of the block diagrams.
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions can also be loaded on a computer or other programmable data processing device to cause a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process such that The instructions provide steps for implementing the functions specified in the flow or blocks of the flowcharts and/or the block or blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201611229087.5A CN106817408B (en) | 2016-12-27 | 2016-12-27 | Distributed server cluster scheduling method and device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN106817408A CN106817408A (en) | 2017-06-09 |
| CN106817408B CN106817408B (en) | 2020-09-29 |
Family
ID=59110131
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201611229087.5A Active CN106817408B (en) | 2016-12-27 | 2016-12-27 | Distributed server cluster scheduling method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN106817408B (en) |
Families Citing this family (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107766136A (en) * | 2017-09-30 | 2018-03-06 | 南威软件股份有限公司 | A kind of method of task cluster management and running |
| CN109936593B (en) * | 2017-12-15 | 2022-03-01 | 网宿科技股份有限公司 | A method and system for message distribution |
| CN108282527B (en) * | 2018-01-22 | 2019-07-09 | 北京百度网讯科技有限公司 | Distributed system and method for generating service instances |
| CN108762917A (en) * | 2018-05-04 | 2018-11-06 | 平安科技(深圳)有限公司 | Access request processing method, device, system, computer equipment and storage medium |
| CN109032803B (en) | 2018-08-01 | 2021-02-12 | 创新先进技术有限公司 | Data processing method and device and client |
| CN110928673A (en) * | 2018-09-20 | 2020-03-27 | 北京国双科技有限公司 | Task allocation method and device |
| CN109660607B (en) * | 2018-12-05 | 2021-08-27 | 北京金山云网络技术有限公司 | Service request distribution method, service request receiving method, service request distribution device, service request receiving device and server cluster |
| CN109639506A (en) * | 2019-01-08 | 2019-04-16 | 北京文香信息技术有限公司 | A kind of master-slave control method, device, storage medium and server |
| CN111459903A (en) * | 2019-01-21 | 2020-07-28 | 顺丰科技有限公司 | Database management system and method |
| CN110262882A (en) * | 2019-06-17 | 2019-09-20 | 北京思特奇信息技术股份有限公司 | A kind of distributed communication command scheduling system and method |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102521044A (en) * | 2011-12-30 | 2012-06-27 | 北京拓明科技有限公司 | Distributed task scheduling method and system based on messaging middleware |
| CN102739775A (en) * | 2012-05-29 | 2012-10-17 | 宁波东冠科技有限公司 | Method for monitoring and managing Internet of Things data acquisition server cluster |
| CN103973725A (en) * | 2013-01-28 | 2014-08-06 | 阿里巴巴集团控股有限公司 | Distributed collaboration method and collaboration device |
| WO2016039963A3 (en) * | 2014-09-10 | 2016-09-01 | Ebay Inc. | Resource sharing between two resource allocation systems |
| CN105991737A (en) * | 2015-02-26 | 2016-10-05 | 阿里巴巴集团控股有限公司 | Distributed task scheduling method and system |
| CN106027634A (en) * | 2016-05-16 | 2016-10-12 | 白杨 | Baiyang message port switch service |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104184756A (en) * | 2013-05-21 | 2014-12-03 | 阿里巴巴集团控股有限公司 | Data synchronization method, device and system |
| CN103336815B (en) * | 2013-06-27 | 2016-12-28 | 北京京东尚科信息技术有限公司 | The system and method that the web advertisement pushes |
| CN105959390A (en) * | 2016-06-13 | 2016-09-21 | 乐视控股(北京)有限公司 | Unified management system and method of micro services |
Also Published As
| Publication number | Publication date |
|---|---|
| CN106817408A (en) | 2017-06-09 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN106817408B (en) | Distributed server cluster scheduling method and device | |
| CN111049705B (en) | Method and device for monitoring distributed storage system | |
| CN106789362B (en) | Equipment management method and network management system | |
| CN106919445B (en) | A method and apparatus for scheduling containers in parallel in a cluster | |
| US9911148B2 (en) | Querying for business service processing status information | |
| CN106909451A (en) | A kind of distributed task dispatching system and method | |
| US8930316B2 (en) | System and method for providing partition persistent state consistency in a distributed data grid | |
| WO2020181813A1 (en) | Task scheduling method based on data processing and related device | |
| US20130047165A1 (en) | Context-Aware Request Dispatching in Clustered Environments | |
| CN102255926B (en) | Method for allocating tasks in Map Reduce system, system and device | |
| CN107291546A (en) | A kind of resource regulating method and device | |
| CN114625533B (en) | Distributed task scheduling method, device, electronic device and storage medium | |
| US10505863B1 (en) | Multi-framework distributed computation | |
| CN111414241B (en) | Batch data processing method, device, system, computer equipment and computer readable storage medium | |
| CN109656690A (en) | Scheduling system, method and storage medium | |
| CN113032125A (en) | Job scheduling method, device, computer system and computer-readable storage medium | |
| WO2020147301A1 (en) | Method and apparatus for implementing multi-tenant service access, and electronic device | |
| CN109783151B (en) | Method and device for rule change | |
| WO2022105138A1 (en) | Decentralized task scheduling method, apparatus, device, and medium | |
| CN110046178A (en) | The method and apparatus of distributed data inquiry | |
| CN110489224B (en) | A method and device for task scheduling | |
| CN110659124A (en) | A message processing method and device | |
| CN106293933A (en) | A kind of cluster resource configuration supporting much data Computational frames and dispatching method | |
| CN109347982A (en) | A scheduling method and device for a data center | |
| CN113806177A (en) | Cluster monitoring method and device, electronic equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |