+

US20160099999A1 - Lightweight framework with dynamic self-organizing coordination capability for clustered applications - Google Patents

Lightweight framework with dynamic self-organizing coordination capability for clustered applications Download PDF

Info

Publication number
US20160099999A1
US20160099999A1 US14/712,040 US201514712040A US2016099999A1 US 20160099999 A1 US20160099999 A1 US 20160099999A1 US 201514712040 A US201514712040 A US 201514712040A US 2016099999 A1 US2016099999 A1 US 2016099999A1
Authority
US
United States
Prior art keywords
node
task
nodes
response
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/712,040
Inventor
Anna Joffe
Howard A. Kelsey
Viktor Levine
Michael P.W. Thornton
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Priority to US14/712,040 priority Critical patent/US20160099999A1/en
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION reassignment INTERNATIONAL BUSINESS MACHINES CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: JOFFE, ANNA, KELSEY, HOWARD A., LEVINE, VIKTOR, THORNTON, MICHAEL P.W.
Publication of US20160099999A1 publication Critical patent/US20160099999A1/en
Abandoned legal-status Critical Current

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/04Processing captured monitoring data, e.g. for logfile generation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/10Active monitoring, e.g. heartbeat, ping or trace-route
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • H04L67/1014Server selection for load balancing based on the content of a request
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • H04L67/1025Dynamic adaptation of the criteria on which the server selection is based
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/104Peer-to-peer [P2P] networks
    • H04L67/1061Peer-to-peer [P2P] networks using node-based peer discovery mechanisms

Definitions

  • the present invention relates generally to distributed computing systems and architectures, and more particularly to a lightweight framework with dynamic self-organizing coordination capacity for clustered applications.
  • the software Inherently in most applications, it is a need to support highly parallel processing of requests.
  • the software should be designed so that there is as little coupling between nodes as possible, thus preventing bottlenecks from being introduced.
  • these nodes normally will share some common application states, and therefore it is also likely that a subset of processing tasks, usually long-running ones, will need coordination among nodes.
  • the application uses a reference dataset that has to be updated periodically. In this scenario, some processing of data may need to happen, followed by a switchover to the new dataset. For such a long-running task, an effective way of minimizing data corruption is to perform it on a single node. The remaining nodes will receive notification when it is safe to switch over to the new version of the dataset.
  • the dedicated application node may have a common codebase with the other application nodes or it may not.
  • the dedicated application node is certainly configured differently from the other nodes, so it can perform the long-running tasks while the other nodes are running short user requests.
  • a flaw in this solution is evidently a single point of failure in that node. It is possible to strengthen the configuration by providing a backup dedicated node, but the solution comes up against the task of synchronizing these dedicated nodes.
  • Another solution is to allow one or more of the homogeneous nodes to self-elect to become a dedicated node. These nodes are identical to other nodes in all other respects.
  • the self-election mechanism is driven by some rules.
  • This solution is different from the previous one in that all nodes start from identical configuration.
  • the self-elected node becomes a coordinating node for the whole cluster. It may be backed up by other nodes to provide resilience.
  • This approach has the deficiency that self election is semi-permanent, so the node in this role is only capable of performing these special long running tasks. Sometimes several of these nodes can perform these tasks simultaneously. Some synchronization will still be required between application nodes in this setup.
  • This solution has a few flaws as follows.
  • the dedicated node is tied up with special tasks. Multiple nodes are required to provide a degree of resilience.
  • the dedicated nodes are not utilized for running ordinary short tasks; therefore, there is a high level of redundancy in the form of idle nodes which can
  • a method for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications receives an event and determines whether one of other nodes is processing a task triggered by the event. In response to determining that none of the other nodes is processing the task, the node processes the task as an active node. In response to determining that the one of the other nodes is processing the task, the node runs as one of one or more passive nodes, wherein the one or more passive nodes monitor processing of the task.
  • a computer program product for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications comprising a computer readable storage medium having program code embodied therewith.
  • the program code is executable for a node to: receive an event; determine whether one of other nodes is processing a task triggered by the event; process the task as an active node, in response to determining that none of the other nodes is processing the task; run as one of one or more passive nodes that monitor processing of the task, in response to determining that the one of the other nodes is processing the task.
  • a computer system for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications comprises one or more processors, one or more computer readable tangible storage devices, and program instructions stored on at least one of the one or more computer readable tangible storage devices for execution by at least one of the one or more processors.
  • the program instructions are executable for a node to receive an event.
  • the program instructions are executable for the node to determine whether one of other nodes is processing a task triggered by the event.
  • the program instructions are executable for the node to process the task as an active node, in response to determining that none of the other nodes is processing the task.
  • the program instructions are executable for the node to run as one of one or more passive nodes that monitor processing of the task, in response to determining that the one of the other nodes is processing the task.
  • FIG. 1 is a diagram showing components in a system of clustered applications, in accordance with one embodiment of the present invention.
  • FIG. 2 is a flowchart showing operational steps of a lightweight framework with dynamic self-organizing coordination capacity for clustered applications, in accordance with one embodiment of the present invention.
  • FIG. 3 is a diagram illustrating components of a node shown in FIG. 1 , in accordance with one embodiment of the present invention.
  • Embodiments of the present invention provide a lightweight framework with dynamic self-organizing coordination capacity for clustered applications.
  • the lightweight framework which can be utilized by any networked service application which is capable or should be capable of being deployed in a clustered mode, for example a cloud application.
  • the lightweight framework provides a means for managing tasks that require coordination between application nodes: a long-running task is being executed by a single node chosen dynamically using some rules. Remaining nodes will continue to serve short-lived tasks. There is internode communication established via the use of a publish-subscribe channel allowing the synchronization of these activities. Therefore, no coordinating node is necessary, and new nodes can be added to and removed from the cluster at will. Upon completion of a long-running task, the executing node becomes an ordinary node indistinguishable from the rest of the nodes. Several of these long-running tasks can be executed by different nodes simultaneously.
  • FIG. 1 is a diagram showing components in system 100 of clustered applications, in accordance with one embodiment of the present invention.
  • System 100 includes application server cluster 110 (e.g., an application cluster) comprising a plurality of application nodes 112 - 1 through 112 - n.
  • An application node is an instance of an application deployed in application server cluster 110 .
  • Application nodes 112 - 1 through 112 - n are running on modern hardware with inbuilt multithreading processors and on an operating system. Components of an application node will be discussed in later paragraphs with reference to FIG. 3 .
  • Application nodes 112 - 1 through 112 - n have common resources, for example, data repository, cache, enterprise service bus etc. Each node can be located on a dedicated physical machine, but that is not mandatory.
  • application nodes 112 - 1 through 112 - n receive user events that are often requests to perform atomic operations and received either from a user or from a data stream channel. Usually the user events require quick responses.
  • the user events trigger ordinary tasks which are main business tasks. The ordinary tasks are usually required to be completed in a short time, for example, tens of seconds at the most. For each user event received, a node starts an ordinary task. These are comparatively short processes and thus the user often awaits the outcome. Usually these ordinary tasks are part of a bigger business process.
  • the task is naturally executed by the node in isolation from other nodes, unless there is a recursion where the node is a client to another node. But, that only confirms the isolation principle.
  • application nodes 112 - 1 through 112 - n also receive housekeeping events.
  • the housekeeping events may come from an external source, for example, to trigger a reference data load.
  • the housekeeping events may also be internally generated, such as periodic purging.
  • the housekeeping events trigger housekeeping tasks.
  • the housekeeping tasks often long running complex tasks, which can take between several minutes and some hours to complete.
  • Each of application nodes 112 - 1 through 112 - n has a rule engine which decides whether an event is a user event or a housekeeping event.
  • the rule engine determines, for example, properties of the event, type of the event, and method of delivery (such as HTTP or JMS).
  • Application server cluster 110 further comprises communication channel 116 .
  • Communication channel 116 is used by application nodes 112 - 1 through 112 - n to communicate with each other.
  • a publish-subscribe paradigm should be supported by communication channel 116 . Examples are JMS Topic (a distribution mechanism for publishing messages that are delivered to multiple subscribers) and MQTT (Message Queue Telemetry Transport, a publish-subscribe based “light weight” messaging protocol for use on top of the TCP/IP protocol), but there are other implementations as well.
  • Application nodes 112 - 1 through 112 - n use cluster notifications to communicate. The cluster notifications contain state information of the nodes.
  • Application server cluster 110 further comprises cluster database 114 .
  • Cluster database 114 is a repository of shared information relating to the state of processing of all housekeeping tasks.
  • Cluster database 114 may be a RDBMS (relational database management system), a cache, a file-based storage, or a combination of these.
  • Cluster database 114 may be used by the application logic as persistent storage.
  • all nodes share a common cluster database.
  • both user events and housekeeping events arrive to the nodes via a load balancing mechanism (load balancer) assuring that an event arrives only to a single node.
  • load balancing mechanism is JMS Queue, where a message is assured to read only once and thus arrives only to a single listener. More complex solutions are available for JMS and other communication protocols.
  • the whole agglomeration of nodes is a single application.
  • each node has a rule engine to discriminate between events, and all nodes run on a multi-threaded platform.
  • each node is stateless so that application data is stored in a common data repository.
  • FIG. 2 is flowchart 200 showing operational steps of a lightweight framework with dynamic self-organizing coordination capacity for clustered applications, in accordance with one embodiment of the present invention.
  • a node one of application nodes 112 - 1 through 112 - n ) receives a housekeeping event. For each housekeeping event received, the node will attempt to start a housekeeping task.
  • the node determines whether another node is processing the housekeeping task.
  • the node In response to determining that no other node is processing the housekeeping task (NO branch of decision step 202 ), at step 203 , the node runs as an active node to process the housekeeping task. The node creates a persistent record of the association between the housekeeping event and the node on cluster database 114 (shown in FIG. 1 ). In response to determining that another node is processing the housekeeping task (YES branch of decision step 202 ), at step 211 , the node runs as a passive node to monitor the housekeeping task.
  • step 204 the node determines whether task initiation at step 203 succeeds. In response to determining that the task initiation fails (NO branch of decision step 204 ), the node executes step 211 (the node runs as the passive node). In response to determining that the task initiation succeeds (YES branch of decision step 204 ), the node (as the active node) at step 205 starts heartbeat notifications.
  • the node (as the active node) emits periodic heartbeat notifications over communication channel 116 (shown in FIG. 1 ), indicating that the node is still alive as the active node and processing the housekeeping task. All other nodes are passive nodes in respect to the housekeeping event.
  • the other nodes start monitoring the heartbeat notifications and detect a possible interruption of the heartbeat notifications. The interruption of the heartbeat notifications signifies a failure of the active node.
  • the node runs executors to process the housekeeping task.
  • the node (as the active node) completes the housekeeping task.
  • the node makes, on cluster database 114 , a persistent record of completion of the housekeeping task.
  • the node stops the heartbeat notifications.
  • the node (as the active node) at step 209 sends to all the other nodes (acting as the passive nodes) a notification of completion of the housekeeping task.
  • all the other nodes receive the notification of completion sent by the active node at step 209 .
  • all the other nodes stop monitoring the housekeeping task.
  • the node While the node becomes the passive node at step 211 , one of the other nodes will be the active node for this housekeeping task.
  • the node (now as the passive node) checks periodically the heartbeat notifications sent from the active node.
  • the node (now as the passive node) records a number of missed heartbeat notifications.
  • the missed heartbeat notifications from the active node signify a failure of the active node. The failure can be caused by a number of factors, for example, network failure, overload of the active node, and of course the utter failure of the active node.
  • the node (now as the passive node) determines whether the number of the missed heartbeat notifications exceeds a threshold.
  • the node In response to determining that the number of the missed heartbeat notifications does not exceed the threshold (NO branch of decision step 214 ), the node (now as the passive node) continues to monitor the heartbeat notifications and reiterates step 212 . In response to determining that the number of the missed heartbeat notifications exceeds the threshold (YES branch of decision step 214 ), at step 215 , the node (now as the passive node) attempts to become the active node.
  • the threshold reflects a certain tolerance such that the node (currently as passive node) executes step 215 only after a certain predetermined number of the notifications have been missed. Step 215 is done via shared records on cluster database 114 . If the node succeeds in marking the persistent record with its identifier, it will execute step 203 , i.e, run as the active node to initiate to process the housekeeping task. Remaining nodes will assume a passive role as before.
  • Newly started nodes will receive heartbeat notifications of any housekeeping tasks in progress and start their own monitoring.
  • One potential gap of restarting a housekeeping task after a take over is an inconsistency in a data model state, if a failed node has modified some of common data model (not shared in cluster database 114 ).
  • a simple solution is to run a long transaction on the active node.
  • a specific implementation may have a staged execution of a housekeeping task with safe points from which the task can continue after the take over. There is no 100% guarantee that the failed node will not come back to the task execution; for example, if a network connectivity has restored after a failure. It is possible to introduce a fail switch to notify a node that it should terminate the housekeeping task, if the node has been taken over by another node.
  • An abstraction from that is a notion of the originating node sending multiple parallel requests for processing to a cloud of services and awaiting the responses of all of these requests. Once received, its own processing can commence. There is a possibility of a retry of a failed request, or if possible, implementation of compensating logic on remote servers.
  • any node can execute multiple ordinary tasks and housekeeping tasks simultaneously. So, in an extreme case, even a single node online can perform all functions. Similarly, as many nodes as desired can be running in parallel retaining the same functionality in respect to both these types of tasks.
  • FIG. 3 is a diagram illustrating components of a node shown in FIG. 1 , in accordance with one embodiment of the present invention. It should be appreciated that FIG. 3 provides only an illustration of one implementation and does not imply any limitations with regard to the environment in which different embodiments may be implemented.
  • node 300 includes processor(s) 320 , memory 310 , tangible storage device(s) 330 , network interface(s) 340 , and I/O (input/output) interface(s) 350 .
  • Communications among the above-mentioned components of node 300 are denoted by numeral 390 .
  • Memory 310 includes ROM(s) (Read Only Memory) 311 , RAM(s) (Random Access Memory) 313 , and cache(s) 315 .
  • One or more operating systems 331 and one or more computer programs 333 reside on one or more computer readable tangible storage device(s) 330 .
  • Node 300 further includes I/O interface(s) 350 . I/O interface(s) 350 allows for input and output of data with external device(s) 360 that may be connected to node 300 .
  • Node 300 further includes network interface(s) 340 for communications between node 300 and a computer network.
  • the present invention may be a system, a method, and/or a computer program product.
  • the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
  • the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
  • the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device, such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
  • RAM random access memory
  • ROM read-only memory
  • EPROM or Flash memory erasable programmable read-only memory
  • SRAM static random access memory
  • CD-ROM compact disc read-only memory
  • DVD digital versatile disk
  • memory stick a floppy disk
  • mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing
  • a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network (LAN), a wide area network (WAN), and/or a wireless network.
  • the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
  • Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, and conventional procedural programming languages, such as the “C” programming language, or similar programming languages.
  • the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry in order to perform aspects of the present invention.
  • These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture, including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures.
  • two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Hardware Redundancy (AREA)

Abstract

A method, a computer program product, and a computer system for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications are provided. The lightweight framework provides a means for managing tasks that require coordination between application nodes. A node receives a task and determines whether one of other nodes is processing the task. The node runs as an active node to process the task, in response to determining that none of the other nodes is processing the task. The node runs as one of one or more passive nodes that monitor processing of the task, in response to determining that the one of the other node is processing the task.

Description

    CROSS REFERENCE TO RELATED APPLICATION
  • This application is a Continuation Application of pending U.S. patent application Ser. No. 14/504,609 filed on Oct. 2, 2014.
  • TECHNICAL FIELD OF THE INVENTION
  • The present invention relates generally to distributed computing systems and architectures, and more particularly to a lightweight framework with dynamic self-organizing coordination capacity for clustered applications.
  • BACKGROUND
  • Applications are increasingly moving to a clustered design, where any number of nodes may be active. These types of applications are also a natural fit for deployment on cloud infrastructure. In a cloud environment, application instances or nodes need an ability to join and leave processing cluster at unspecified time. This is non-critical for short-running tasks, such as serving users' requests, as the tasks are often covered by a transactional control which offers an ability to recover.
  • Inherently in most applications, it is a need to support highly parallel processing of requests. For these type of applications, the software should be designed so that there is as little coupling between nodes as possible, thus preventing bottlenecks from being introduced. However, these nodes normally will share some common application states, and therefore it is also likely that a subset of processing tasks, usually long-running ones, will need coordination among nodes. For example, the application uses a reference dataset that has to be updated periodically. In this scenario, some processing of data may need to happen, followed by a switchover to the new dataset. For such a long-running task, an effective way of minimizing data corruption is to perform it on a single node. The remaining nodes will receive notification when it is safe to switch over to the new version of the dataset.
  • There exist solutions which address this type of long-running tasks. One common approach is to designate a dedicated application node to fulfill this role. The dedicated application node may have a common codebase with the other application nodes or it may not. The dedicated application node is certainly configured differently from the other nodes, so it can perform the long-running tasks while the other nodes are running short user requests. A flaw in this solution is evidently a single point of failure in that node. It is possible to strengthen the configuration by providing a backup dedicated node, but the solution comes up against the task of synchronizing these dedicated nodes.
  • Another solution is to allow one or more of the homogeneous nodes to self-elect to become a dedicated node. These nodes are identical to other nodes in all other respects. The self-election mechanism is driven by some rules. This solution is different from the previous one in that all nodes start from identical configuration. The self-elected node becomes a coordinating node for the whole cluster. It may be backed up by other nodes to provide resilience. This approach has the deficiency that self election is semi-permanent, so the node in this role is only capable of performing these special long running tasks. Sometimes several of these nodes can perform these tasks simultaneously. Some synchronization will still be required between application nodes in this setup. This solution has a few flaws as follows. The dedicated node is tied up with special tasks. Multiple nodes are required to provide a degree of resilience. The dedicated nodes are not utilized for running ordinary short tasks; therefore, there is a high level of redundancy in the form of idle nodes which can otherwise be utilized.
  • SUMMARY
  • In one aspect, a method for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications is provided. A node receives an event and determines whether one of other nodes is processing a task triggered by the event. In response to determining that none of the other nodes is processing the task, the node processes the task as an active node. In response to determining that the one of the other nodes is processing the task, the node runs as one of one or more passive nodes, wherein the one or more passive nodes monitor processing of the task.
  • In another aspect, a computer program product for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications is provided. The computer program product comprising a computer readable storage medium having program code embodied therewith. The program code is executable for a node to: receive an event; determine whether one of other nodes is processing a task triggered by the event; process the task as an active node, in response to determining that none of the other nodes is processing the task; run as one of one or more passive nodes that monitor processing of the task, in response to determining that the one of the other nodes is processing the task.
  • In yet another aspect, a computer system for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications is provided. The computer system comprises one or more processors, one or more computer readable tangible storage devices, and program instructions stored on at least one of the one or more computer readable tangible storage devices for execution by at least one of the one or more processors. The program instructions are executable for a node to receive an event. The program instructions are executable for the node to determine whether one of other nodes is processing a task triggered by the event. The program instructions are executable for the node to process the task as an active node, in response to determining that none of the other nodes is processing the task. The program instructions are executable for the node to run as one of one or more passive nodes that monitor processing of the task, in response to determining that the one of the other nodes is processing the task.
  • BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
  • FIG. 1 is a diagram showing components in a system of clustered applications, in accordance with one embodiment of the present invention.
  • FIG. 2 is a flowchart showing operational steps of a lightweight framework with dynamic self-organizing coordination capacity for clustered applications, in accordance with one embodiment of the present invention.
  • FIG. 3 is a diagram illustrating components of a node shown in FIG. 1, in accordance with one embodiment of the present invention.
  • DETAILED DESCRIPTION
  • Embodiments of the present invention provide a lightweight framework with dynamic self-organizing coordination capacity for clustered applications. The lightweight framework which can be utilized by any networked service application which is capable or should be capable of being deployed in a clustered mode, for example a cloud application. The lightweight framework provides a means for managing tasks that require coordination between application nodes: a long-running task is being executed by a single node chosen dynamically using some rules. Remaining nodes will continue to serve short-lived tasks. There is internode communication established via the use of a publish-subscribe channel allowing the synchronization of these activities. Therefore, no coordinating node is necessary, and new nodes can be added to and removed from the cluster at will. Upon completion of a long-running task, the executing node becomes an ordinary node indistinguishable from the rest of the nodes. Several of these long-running tasks can be executed by different nodes simultaneously.
  • In this lightweight framework, there is no special assignment of a node or nodes to perform a task. Moreover, assuming a multithreaded platform, a single application node can perform ordinary short-lived tasks simultaneously with long running tasks. This provides a foundation for extreme resilience in the lightweight framework. As long as there is at least a single node, the functionality is complete, albeit potentially with a reduced performance.
  • Embodiments of the present invention are now described in detail with reference to the accompanying figures.
  • FIG. 1 is a diagram showing components in system 100 of clustered applications, in accordance with one embodiment of the present invention. System 100 includes application server cluster 110 (e.g., an application cluster) comprising a plurality of application nodes 112-1 through 112-n. An application node is an instance of an application deployed in application server cluster 110. Application nodes 112-1 through 112-n are running on modern hardware with inbuilt multithreading processors and on an operating system. Components of an application node will be discussed in later paragraphs with reference to FIG. 3. Application nodes 112-1 through 112-n have common resources, for example, data repository, cache, enterprise service bus etc. Each node can be located on a dedicated physical machine, but that is not mandatory.
  • Via load balancer 118, application nodes 112-1 through 112-n receive user events that are often requests to perform atomic operations and received either from a user or from a data stream channel. Usually the user events require quick responses. The user events trigger ordinary tasks which are main business tasks. The ordinary tasks are usually required to be completed in a short time, for example, tens of seconds at the most. For each user event received, a node starts an ordinary task. These are comparatively short processes and thus the user often awaits the outcome. Usually these ordinary tasks are part of a bigger business process. The task is naturally executed by the node in isolation from other nodes, unless there is a recursion where the node is a client to another node. But, that only confirms the isolation principle.
  • Via load balancer 118, application nodes 112-1 through 112-n also receive housekeeping events. The housekeeping events may come from an external source, for example, to trigger a reference data load. The housekeeping events may also be internally generated, such as periodic purging. The housekeeping events trigger housekeeping tasks. The housekeeping tasks often long running complex tasks, which can take between several minutes and some hours to complete.
  • There are some internal application events, for instance scheduled tasks shown in FIG. 1. An application can emit an event itself and will warrant the housekeeping tasks processing. Obviously, it is not possible to synchronize these events between nodes. The safeguard is in place; therefore, if a housekeeping event arrives whilst a corresponding housekeeping task is in progress, it will be ignored. Obviously, the decision whether to process the same housekeeping task in parallel is configurable.
  • Each of application nodes 112-1 through 112-n has a rule engine which decides whether an event is a user event or a housekeeping event. The rule engine determines, for example, properties of the event, type of the event, and method of delivery (such as HTTP or JMS).
  • Application server cluster 110 further comprises communication channel 116. Communication channel 116 is used by application nodes 112-1 through 112-n to communicate with each other. A publish-subscribe paradigm should be supported by communication channel 116. Examples are JMS Topic (a distribution mechanism for publishing messages that are delivered to multiple subscribers) and MQTT (Message Queue Telemetry Transport, a publish-subscribe based “light weight” messaging protocol for use on top of the TCP/IP protocol), but there are other implementations as well. Application nodes 112-1 through 112-n use cluster notifications to communicate. The cluster notifications contain state information of the nodes.
  • Application server cluster 110 further comprises cluster database 114. Cluster database 114 is a repository of shared information relating to the state of processing of all housekeeping tasks. Cluster database 114 may be a RDBMS (relational database management system), a cache, a file-based storage, or a combination of these. Cluster database 114 may be used by the application logic as persistent storage.
  • In an embodiment of the lightweight framework with dynamic self-organizing coordination capacity for clustered applications, all nodes share a common cluster database. In the embodiment, there is a channel available to all nodes. In the embodiment, both user events and housekeeping events arrive to the nodes via a load balancing mechanism (load balancer) assuring that an event arrives only to a single node. A simple example of the load balancing mechanism is JMS Queue, where a message is assured to read only once and thus arrives only to a single listener. More complex solutions are available for JMS and other communication protocols. As far as a client is concerned, the whole agglomeration of nodes is a single application. In the embodiment, each node has a rule engine to discriminate between events, and all nodes run on a multi-threaded platform. Also, in the embodiment, each node is stateless so that application data is stored in a common data repository.
  • FIG. 2 is flowchart 200 showing operational steps of a lightweight framework with dynamic self-organizing coordination capacity for clustered applications, in accordance with one embodiment of the present invention. At step 201, a node (one of application nodes 112-1 through 112-n) receives a housekeeping event. For each housekeeping event received, the node will attempt to start a housekeeping task. At decision step 202, the node determines whether another node is processing the housekeeping task.
  • In response to determining that no other node is processing the housekeeping task (NO branch of decision step 202), at step 203, the node runs as an active node to process the housekeeping task. The node creates a persistent record of the association between the housekeeping event and the node on cluster database 114 (shown in FIG. 1). In response to determining that another node is processing the housekeeping task (YES branch of decision step 202), at step 211, the node runs as a passive node to monitor the housekeeping task.
  • After step 203 (the node runs as the active node), at decision step 204, the node determines whether task initiation at step 203 succeeds. In response to determining that the task initiation fails (NO branch of decision step 204), the node executes step 211 (the node runs as the passive node). In response to determining that the task initiation succeeds (YES branch of decision step 204), the node (as the active node) at step 205 starts heartbeat notifications. The node (as the active node) emits periodic heartbeat notifications over communication channel 116 (shown in FIG. 1), indicating that the node is still alive as the active node and processing the housekeeping task. All other nodes are passive nodes in respect to the housekeeping event. The other nodes start monitoring the heartbeat notifications and detect a possible interruption of the heartbeat notifications. The interruption of the heartbeat notifications signifies a failure of the active node.
  • At step 206, the node (as the active node) runs executors to process the housekeeping task. At step 207, the node (as the active node) completes the housekeeping task. The node makes, on cluster database 114, a persistent record of completion of the housekeeping task. Then, at step 208, the node (as the active node) stops the heartbeat notifications. The node (as the active node) at step 209 sends to all the other nodes (acting as the passive nodes) a notification of completion of the housekeeping task.
  • At step 221, all the other nodes (acting as the passive nodes) receive the notification of completion sent by the active node at step 209. At step 222, all the other nodes (passive nodes) stop monitoring the housekeeping task.
  • While the node becomes the passive node at step 211, one of the other nodes will be the active node for this housekeeping task. Now, at step 212, the node (now as the passive node) checks periodically the heartbeat notifications sent from the active node. At step 213, the node (now as the passive node) records a number of missed heartbeat notifications. The missed heartbeat notifications from the active node signify a failure of the active node. The failure can be caused by a number of factors, for example, network failure, overload of the active node, and of course the utter failure of the active node. At decision step 214, the node (now as the passive node) determines whether the number of the missed heartbeat notifications exceeds a threshold.
  • In response to determining that the number of the missed heartbeat notifications does not exceed the threshold (NO branch of decision step 214), the node (now as the passive node) continues to monitor the heartbeat notifications and reiterates step 212. In response to determining that the number of the missed heartbeat notifications exceeds the threshold (YES branch of decision step 214), at step 215, the node (now as the passive node) attempts to become the active node. The threshold reflects a certain tolerance such that the node (currently as passive node) executes step 215 only after a certain predetermined number of the notifications have been missed. Step 215 is done via shared records on cluster database 114. If the node succeeds in marking the persistent record with its identifier, it will execute step 203, i.e, run as the active node to initiate to process the housekeeping task. Remaining nodes will assume a passive role as before.
  • Newly started nodes will receive heartbeat notifications of any housekeeping tasks in progress and start their own monitoring. One potential gap of restarting a housekeeping task after a take over is an inconsistency in a data model state, if a failed node has modified some of common data model (not shared in cluster database 114). A simple solution is to run a long transaction on the active node. A specific implementation may have a staged execution of a housekeeping task with safe points from which the task can continue after the take over. There is no 100% guarantee that the failed node will not come back to the task execution; for example, if a network connectivity has restored after a failure. It is possible to introduce a fail switch to notify a node that it should terminate the housekeeping task, if the node has been taken over by another node.
  • It is possible to scale and parallelize execution of a housekeeping task across all nodes with minor changes. For example, if a task can be broken into smaller units, then for each one of these units another event (housekeeping or user event) is emitted on a normal event queue (or any other channel over which events are arriving). An arbitrary node picks each event, executes the appropriate task and notifies the sender node of a completion. The originating node can complete processing of the overall housekeeping task once all smaller tasks have completed or, indeed, without waiting for them to complete.
  • An abstraction from that is a notion of the originating node sending multiple parallel requests for processing to a cloud of services and awaiting the responses of all of these requests. Once received, its own processing can commence. There is a possibility of a retry of a failed request, or if possible, implementation of compensating logic on remote servers.
  • Since the assumption is that all nodes are deployed on a platform supporting multithreading, any node can execute multiple ordinary tasks and housekeeping tasks simultaneously. So, in an extreme case, even a single node online can perform all functions. Similarly, as many nodes as desired can be running in parallel retaining the same functionality in respect to both these types of tasks.
  • FIG. 3 is a diagram illustrating components of a node shown in FIG. 1, in accordance with one embodiment of the present invention. It should be appreciated that FIG. 3 provides only an illustration of one implementation and does not imply any limitations with regard to the environment in which different embodiments may be implemented.
  • Referring to FIG. 3, node 300 includes processor(s) 320, memory 310, tangible storage device(s) 330, network interface(s) 340, and I/O (input/output) interface(s) 350. In FIG. 3, communications among the above-mentioned components of node 300 are denoted by numeral 390. Memory 310 includes ROM(s) (Read Only Memory) 311, RAM(s) (Random Access Memory) 313, and cache(s) 315. One or more operating systems 331 and one or more computer programs 333 reside on one or more computer readable tangible storage device(s) 330. Node 300 further includes I/O interface(s) 350. I/O interface(s) 350 allows for input and output of data with external device(s) 360 that may be connected to node 300. Node 300 further includes network interface(s) 340 for communications between node 300 and a computer network.
  • The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
  • The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device, such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network (LAN), a wide area network (WAN), and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
  • Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, and conventional procedural programming languages, such as the “C” programming language, or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry in order to perform aspects of the present invention.
  • Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
  • These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture, including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
  • The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

Claims (18)

What is claimed is:
1. A method for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications, the method comprising:
receiving, by a node, an event;
determining, by the node, whether one of other nodes is processing a task triggered by the event;
in response to determining that none of the other nodes is processing the task, processing, by the node, the task as an active node; and
in response to determining that the one of the other nodes is processing the task, running, by the node, as one of one or more passive nodes, wherein the one or more passive nodes monitor processing of the task.
2. The method of claim 1, further comprising:
in response to determining that the node does not succeed in task initiation, running, by the node, as the one of the one or more passive nodes.
3. The method of claim 1, further comprising:
in response to determining that the node succeeds in task initiation, starting, by the node, heartbeat notifications;
running, by the node, executors to process the task;
completing, by the node, the task;
stopping, by the node, the heartbeat notifications; and
sending, by the node, to the one or more passive nodes a notification of completion of the task.
4. The method of claim 1, further comprising:
receiving from the active node, by the one or more passive nodes, a notification of completion of the task; and
stopping, by the one or more passive nodes, monitoring the processing of the task.
5. The method of claim 1, further comprising:
in response to running as the one of the one or more passive nodes, checking, by the node, periodically heartbeat notifications sent from the active node;
recording, by the node, a number of missed heartbeat notifications;
determining, by the node, whether the number of the missed heartbeat notifications exceeds a predetermined threshold; and
in response to determining that the number of the missed heartbeat notifications does not exceed the predetermined threshold, continuing, by the node, to check periodically the heartbeat notifications.
6. The method of claim 5, further comprising:
in response to determining that the number of the missed heartbeat notifications exceeds the predetermined threshold, attempting to become the active node.
7. A computer program product for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications, the computer program product comprising a computer readable storage medium having program code embodied therewith, the program code executable to:
receive, by a node, an event;
determine, by the node, whether one of other nodes is processing a task triggered by the event;
process, by the node, the task as an active node, in response to determining that none of the other nodes is processing the task; and
run, by the node, as one of one or more passive nodes, in response to determining that the one of the other nodes is processing the task, wherein the one or more passive nodes monitor processing of the task.
8. The computer program product of claim 7, further comprising the program code executable to:
run, by the node, as the one of the one or more passive nodes, in response to determining that the node does not succeed in task initiation.
9. The computer program product of claim 7, further comprising the program code executable to:
start, by the node, heartbeat notifications, in response to determining that the node succeeds in task initiation; run, by the node, executors to process the task;
complete, by the node, the task;
stop, by the node, the heartbeat notifications; and
send, by the node, to the one or more passive nodes a notification of completion of the task.
10. The computer program product of claim 7, further comprising the program code executable to:
receive from the active node, by the one or more passive nodes, a notification of completion of the task; and
stop, by the one or more passive nodes, monitoring the processing of the task.
11. The computer program product of claim 7, further comprising the program code executable to:
check, by the node, periodically heartbeat notifications sent from the active node, in response to running as the one of the one or more passive nodes;
record, by the node, a number of missed heartbeat notifications;
determine, by the node, whether the number of the missed heartbeat notifications exceeds a predetermined threshold; and
continue, by the node, to check periodically the heartbeat notifications, in response to determining that the number of the missed heartbeat notifications does not exceed the predetermined threshold.
12. The computer program product of claim 11, further comprising the program code executable to:
attempt to become the active node, in response to determining that the number of the missed heartbeat notifications exceeds the predetermined threshold.
13. A computer system for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications, the computer system comprising:
one or more processors, one or more computer-readable tangible storage devices, and program instructions stored on at least one of the one or more computer-readable tangible storage devices for execution by at least one of the one or more processors, the program instructions executable to:
receive, by a node, an event;
determine, by the node, whether one of other nodes is processing a task triggered by the event;
process, by the node, the task as an active node, in response to determining that none of the other nodes is processing the task; and
run, by the node, as one of one or more passive nodes, in response to determining that the one of the other nodes is processing the task, wherein the one or more passive nodes monitor processing of the task.
14. The computer system of claim 13, further comprising the program instructions executable to:
run, by the node, as the one of the one or more passive nodes, in response to determining that the node does not succeed in task initiation.
15. The computer system of claim 13, further comprising the program instructions executable to:
start, by the node, heartbeat notifications, in response to determining that the node succeeds in task initiation;
run, by the node, executors to process the task;
complete, by the node, the task;
stop, by the node, the heartbeat notifications; and
send, by the node, to the one or more passive nodes a notification of completion of the task.
16. The computer system of claim 13, further comprising the program instructions executable to:
receive from the active node, by the one or more passive nodes, a notification of completion of the task; and
stop, by the one or more passive nodes, monitoring the processing of the task.
17. The computer system of claim 13, further comprising the program instructions executable to:
check, by the node, periodically heartbeat notifications sent from the active node, in response to running as the one of the one or more passive nodes;
record, by the node, a number of missed heartbeat notifications;
determine, by the node, whether the number of the missed heartbeat notifications exceeds a predetermined threshold; and continue, by the node, to check periodically the heartbeat notifications, in response to determining that the number of the missed heartbeat notifications does not exceed the predetermined threshold.
18. The computer system of claim 17, further comprising the program instructions executable to:
attempt to become the active node, in response to determining that the number of the missed heartbeat notifications exceeds the predetermined threshold.
US14/712,040 2014-10-02 2015-05-14 Lightweight framework with dynamic self-organizing coordination capability for clustered applications Abandoned US20160099999A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/712,040 US20160099999A1 (en) 2014-10-02 2015-05-14 Lightweight framework with dynamic self-organizing coordination capability for clustered applications

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14/504,609 US10904111B2 (en) 2014-10-02 2014-10-02 Lightweight framework with dynamic self-organizing coordination capability for clustered applications
US14/712,040 US20160099999A1 (en) 2014-10-02 2015-05-14 Lightweight framework with dynamic self-organizing coordination capability for clustered applications

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/504,609 Continuation US10904111B2 (en) 2014-10-02 2014-10-02 Lightweight framework with dynamic self-organizing coordination capability for clustered applications

Publications (1)

Publication Number Publication Date
US20160099999A1 true US20160099999A1 (en) 2016-04-07

Family

ID=55633600

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/504,609 Expired - Fee Related US10904111B2 (en) 2014-10-02 2014-10-02 Lightweight framework with dynamic self-organizing coordination capability for clustered applications
US14/712,040 Abandoned US20160099999A1 (en) 2014-10-02 2015-05-14 Lightweight framework with dynamic self-organizing coordination capability for clustered applications

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US14/504,609 Expired - Fee Related US10904111B2 (en) 2014-10-02 2014-10-02 Lightweight framework with dynamic self-organizing coordination capability for clustered applications

Country Status (1)

Country Link
US (2) US10904111B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108900590A (en) * 2018-06-20 2018-11-27 郑州云海信息技术有限公司 A kind of distributed software construction method and device
US10904111B2 (en) 2014-10-02 2021-01-26 International Business Machines Corporation Lightweight framework with dynamic self-organizing coordination capability for clustered applications

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109150570B (en) * 2017-06-27 2022-04-08 阿里巴巴集团控股有限公司 Updating method, system, end node and electronic equipment
US10735509B2 (en) * 2018-01-31 2020-08-04 Ca, Inc. Systems and methods for synchronizing microservice data stores
CN109286525B (en) * 2018-09-28 2022-02-25 昆明能讯科技有限责任公司 Double-computer backup method based on MQTT communication and heartbeat between main and standby

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130205162A1 (en) * 2012-02-03 2013-08-08 Fujitsu Limited Redundant computer control method and device
US20160077936A1 (en) * 2014-09-12 2016-03-17 Facebook, Inc. Failover mechanism in a distributed computing system
US9444884B2 (en) * 2011-12-31 2016-09-13 Level 3 Communications, Llc Load-aware load-balancing cluster without a central load balancer

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001035278A1 (en) 1999-11-10 2001-05-17 Fakhouri Sameh A A decision based system for managing distributed resources and modeling the global optimization problem
JP4781089B2 (en) 2005-11-15 2011-09-28 株式会社ソニー・コンピュータエンタテインメント Task assignment method and task assignment device
WO2007064799A1 (en) 2005-12-01 2007-06-07 Cassatt Corporation Automated deployment and configuration of applications in an autonomically controlled distributed computing system
US8296434B1 (en) 2009-05-28 2012-10-23 Amazon Technologies, Inc. Providing dynamically scaling computing load balancing
US20110161961A1 (en) * 2009-12-29 2011-06-30 Nokia Corporation Method and apparatus for optimized information transmission using dedicated threads
US8719415B1 (en) 2010-06-28 2014-05-06 Amazon Technologies, Inc. Use of temporarily available computing nodes for dynamic scaling of a cluster
US8412810B1 (en) 2010-07-02 2013-04-02 Adobe Systems Incorporated Provisioning and managing a cluster deployed on a cloud
US9239996B2 (en) 2010-08-24 2016-01-19 Solano Labs, Inc. Method and apparatus for clearing cloud compute demand
US8955097B2 (en) * 2011-12-13 2015-02-10 Mcafee, Inc. Timing management in a large firewall cluster
US8977637B2 (en) 2012-08-30 2015-03-10 International Business Machines Corporation Facilitating field programmable gate array accelerations of database functions
US9043644B2 (en) 2012-12-04 2015-05-26 International Business Machines Corporation Using separate processes to handle short-lived and long-lived jobs to reduce failure of processes
WO2015069912A1 (en) * 2013-11-06 2015-05-14 Improvement Interactive, LLC Dynamic application version selection
US10904111B2 (en) 2014-10-02 2021-01-26 International Business Machines Corporation Lightweight framework with dynamic self-organizing coordination capability for clustered applications
US9934272B2 (en) 2015-10-22 2018-04-03 International Business Machines Corporation Processing a database query in a database system
US10176147B2 (en) 2017-03-07 2019-01-08 Qualcomm Incorporated Multi-processor core three-dimensional (3D) integrated circuits (ICs) (3DICs), and related methods

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9444884B2 (en) * 2011-12-31 2016-09-13 Level 3 Communications, Llc Load-aware load-balancing cluster without a central load balancer
US20130205162A1 (en) * 2012-02-03 2013-08-08 Fujitsu Limited Redundant computer control method and device
US20160077936A1 (en) * 2014-09-12 2016-03-17 Facebook, Inc. Failover mechanism in a distributed computing system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10904111B2 (en) 2014-10-02 2021-01-26 International Business Machines Corporation Lightweight framework with dynamic self-organizing coordination capability for clustered applications
CN108900590A (en) * 2018-06-20 2018-11-27 郑州云海信息技术有限公司 A kind of distributed software construction method and device

Also Published As

Publication number Publication date
US10904111B2 (en) 2021-01-26
US20160099825A1 (en) 2016-04-07

Similar Documents

Publication Publication Date Title
US10078564B2 (en) Preventing split-brain scenario in a high-availability cluster
US11966307B2 (en) Re-aligning data replication configuration of primary and secondary data serving entities of a cross-site storage solution after a failover event
US8799906B2 (en) Processing a batched unit of work
US10474694B2 (en) Zero-data loss recovery for active-active sites configurations
US10049130B2 (en) Reliability improvement of distributed transaction processing optimizations based on connection status
US8875157B2 (en) Deployment of pre-scheduled tasks in clusters
US9032032B2 (en) Data replication feedback for transport input/output
US10904111B2 (en) Lightweight framework with dynamic self-organizing coordination capability for clustered applications
US12045491B2 (en) Resynchronization of individual volumes of a consistency group (CG) within a cross-site storage solution while maintaining synchronization of other volumes of the CG
US11741075B2 (en) Methods and system of tracking transactions for distributed ledger
US10819641B2 (en) Highly available servers
US9098439B2 (en) Providing a fault tolerant system in a loosely-coupled cluster environment using application checkpoints and logs
US9940598B2 (en) Apparatus and method for controlling execution workflows
US11366728B2 (en) Systems and methods for enabling a highly available managed failover service
US8266301B2 (en) Deployment of asynchronous agentless agent functionality in clustered environments
US20130205017A1 (en) Computer failure monitoring method and device
US7870248B2 (en) Exploiting service heartbeats to monitor file share
US20120278429A1 (en) Cluster system, synchronization controlling method, server, and synchronization controlling program
US9026839B2 (en) Client based high availability method for message delivery
US11397632B2 (en) Safely recovering workloads within a finite timeframe from unhealthy cluster nodes
US9031969B2 (en) Guaranteed in-flight SQL insert operation support during an RAC database failover
US10165088B2 (en) Providing unit of work continuity in the event initiating client fails over
US10372542B2 (en) Fault tolerant event management system
US8738959B2 (en) Selective message loss handling in a cluster of replicated servers
US10394677B2 (en) Method to efficiently and reliably process ordered user account events in a cluster

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JOFFE, ANNA;KELSEY, HOWARD A.;LEVINE, VIKTOR;AND OTHERS;SIGNING DATES FROM 20140924 TO 20140929;REEL/FRAME:035638/0577

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: ADVISORY ACTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载