US20160099999A1 - Lightweight framework with dynamic self-organizing coordination capability for clustered applications - Google Patents
Lightweight framework with dynamic self-organizing coordination capability for clustered applications Download PDFInfo
- Publication number
- US20160099999A1 US20160099999A1 US14/712,040 US201514712040A US2016099999A1 US 20160099999 A1 US20160099999 A1 US 20160099999A1 US 201514712040 A US201514712040 A US 201514712040A US 2016099999 A1 US2016099999 A1 US 2016099999A1
- Authority
- US
- United States
- Prior art keywords
- node
- task
- nodes
- response
- determining
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Links
- 238000012545 processing Methods 0.000 claims abstract description 52
- 230000004044 response Effects 0.000 claims abstract description 37
- 238000000034 method Methods 0.000 claims abstract description 29
- 230000008569 process Effects 0.000 claims abstract description 16
- 238000004590 computer program Methods 0.000 claims abstract description 15
- 238000003860 storage Methods 0.000 claims description 26
- 230000000977 initiatory effect Effects 0.000 claims description 9
- 238000012544 monitoring process Methods 0.000 claims description 6
- 230000001960 triggered effect Effects 0.000 claims description 6
- 238000010586 diagram Methods 0.000 description 14
- 238000004891 communication Methods 0.000 description 8
- 230000006870 function Effects 0.000 description 8
- 230000005540 biological transmission Effects 0.000 description 4
- 230000007246 mechanism Effects 0.000 description 4
- 230000002085 persistent effect Effects 0.000 description 4
- 238000013459 approach Methods 0.000 description 2
- 238000003491 array Methods 0.000 description 2
- 238000013499 data model Methods 0.000 description 2
- 238000002955 isolation Methods 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 230000000737 periodic effect Effects 0.000 description 2
- 230000001902 propagating effect Effects 0.000 description 2
- RYGMFSIKBFXOCR-UHFFFAOYSA-N Copper Chemical compound [Cu] RYGMFSIKBFXOCR-UHFFFAOYSA-N 0.000 description 1
- 238000005054 agglomeration Methods 0.000 description 1
- 230000002776 aggregation Effects 0.000 description 1
- 229910052802 copper Inorganic materials 0.000 description 1
- 239000010949 copper Substances 0.000 description 1
- 230000008878 coupling Effects 0.000 description 1
- 238000010168 coupling process Methods 0.000 description 1
- 238000005859 coupling reaction Methods 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 238000007726 management method Methods 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000010926 purge Methods 0.000 description 1
- 230000002441 reversible effect Effects 0.000 description 1
- 239000004065 semiconductor Substances 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
Images
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/10—Active monitoring, e.g. heartbeat, ping or trace-route
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1014—Server selection for load balancing based on the content of a request
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1025—Dynamic adaptation of the criteria on which the server selection is based
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
- H04L67/1061—Peer-to-peer [P2P] networks using node-based peer discovery mechanisms
Definitions
- the present invention relates generally to distributed computing systems and architectures, and more particularly to a lightweight framework with dynamic self-organizing coordination capacity for clustered applications.
- the software Inherently in most applications, it is a need to support highly parallel processing of requests.
- the software should be designed so that there is as little coupling between nodes as possible, thus preventing bottlenecks from being introduced.
- these nodes normally will share some common application states, and therefore it is also likely that a subset of processing tasks, usually long-running ones, will need coordination among nodes.
- the application uses a reference dataset that has to be updated periodically. In this scenario, some processing of data may need to happen, followed by a switchover to the new dataset. For such a long-running task, an effective way of minimizing data corruption is to perform it on a single node. The remaining nodes will receive notification when it is safe to switch over to the new version of the dataset.
- the dedicated application node may have a common codebase with the other application nodes or it may not.
- the dedicated application node is certainly configured differently from the other nodes, so it can perform the long-running tasks while the other nodes are running short user requests.
- a flaw in this solution is evidently a single point of failure in that node. It is possible to strengthen the configuration by providing a backup dedicated node, but the solution comes up against the task of synchronizing these dedicated nodes.
- Another solution is to allow one or more of the homogeneous nodes to self-elect to become a dedicated node. These nodes are identical to other nodes in all other respects.
- the self-election mechanism is driven by some rules.
- This solution is different from the previous one in that all nodes start from identical configuration.
- the self-elected node becomes a coordinating node for the whole cluster. It may be backed up by other nodes to provide resilience.
- This approach has the deficiency that self election is semi-permanent, so the node in this role is only capable of performing these special long running tasks. Sometimes several of these nodes can perform these tasks simultaneously. Some synchronization will still be required between application nodes in this setup.
- This solution has a few flaws as follows.
- the dedicated node is tied up with special tasks. Multiple nodes are required to provide a degree of resilience.
- the dedicated nodes are not utilized for running ordinary short tasks; therefore, there is a high level of redundancy in the form of idle nodes which can
- a method for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications receives an event and determines whether one of other nodes is processing a task triggered by the event. In response to determining that none of the other nodes is processing the task, the node processes the task as an active node. In response to determining that the one of the other nodes is processing the task, the node runs as one of one or more passive nodes, wherein the one or more passive nodes monitor processing of the task.
- a computer program product for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications comprising a computer readable storage medium having program code embodied therewith.
- the program code is executable for a node to: receive an event; determine whether one of other nodes is processing a task triggered by the event; process the task as an active node, in response to determining that none of the other nodes is processing the task; run as one of one or more passive nodes that monitor processing of the task, in response to determining that the one of the other nodes is processing the task.
- a computer system for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications comprises one or more processors, one or more computer readable tangible storage devices, and program instructions stored on at least one of the one or more computer readable tangible storage devices for execution by at least one of the one or more processors.
- the program instructions are executable for a node to receive an event.
- the program instructions are executable for the node to determine whether one of other nodes is processing a task triggered by the event.
- the program instructions are executable for the node to process the task as an active node, in response to determining that none of the other nodes is processing the task.
- the program instructions are executable for the node to run as one of one or more passive nodes that monitor processing of the task, in response to determining that the one of the other nodes is processing the task.
- FIG. 1 is a diagram showing components in a system of clustered applications, in accordance with one embodiment of the present invention.
- FIG. 2 is a flowchart showing operational steps of a lightweight framework with dynamic self-organizing coordination capacity for clustered applications, in accordance with one embodiment of the present invention.
- FIG. 3 is a diagram illustrating components of a node shown in FIG. 1 , in accordance with one embodiment of the present invention.
- Embodiments of the present invention provide a lightweight framework with dynamic self-organizing coordination capacity for clustered applications.
- the lightweight framework which can be utilized by any networked service application which is capable or should be capable of being deployed in a clustered mode, for example a cloud application.
- the lightweight framework provides a means for managing tasks that require coordination between application nodes: a long-running task is being executed by a single node chosen dynamically using some rules. Remaining nodes will continue to serve short-lived tasks. There is internode communication established via the use of a publish-subscribe channel allowing the synchronization of these activities. Therefore, no coordinating node is necessary, and new nodes can be added to and removed from the cluster at will. Upon completion of a long-running task, the executing node becomes an ordinary node indistinguishable from the rest of the nodes. Several of these long-running tasks can be executed by different nodes simultaneously.
- FIG. 1 is a diagram showing components in system 100 of clustered applications, in accordance with one embodiment of the present invention.
- System 100 includes application server cluster 110 (e.g., an application cluster) comprising a plurality of application nodes 112 - 1 through 112 - n.
- An application node is an instance of an application deployed in application server cluster 110 .
- Application nodes 112 - 1 through 112 - n are running on modern hardware with inbuilt multithreading processors and on an operating system. Components of an application node will be discussed in later paragraphs with reference to FIG. 3 .
- Application nodes 112 - 1 through 112 - n have common resources, for example, data repository, cache, enterprise service bus etc. Each node can be located on a dedicated physical machine, but that is not mandatory.
- application nodes 112 - 1 through 112 - n receive user events that are often requests to perform atomic operations and received either from a user or from a data stream channel. Usually the user events require quick responses.
- the user events trigger ordinary tasks which are main business tasks. The ordinary tasks are usually required to be completed in a short time, for example, tens of seconds at the most. For each user event received, a node starts an ordinary task. These are comparatively short processes and thus the user often awaits the outcome. Usually these ordinary tasks are part of a bigger business process.
- the task is naturally executed by the node in isolation from other nodes, unless there is a recursion where the node is a client to another node. But, that only confirms the isolation principle.
- application nodes 112 - 1 through 112 - n also receive housekeeping events.
- the housekeeping events may come from an external source, for example, to trigger a reference data load.
- the housekeeping events may also be internally generated, such as periodic purging.
- the housekeeping events trigger housekeeping tasks.
- the housekeeping tasks often long running complex tasks, which can take between several minutes and some hours to complete.
- Each of application nodes 112 - 1 through 112 - n has a rule engine which decides whether an event is a user event or a housekeeping event.
- the rule engine determines, for example, properties of the event, type of the event, and method of delivery (such as HTTP or JMS).
- Application server cluster 110 further comprises communication channel 116 .
- Communication channel 116 is used by application nodes 112 - 1 through 112 - n to communicate with each other.
- a publish-subscribe paradigm should be supported by communication channel 116 . Examples are JMS Topic (a distribution mechanism for publishing messages that are delivered to multiple subscribers) and MQTT (Message Queue Telemetry Transport, a publish-subscribe based “light weight” messaging protocol for use on top of the TCP/IP protocol), but there are other implementations as well.
- Application nodes 112 - 1 through 112 - n use cluster notifications to communicate. The cluster notifications contain state information of the nodes.
- Application server cluster 110 further comprises cluster database 114 .
- Cluster database 114 is a repository of shared information relating to the state of processing of all housekeeping tasks.
- Cluster database 114 may be a RDBMS (relational database management system), a cache, a file-based storage, or a combination of these.
- Cluster database 114 may be used by the application logic as persistent storage.
- all nodes share a common cluster database.
- both user events and housekeeping events arrive to the nodes via a load balancing mechanism (load balancer) assuring that an event arrives only to a single node.
- load balancing mechanism is JMS Queue, where a message is assured to read only once and thus arrives only to a single listener. More complex solutions are available for JMS and other communication protocols.
- the whole agglomeration of nodes is a single application.
- each node has a rule engine to discriminate between events, and all nodes run on a multi-threaded platform.
- each node is stateless so that application data is stored in a common data repository.
- FIG. 2 is flowchart 200 showing operational steps of a lightweight framework with dynamic self-organizing coordination capacity for clustered applications, in accordance with one embodiment of the present invention.
- a node one of application nodes 112 - 1 through 112 - n ) receives a housekeeping event. For each housekeeping event received, the node will attempt to start a housekeeping task.
- the node determines whether another node is processing the housekeeping task.
- the node In response to determining that no other node is processing the housekeeping task (NO branch of decision step 202 ), at step 203 , the node runs as an active node to process the housekeeping task. The node creates a persistent record of the association between the housekeeping event and the node on cluster database 114 (shown in FIG. 1 ). In response to determining that another node is processing the housekeeping task (YES branch of decision step 202 ), at step 211 , the node runs as a passive node to monitor the housekeeping task.
- step 204 the node determines whether task initiation at step 203 succeeds. In response to determining that the task initiation fails (NO branch of decision step 204 ), the node executes step 211 (the node runs as the passive node). In response to determining that the task initiation succeeds (YES branch of decision step 204 ), the node (as the active node) at step 205 starts heartbeat notifications.
- the node (as the active node) emits periodic heartbeat notifications over communication channel 116 (shown in FIG. 1 ), indicating that the node is still alive as the active node and processing the housekeeping task. All other nodes are passive nodes in respect to the housekeeping event.
- the other nodes start monitoring the heartbeat notifications and detect a possible interruption of the heartbeat notifications. The interruption of the heartbeat notifications signifies a failure of the active node.
- the node runs executors to process the housekeeping task.
- the node (as the active node) completes the housekeeping task.
- the node makes, on cluster database 114 , a persistent record of completion of the housekeeping task.
- the node stops the heartbeat notifications.
- the node (as the active node) at step 209 sends to all the other nodes (acting as the passive nodes) a notification of completion of the housekeeping task.
- all the other nodes receive the notification of completion sent by the active node at step 209 .
- all the other nodes stop monitoring the housekeeping task.
- the node While the node becomes the passive node at step 211 , one of the other nodes will be the active node for this housekeeping task.
- the node (now as the passive node) checks periodically the heartbeat notifications sent from the active node.
- the node (now as the passive node) records a number of missed heartbeat notifications.
- the missed heartbeat notifications from the active node signify a failure of the active node. The failure can be caused by a number of factors, for example, network failure, overload of the active node, and of course the utter failure of the active node.
- the node (now as the passive node) determines whether the number of the missed heartbeat notifications exceeds a threshold.
- the node In response to determining that the number of the missed heartbeat notifications does not exceed the threshold (NO branch of decision step 214 ), the node (now as the passive node) continues to monitor the heartbeat notifications and reiterates step 212 . In response to determining that the number of the missed heartbeat notifications exceeds the threshold (YES branch of decision step 214 ), at step 215 , the node (now as the passive node) attempts to become the active node.
- the threshold reflects a certain tolerance such that the node (currently as passive node) executes step 215 only after a certain predetermined number of the notifications have been missed. Step 215 is done via shared records on cluster database 114 . If the node succeeds in marking the persistent record with its identifier, it will execute step 203 , i.e, run as the active node to initiate to process the housekeeping task. Remaining nodes will assume a passive role as before.
- Newly started nodes will receive heartbeat notifications of any housekeeping tasks in progress and start their own monitoring.
- One potential gap of restarting a housekeeping task after a take over is an inconsistency in a data model state, if a failed node has modified some of common data model (not shared in cluster database 114 ).
- a simple solution is to run a long transaction on the active node.
- a specific implementation may have a staged execution of a housekeeping task with safe points from which the task can continue after the take over. There is no 100% guarantee that the failed node will not come back to the task execution; for example, if a network connectivity has restored after a failure. It is possible to introduce a fail switch to notify a node that it should terminate the housekeeping task, if the node has been taken over by another node.
- An abstraction from that is a notion of the originating node sending multiple parallel requests for processing to a cloud of services and awaiting the responses of all of these requests. Once received, its own processing can commence. There is a possibility of a retry of a failed request, or if possible, implementation of compensating logic on remote servers.
- any node can execute multiple ordinary tasks and housekeeping tasks simultaneously. So, in an extreme case, even a single node online can perform all functions. Similarly, as many nodes as desired can be running in parallel retaining the same functionality in respect to both these types of tasks.
- FIG. 3 is a diagram illustrating components of a node shown in FIG. 1 , in accordance with one embodiment of the present invention. It should be appreciated that FIG. 3 provides only an illustration of one implementation and does not imply any limitations with regard to the environment in which different embodiments may be implemented.
- node 300 includes processor(s) 320 , memory 310 , tangible storage device(s) 330 , network interface(s) 340 , and I/O (input/output) interface(s) 350 .
- Communications among the above-mentioned components of node 300 are denoted by numeral 390 .
- Memory 310 includes ROM(s) (Read Only Memory) 311 , RAM(s) (Random Access Memory) 313 , and cache(s) 315 .
- One or more operating systems 331 and one or more computer programs 333 reside on one or more computer readable tangible storage device(s) 330 .
- Node 300 further includes I/O interface(s) 350 . I/O interface(s) 350 allows for input and output of data with external device(s) 360 that may be connected to node 300 .
- Node 300 further includes network interface(s) 340 for communications between node 300 and a computer network.
- the present invention may be a system, a method, and/or a computer program product.
- the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
- the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
- the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device, such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
- RAM random access memory
- ROM read-only memory
- EPROM or Flash memory erasable programmable read-only memory
- SRAM static random access memory
- CD-ROM compact disc read-only memory
- DVD digital versatile disk
- memory stick a floppy disk
- mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing
- a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network (LAN), a wide area network (WAN), and/or a wireless network.
- the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
- a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
- Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, and conventional procedural programming languages, such as the “C” programming language, or similar programming languages.
- the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry in order to perform aspects of the present invention.
- These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture, including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures.
- two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Cardiology (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Hardware Redundancy (AREA)
Abstract
Description
- This application is a Continuation Application of pending U.S. patent application Ser. No. 14/504,609 filed on Oct. 2, 2014.
- The present invention relates generally to distributed computing systems and architectures, and more particularly to a lightweight framework with dynamic self-organizing coordination capacity for clustered applications.
- Applications are increasingly moving to a clustered design, where any number of nodes may be active. These types of applications are also a natural fit for deployment on cloud infrastructure. In a cloud environment, application instances or nodes need an ability to join and leave processing cluster at unspecified time. This is non-critical for short-running tasks, such as serving users' requests, as the tasks are often covered by a transactional control which offers an ability to recover.
- Inherently in most applications, it is a need to support highly parallel processing of requests. For these type of applications, the software should be designed so that there is as little coupling between nodes as possible, thus preventing bottlenecks from being introduced. However, these nodes normally will share some common application states, and therefore it is also likely that a subset of processing tasks, usually long-running ones, will need coordination among nodes. For example, the application uses a reference dataset that has to be updated periodically. In this scenario, some processing of data may need to happen, followed by a switchover to the new dataset. For such a long-running task, an effective way of minimizing data corruption is to perform it on a single node. The remaining nodes will receive notification when it is safe to switch over to the new version of the dataset.
- There exist solutions which address this type of long-running tasks. One common approach is to designate a dedicated application node to fulfill this role. The dedicated application node may have a common codebase with the other application nodes or it may not. The dedicated application node is certainly configured differently from the other nodes, so it can perform the long-running tasks while the other nodes are running short user requests. A flaw in this solution is evidently a single point of failure in that node. It is possible to strengthen the configuration by providing a backup dedicated node, but the solution comes up against the task of synchronizing these dedicated nodes.
- Another solution is to allow one or more of the homogeneous nodes to self-elect to become a dedicated node. These nodes are identical to other nodes in all other respects. The self-election mechanism is driven by some rules. This solution is different from the previous one in that all nodes start from identical configuration. The self-elected node becomes a coordinating node for the whole cluster. It may be backed up by other nodes to provide resilience. This approach has the deficiency that self election is semi-permanent, so the node in this role is only capable of performing these special long running tasks. Sometimes several of these nodes can perform these tasks simultaneously. Some synchronization will still be required between application nodes in this setup. This solution has a few flaws as follows. The dedicated node is tied up with special tasks. Multiple nodes are required to provide a degree of resilience. The dedicated nodes are not utilized for running ordinary short tasks; therefore, there is a high level of redundancy in the form of idle nodes which can otherwise be utilized.
- In one aspect, a method for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications is provided. A node receives an event and determines whether one of other nodes is processing a task triggered by the event. In response to determining that none of the other nodes is processing the task, the node processes the task as an active node. In response to determining that the one of the other nodes is processing the task, the node runs as one of one or more passive nodes, wherein the one or more passive nodes monitor processing of the task.
- In another aspect, a computer program product for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications is provided. The computer program product comprising a computer readable storage medium having program code embodied therewith. The program code is executable for a node to: receive an event; determine whether one of other nodes is processing a task triggered by the event; process the task as an active node, in response to determining that none of the other nodes is processing the task; run as one of one or more passive nodes that monitor processing of the task, in response to determining that the one of the other nodes is processing the task.
- In yet another aspect, a computer system for a lightweight framework with dynamic self-organizing coordination capacity for clustered applications is provided. The computer system comprises one or more processors, one or more computer readable tangible storage devices, and program instructions stored on at least one of the one or more computer readable tangible storage devices for execution by at least one of the one or more processors. The program instructions are executable for a node to receive an event. The program instructions are executable for the node to determine whether one of other nodes is processing a task triggered by the event. The program instructions are executable for the node to process the task as an active node, in response to determining that none of the other nodes is processing the task. The program instructions are executable for the node to run as one of one or more passive nodes that monitor processing of the task, in response to determining that the one of the other nodes is processing the task.
-
FIG. 1 is a diagram showing components in a system of clustered applications, in accordance with one embodiment of the present invention. -
FIG. 2 is a flowchart showing operational steps of a lightweight framework with dynamic self-organizing coordination capacity for clustered applications, in accordance with one embodiment of the present invention. -
FIG. 3 is a diagram illustrating components of a node shown inFIG. 1 , in accordance with one embodiment of the present invention. - Embodiments of the present invention provide a lightweight framework with dynamic self-organizing coordination capacity for clustered applications. The lightweight framework which can be utilized by any networked service application which is capable or should be capable of being deployed in a clustered mode, for example a cloud application. The lightweight framework provides a means for managing tasks that require coordination between application nodes: a long-running task is being executed by a single node chosen dynamically using some rules. Remaining nodes will continue to serve short-lived tasks. There is internode communication established via the use of a publish-subscribe channel allowing the synchronization of these activities. Therefore, no coordinating node is necessary, and new nodes can be added to and removed from the cluster at will. Upon completion of a long-running task, the executing node becomes an ordinary node indistinguishable from the rest of the nodes. Several of these long-running tasks can be executed by different nodes simultaneously.
- In this lightweight framework, there is no special assignment of a node or nodes to perform a task. Moreover, assuming a multithreaded platform, a single application node can perform ordinary short-lived tasks simultaneously with long running tasks. This provides a foundation for extreme resilience in the lightweight framework. As long as there is at least a single node, the functionality is complete, albeit potentially with a reduced performance.
- Embodiments of the present invention are now described in detail with reference to the accompanying figures.
-
FIG. 1 is a diagram showing components insystem 100 of clustered applications, in accordance with one embodiment of the present invention.System 100 includes application server cluster 110 (e.g., an application cluster) comprising a plurality of application nodes 112-1 through 112-n. An application node is an instance of an application deployed inapplication server cluster 110. Application nodes 112-1 through 112-n are running on modern hardware with inbuilt multithreading processors and on an operating system. Components of an application node will be discussed in later paragraphs with reference toFIG. 3 . Application nodes 112-1 through 112-n have common resources, for example, data repository, cache, enterprise service bus etc. Each node can be located on a dedicated physical machine, but that is not mandatory. - Via
load balancer 118, application nodes 112-1 through 112-n receive user events that are often requests to perform atomic operations and received either from a user or from a data stream channel. Usually the user events require quick responses. The user events trigger ordinary tasks which are main business tasks. The ordinary tasks are usually required to be completed in a short time, for example, tens of seconds at the most. For each user event received, a node starts an ordinary task. These are comparatively short processes and thus the user often awaits the outcome. Usually these ordinary tasks are part of a bigger business process. The task is naturally executed by the node in isolation from other nodes, unless there is a recursion where the node is a client to another node. But, that only confirms the isolation principle. - Via
load balancer 118, application nodes 112-1 through 112-n also receive housekeeping events. The housekeeping events may come from an external source, for example, to trigger a reference data load. The housekeeping events may also be internally generated, such as periodic purging. The housekeeping events trigger housekeeping tasks. The housekeeping tasks often long running complex tasks, which can take between several minutes and some hours to complete. - There are some internal application events, for instance scheduled tasks shown in
FIG. 1 . An application can emit an event itself and will warrant the housekeeping tasks processing. Obviously, it is not possible to synchronize these events between nodes. The safeguard is in place; therefore, if a housekeeping event arrives whilst a corresponding housekeeping task is in progress, it will be ignored. Obviously, the decision whether to process the same housekeeping task in parallel is configurable. - Each of application nodes 112-1 through 112-n has a rule engine which decides whether an event is a user event or a housekeeping event. The rule engine determines, for example, properties of the event, type of the event, and method of delivery (such as HTTP or JMS).
-
Application server cluster 110 further comprisescommunication channel 116.Communication channel 116 is used by application nodes 112-1 through 112-n to communicate with each other. A publish-subscribe paradigm should be supported bycommunication channel 116. Examples are JMS Topic (a distribution mechanism for publishing messages that are delivered to multiple subscribers) and MQTT (Message Queue Telemetry Transport, a publish-subscribe based “light weight” messaging protocol for use on top of the TCP/IP protocol), but there are other implementations as well. Application nodes 112-1 through 112-n use cluster notifications to communicate. The cluster notifications contain state information of the nodes. -
Application server cluster 110 further comprisescluster database 114.Cluster database 114 is a repository of shared information relating to the state of processing of all housekeeping tasks.Cluster database 114 may be a RDBMS (relational database management system), a cache, a file-based storage, or a combination of these.Cluster database 114 may be used by the application logic as persistent storage. - In an embodiment of the lightweight framework with dynamic self-organizing coordination capacity for clustered applications, all nodes share a common cluster database. In the embodiment, there is a channel available to all nodes. In the embodiment, both user events and housekeeping events arrive to the nodes via a load balancing mechanism (load balancer) assuring that an event arrives only to a single node. A simple example of the load balancing mechanism is JMS Queue, where a message is assured to read only once and thus arrives only to a single listener. More complex solutions are available for JMS and other communication protocols. As far as a client is concerned, the whole agglomeration of nodes is a single application. In the embodiment, each node has a rule engine to discriminate between events, and all nodes run on a multi-threaded platform. Also, in the embodiment, each node is stateless so that application data is stored in a common data repository.
-
FIG. 2 isflowchart 200 showing operational steps of a lightweight framework with dynamic self-organizing coordination capacity for clustered applications, in accordance with one embodiment of the present invention. At step 201, a node (one of application nodes 112-1 through 112-n) receives a housekeeping event. For each housekeeping event received, the node will attempt to start a housekeeping task. Atdecision step 202, the node determines whether another node is processing the housekeeping task. - In response to determining that no other node is processing the housekeeping task (NO branch of decision step 202), at
step 203, the node runs as an active node to process the housekeeping task. The node creates a persistent record of the association between the housekeeping event and the node on cluster database 114 (shown inFIG. 1 ). In response to determining that another node is processing the housekeeping task (YES branch of decision step 202), atstep 211, the node runs as a passive node to monitor the housekeeping task. - After step 203 (the node runs as the active node), at
decision step 204, the node determines whether task initiation atstep 203 succeeds. In response to determining that the task initiation fails (NO branch of decision step 204), the node executes step 211 (the node runs as the passive node). In response to determining that the task initiation succeeds (YES branch of decision step 204), the node (as the active node) atstep 205 starts heartbeat notifications. The node (as the active node) emits periodic heartbeat notifications over communication channel 116 (shown inFIG. 1 ), indicating that the node is still alive as the active node and processing the housekeeping task. All other nodes are passive nodes in respect to the housekeeping event. The other nodes start monitoring the heartbeat notifications and detect a possible interruption of the heartbeat notifications. The interruption of the heartbeat notifications signifies a failure of the active node. - At
step 206, the node (as the active node) runs executors to process the housekeeping task. Atstep 207, the node (as the active node) completes the housekeeping task. The node makes, oncluster database 114, a persistent record of completion of the housekeeping task. Then, atstep 208, the node (as the active node) stops the heartbeat notifications. The node (as the active node) atstep 209 sends to all the other nodes (acting as the passive nodes) a notification of completion of the housekeeping task. - At
step 221, all the other nodes (acting as the passive nodes) receive the notification of completion sent by the active node atstep 209. Atstep 222, all the other nodes (passive nodes) stop monitoring the housekeeping task. - While the node becomes the passive node at
step 211, one of the other nodes will be the active node for this housekeeping task. Now, atstep 212, the node (now as the passive node) checks periodically the heartbeat notifications sent from the active node. Atstep 213, the node (now as the passive node) records a number of missed heartbeat notifications. The missed heartbeat notifications from the active node signify a failure of the active node. The failure can be caused by a number of factors, for example, network failure, overload of the active node, and of course the utter failure of the active node. Atdecision step 214, the node (now as the passive node) determines whether the number of the missed heartbeat notifications exceeds a threshold. - In response to determining that the number of the missed heartbeat notifications does not exceed the threshold (NO branch of decision step 214), the node (now as the passive node) continues to monitor the heartbeat notifications and reiterates
step 212. In response to determining that the number of the missed heartbeat notifications exceeds the threshold (YES branch of decision step 214), atstep 215, the node (now as the passive node) attempts to become the active node. The threshold reflects a certain tolerance such that the node (currently as passive node) executesstep 215 only after a certain predetermined number of the notifications have been missed. Step 215 is done via shared records oncluster database 114. If the node succeeds in marking the persistent record with its identifier, it will executestep 203, i.e, run as the active node to initiate to process the housekeeping task. Remaining nodes will assume a passive role as before. - Newly started nodes will receive heartbeat notifications of any housekeeping tasks in progress and start their own monitoring. One potential gap of restarting a housekeeping task after a take over is an inconsistency in a data model state, if a failed node has modified some of common data model (not shared in cluster database 114). A simple solution is to run a long transaction on the active node. A specific implementation may have a staged execution of a housekeeping task with safe points from which the task can continue after the take over. There is no 100% guarantee that the failed node will not come back to the task execution; for example, if a network connectivity has restored after a failure. It is possible to introduce a fail switch to notify a node that it should terminate the housekeeping task, if the node has been taken over by another node.
- It is possible to scale and parallelize execution of a housekeeping task across all nodes with minor changes. For example, if a task can be broken into smaller units, then for each one of these units another event (housekeeping or user event) is emitted on a normal event queue (or any other channel over which events are arriving). An arbitrary node picks each event, executes the appropriate task and notifies the sender node of a completion. The originating node can complete processing of the overall housekeeping task once all smaller tasks have completed or, indeed, without waiting for them to complete.
- An abstraction from that is a notion of the originating node sending multiple parallel requests for processing to a cloud of services and awaiting the responses of all of these requests. Once received, its own processing can commence. There is a possibility of a retry of a failed request, or if possible, implementation of compensating logic on remote servers.
- Since the assumption is that all nodes are deployed on a platform supporting multithreading, any node can execute multiple ordinary tasks and housekeeping tasks simultaneously. So, in an extreme case, even a single node online can perform all functions. Similarly, as many nodes as desired can be running in parallel retaining the same functionality in respect to both these types of tasks.
-
FIG. 3 is a diagram illustrating components of a node shown inFIG. 1 , in accordance with one embodiment of the present invention. It should be appreciated thatFIG. 3 provides only an illustration of one implementation and does not imply any limitations with regard to the environment in which different embodiments may be implemented. - Referring to
FIG. 3 ,node 300 includes processor(s) 320,memory 310, tangible storage device(s) 330, network interface(s) 340, and I/O (input/output) interface(s) 350. InFIG. 3 , communications among the above-mentioned components ofnode 300 are denoted bynumeral 390.Memory 310 includes ROM(s) (Read Only Memory) 311, RAM(s) (Random Access Memory) 313, and cache(s) 315. One ormore operating systems 331 and one ormore computer programs 333 reside on one or more computer readable tangible storage device(s) 330.Node 300 further includes I/O interface(s) 350. I/O interface(s) 350 allows for input and output of data with external device(s) 360 that may be connected tonode 300.Node 300 further includes network interface(s) 340 for communications betweennode 300 and a computer network. - The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
- The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device, such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network (LAN), a wide area network (WAN), and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
- Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, and conventional procedural programming languages, such as the “C” programming language, or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry in order to perform aspects of the present invention.
- Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
- These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture, including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
Claims (18)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/712,040 US20160099999A1 (en) | 2014-10-02 | 2015-05-14 | Lightweight framework with dynamic self-organizing coordination capability for clustered applications |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/504,609 US10904111B2 (en) | 2014-10-02 | 2014-10-02 | Lightweight framework with dynamic self-organizing coordination capability for clustered applications |
US14/712,040 US20160099999A1 (en) | 2014-10-02 | 2015-05-14 | Lightweight framework with dynamic self-organizing coordination capability for clustered applications |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/504,609 Continuation US10904111B2 (en) | 2014-10-02 | 2014-10-02 | Lightweight framework with dynamic self-organizing coordination capability for clustered applications |
Publications (1)
Publication Number | Publication Date |
---|---|
US20160099999A1 true US20160099999A1 (en) | 2016-04-07 |
Family
ID=55633600
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/504,609 Expired - Fee Related US10904111B2 (en) | 2014-10-02 | 2014-10-02 | Lightweight framework with dynamic self-organizing coordination capability for clustered applications |
US14/712,040 Abandoned US20160099999A1 (en) | 2014-10-02 | 2015-05-14 | Lightweight framework with dynamic self-organizing coordination capability for clustered applications |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/504,609 Expired - Fee Related US10904111B2 (en) | 2014-10-02 | 2014-10-02 | Lightweight framework with dynamic self-organizing coordination capability for clustered applications |
Country Status (1)
Country | Link |
---|---|
US (2) | US10904111B2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108900590A (en) * | 2018-06-20 | 2018-11-27 | 郑州云海信息技术有限公司 | A kind of distributed software construction method and device |
US10904111B2 (en) | 2014-10-02 | 2021-01-26 | International Business Machines Corporation | Lightweight framework with dynamic self-organizing coordination capability for clustered applications |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109150570B (en) * | 2017-06-27 | 2022-04-08 | 阿里巴巴集团控股有限公司 | Updating method, system, end node and electronic equipment |
US10735509B2 (en) * | 2018-01-31 | 2020-08-04 | Ca, Inc. | Systems and methods for synchronizing microservice data stores |
CN109286525B (en) * | 2018-09-28 | 2022-02-25 | 昆明能讯科技有限责任公司 | Double-computer backup method based on MQTT communication and heartbeat between main and standby |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130205162A1 (en) * | 2012-02-03 | 2013-08-08 | Fujitsu Limited | Redundant computer control method and device |
US20160077936A1 (en) * | 2014-09-12 | 2016-03-17 | Facebook, Inc. | Failover mechanism in a distributed computing system |
US9444884B2 (en) * | 2011-12-31 | 2016-09-13 | Level 3 Communications, Llc | Load-aware load-balancing cluster without a central load balancer |
Family Cites Families (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001035278A1 (en) | 1999-11-10 | 2001-05-17 | Fakhouri Sameh A | A decision based system for managing distributed resources and modeling the global optimization problem |
JP4781089B2 (en) | 2005-11-15 | 2011-09-28 | 株式会社ソニー・コンピュータエンタテインメント | Task assignment method and task assignment device |
WO2007064799A1 (en) | 2005-12-01 | 2007-06-07 | Cassatt Corporation | Automated deployment and configuration of applications in an autonomically controlled distributed computing system |
US8296434B1 (en) | 2009-05-28 | 2012-10-23 | Amazon Technologies, Inc. | Providing dynamically scaling computing load balancing |
US20110161961A1 (en) * | 2009-12-29 | 2011-06-30 | Nokia Corporation | Method and apparatus for optimized information transmission using dedicated threads |
US8719415B1 (en) | 2010-06-28 | 2014-05-06 | Amazon Technologies, Inc. | Use of temporarily available computing nodes for dynamic scaling of a cluster |
US8412810B1 (en) | 2010-07-02 | 2013-04-02 | Adobe Systems Incorporated | Provisioning and managing a cluster deployed on a cloud |
US9239996B2 (en) | 2010-08-24 | 2016-01-19 | Solano Labs, Inc. | Method and apparatus for clearing cloud compute demand |
US8955097B2 (en) * | 2011-12-13 | 2015-02-10 | Mcafee, Inc. | Timing management in a large firewall cluster |
US8977637B2 (en) | 2012-08-30 | 2015-03-10 | International Business Machines Corporation | Facilitating field programmable gate array accelerations of database functions |
US9043644B2 (en) | 2012-12-04 | 2015-05-26 | International Business Machines Corporation | Using separate processes to handle short-lived and long-lived jobs to reduce failure of processes |
WO2015069912A1 (en) * | 2013-11-06 | 2015-05-14 | Improvement Interactive, LLC | Dynamic application version selection |
US10904111B2 (en) | 2014-10-02 | 2021-01-26 | International Business Machines Corporation | Lightweight framework with dynamic self-organizing coordination capability for clustered applications |
US9934272B2 (en) | 2015-10-22 | 2018-04-03 | International Business Machines Corporation | Processing a database query in a database system |
US10176147B2 (en) | 2017-03-07 | 2019-01-08 | Qualcomm Incorporated | Multi-processor core three-dimensional (3D) integrated circuits (ICs) (3DICs), and related methods |
-
2014
- 2014-10-02 US US14/504,609 patent/US10904111B2/en not_active Expired - Fee Related
-
2015
- 2015-05-14 US US14/712,040 patent/US20160099999A1/en not_active Abandoned
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9444884B2 (en) * | 2011-12-31 | 2016-09-13 | Level 3 Communications, Llc | Load-aware load-balancing cluster without a central load balancer |
US20130205162A1 (en) * | 2012-02-03 | 2013-08-08 | Fujitsu Limited | Redundant computer control method and device |
US20160077936A1 (en) * | 2014-09-12 | 2016-03-17 | Facebook, Inc. | Failover mechanism in a distributed computing system |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10904111B2 (en) | 2014-10-02 | 2021-01-26 | International Business Machines Corporation | Lightweight framework with dynamic self-organizing coordination capability for clustered applications |
CN108900590A (en) * | 2018-06-20 | 2018-11-27 | 郑州云海信息技术有限公司 | A kind of distributed software construction method and device |
Also Published As
Publication number | Publication date |
---|---|
US10904111B2 (en) | 2021-01-26 |
US20160099825A1 (en) | 2016-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10078564B2 (en) | Preventing split-brain scenario in a high-availability cluster | |
US11966307B2 (en) | Re-aligning data replication configuration of primary and secondary data serving entities of a cross-site storage solution after a failover event | |
US8799906B2 (en) | Processing a batched unit of work | |
US10474694B2 (en) | Zero-data loss recovery for active-active sites configurations | |
US10049130B2 (en) | Reliability improvement of distributed transaction processing optimizations based on connection status | |
US8875157B2 (en) | Deployment of pre-scheduled tasks in clusters | |
US9032032B2 (en) | Data replication feedback for transport input/output | |
US10904111B2 (en) | Lightweight framework with dynamic self-organizing coordination capability for clustered applications | |
US12045491B2 (en) | Resynchronization of individual volumes of a consistency group (CG) within a cross-site storage solution while maintaining synchronization of other volumes of the CG | |
US11741075B2 (en) | Methods and system of tracking transactions for distributed ledger | |
US10819641B2 (en) | Highly available servers | |
US9098439B2 (en) | Providing a fault tolerant system in a loosely-coupled cluster environment using application checkpoints and logs | |
US9940598B2 (en) | Apparatus and method for controlling execution workflows | |
US11366728B2 (en) | Systems and methods for enabling a highly available managed failover service | |
US8266301B2 (en) | Deployment of asynchronous agentless agent functionality in clustered environments | |
US20130205017A1 (en) | Computer failure monitoring method and device | |
US7870248B2 (en) | Exploiting service heartbeats to monitor file share | |
US20120278429A1 (en) | Cluster system, synchronization controlling method, server, and synchronization controlling program | |
US9026839B2 (en) | Client based high availability method for message delivery | |
US11397632B2 (en) | Safely recovering workloads within a finite timeframe from unhealthy cluster nodes | |
US9031969B2 (en) | Guaranteed in-flight SQL insert operation support during an RAC database failover | |
US10165088B2 (en) | Providing unit of work continuity in the event initiating client fails over | |
US10372542B2 (en) | Fault tolerant event management system | |
US8738959B2 (en) | Selective message loss handling in a cluster of replicated servers | |
US10394677B2 (en) | Method to efficiently and reliably process ordered user account events in a cluster |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:JOFFE, ANNA;KELSEY, HOWARD A.;LEVINE, VIKTOR;AND OTHERS;SIGNING DATES FROM 20140924 TO 20140929;REEL/FRAME:035638/0577 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: ADVISORY ACTION MAILED |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |