WO1999008181A1 - Fault-tolerant distributed client/server data communication system and method - Google Patents
Fault-tolerant distributed client/server data communication system and method
- Publication number
- WO1999008181A1 (PCT/US1998/015860)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- message
- time
- processing environment
- database
- stamp
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/2097—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements maintaining the standby controller/processing unit updated
Definitions
- the present invention relates to the fields of process and system management and computer communications.
- the present invention relates to the fields of fault tolerance and reliable message passing in distributed systems.
- Computers, processors, and network elements communicate with each other by sending and receiving control and data messages.
- In large networks such as the advanced intelligent telephone network (AIN), demands for message processing are extremely high.
- Such large networks must process millions of messages every day, with very little margin for error.
- Efficient and effective message processing in such networks is at a premium.
- Network elements that comprise networks are often connected to a plurality of other network elements and communicate with them in parallel, which further increases the message processing demands.
- computers, processors, and network elements typically include a plurality of message processors that process, send, and receive queued messages.
- Conventional message processors typically process messages from a first-in-first-out (FIFO) message queue. Since only one processor can access the queue at a time, problems of scale and performance occur. Adding processors compounds the problem because they block each other while trying to access the queue. In these conventional systems, failed messages, i.e. messages destined for a computer, processor, or network element that for some reason is not communicating, are simply placed back into the FIFO queue each time they fail. The continuous reprocessing of these failed messages wastes significant and valuable processing time.
- In various processing environments, data must also be replicated to alternative storage databases at various times to ensure that the data is not lost. Processors in many stringent environments, such as telephone networks, have very specific data integrity requirements, which significantly increase the replication burden.
- For asynchronous data replication, data is replicated periodically from one database to another. If one database fails, data is available from the other database.
- a replication interval is defined to control how frequently replication is performed.
- the invention includes a computer system, including one or more message processors in a primary processing environment of the computer system for communicating with one or more other processing environments in the computer system.
- a message queue in the primary processing environment includes a second table to store information about each message, including a message state, a message partition, and a time-stamp; the message partition being an identifier to associate a message with a particular one of the one or more message processors; the time-stamp indicating a time each message is entered or changed.
- a process control monitor monitors processes in the primary processing environment and includes a device for periodically storing table information into the database, a device for monitoring time-stamp information of the stored table information, and a device for initiating action to correct any problems in the primary processing environment when the time-stamp information of the stored data does not correspond to a predetermined time interval.
- the invention further includes a computer processing method, including the steps of assigning a partition and time-stamp to a message, creating an entry for the message in a table, the entry including the partition, the time-stamp, and a state, selecting messages for transmission based on the partition, periodically identifying table data having a time-stamp later than a predetermined time, writing the identified table data to a first replication database, and periodically writing said identified table data from said database to a second database.
- Fig. 1 is a block diagram of a computer and queue configuration in accordance with a preferred embodiment of the present invention.
- Fig. 2 is a block diagram of a message state table in accordance with a preferred embodiment of the present invention.
- Fig. 3 is a block diagram of a message data table in accordance with a preferred embodiment of the present invention.
- Fig. 4 is a block diagram of a queued message table in accordance with a preferred embodiment of the present invention.
- Fig. 5 is a process flow diagram of the operation of a message controller in accordance with one embodiment of the present invention.
- Fig. 6 is a process flow diagram of the operation of a message processor in accordance with one embodiment of the present invention.
- Fig. 7 is a block diagram of a computer and a table to be replicated in accordance with one embodiment of the present invention.
- Fig. 8 is a table replication process in accordance with one embodiment of the present invention.
- Fig. 9 is a block diagram of a process control monitoring system in accordance with one embodiment of the present invention.
- Fig. 10 is a process flow diagram of the operation of a primary processing environment to provide process control monitoring in accordance with one embodiment of the present invention.
- Fig. 11 is a process flow diagram of the operation of a secondary processing environment to provide process control monitoring in accordance with one embodiment of the present invention.
- Fig. 1 is a block diagram of a computer 100 having a queue configuration in accordance with a preferred embodiment of the present invention.
- Computer 100 may comprise, for example, a service management system ("SMS") within today's AIN. SMSs are typically connected to service control points ("SCPs") by X.25 connections.
- SMS service management system
- SCP service control points
- computer 100 includes a queue 102 and a plurality of message processors 104a-104n.
- Queue 102 preferably comprises a database configuration which is connected to and communicates with the plurality of message processors 104a-104n.
- Message processors 104a-104n preferably include read/write or send/receive processing applications for performing input/output operations between computer 100 and any one of several computers (not shown) connected to computer 100 via communication lines 108a-108n.
- Message processors from other computers may also use queue 102 to communicate with computers to which it is connected (not shown).
- queue 102 includes state table 110, queued message table 112, message data table 114, and queue controller 116.
- message data is included in message data table 114, while parameters used to optimize communications are stored in message state table 110.
- Queued message table 112 stores certain parameters also found in the message state table 110 for any failed messages, i.e. messages that could not be communicated to the intended destination computer.
- Queue controller 116 preferably includes an application for controlling queue 102 and the three tables 110, 112, and 114 as further described herein.
- Fig. 2 is a block diagram of a message state table 110 in accordance with a preferred embodiment of the present invention. As shown, for each message to be communicated, message state table 110 preferably includes the following information: message I.D. 200, destination I.D. 202, partition 204, state 206, and time-stamp 208.
- Partition 204 links a message to one of the message processors 104a-104n.
- queue controller 116 assigns each message a partition as it enters queue 102.
- queue controller 116 preferably distributes messages evenly across the message processors 104a-104n. For example, if computer 100 includes ten message processors, then out of 100 messages, 10 are assigned to each message processor. This distribution is, however, configurable.
- each message processor 104a-104n processes only messages having the corresponding partition. In this manner, message processors are not locked out and can perform parallel processing without risk of overlap.
- state 206 preferably identifies one of five message states or conditions: 1) ready for send; 2) sending; 3) processed; 4) ready for delete; 5) queued.
- the first four states are conventional communication states.
- the fifth state, "queued," specifies the state of being included in the queued message table 112, which is described in more detail below.
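The partition scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `StateEntry`, `PartitionAssigner`, and `messages_for` are hypothetical, the round-robin policy is one possible way to obtain the even distribution the text describes, and an in-memory list stands in for the database-backed message state table 110.

```python
from dataclasses import dataclass
from itertools import count

# The five states listed for state 206.
STATES = ("ready for send", "sending", "processed", "ready for delete", "queued")


@dataclass
class StateEntry:
    """One row of the message state table (fields 200-208 in Fig. 2)."""
    message_id: int
    destination_id: str
    partition: int
    state: str
    time_stamp: float


class PartitionAssigner:
    """Hypothetical round-robin policy giving an even spread over processors."""

    def __init__(self, num_processors: int) -> None:
        self.num_processors = num_processors
        self._sequence = count()

    def assign(self) -> int:
        return next(self._sequence) % self.num_processors


def messages_for(processor_index: int, state_table: list) -> list:
    """Rows a given processor may touch: its own partition, ready for send."""
    return [
        entry for entry in state_table
        if entry.partition == processor_index and entry.state == "ready for send"
    ]


# Ten processors and 100 messages, as in the example in the text.
assigner = PartitionAssigner(num_processors=10)
state_table = [
    StateEntry(i, "scp-1", assigner.assign(), "ready for send", 0.0)
    for i in range(100)
]
per_processor = [len(messages_for(p, state_table)) for p in range(10)]
```

Because each processor filters on its own partition value, no two processors select the same row, which is the property the text relies on to avoid lock-outs.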
- Fig. 3 shows a block diagram of a message data table 114 in accordance with a preferred embodiment of the invention.
- Message data table 114 includes message I.D. 300 and data 302.
- the actual message data 302 is listed in a table 114 separate from the message state table 110.
- Message I.D.s 300 correspond to message I.D.s 200.
- Fig. 4 illustrates a block diagram of a queued message table 112 in accordance with a preferred embodiment of the present invention.
- Queued message table 112 stores messages destined for computers (not shown) that are down or for some other reason are not receiving messages.
- computer 100 determines from one or more failed communication transmissions that a connected computer (not shown) is no longer "on-line."
- messages which cannot be transmitted by one of message processors 104a-104n are stored in the queued message table 112 until the destination machine provides an indication that it is again capable of receiving messages. At that time, state 206 of any queued message is changed from "queued" to "ready for send."
- the queued message table 112 includes message I.D. 400, destination I.D. 402, partition 404, and time-stamp 406.
- Each of these headings corresponds to the same information under the corresponding heading in the message state table 110 as shown in Fig. 2.
- Fig. 5 is a processing flow diagram of the operation of queue controller 116 to help illustrate the methods by which messages are processed in accordance with an embodiment of the present invention.
- queue controller 116 receives a message to be transmitted to any of one or more connected machines (not shown) (step 500). Queue controller 116 then assigns the message a message I.D., a destination I.D., and a time-stamp (step 502). Queue controller 116 then assigns a partition to the message, as discussed above (step 504). Having assigned this information to the message, queue controller 116 writes the message I.D. 300 and data 302 to the message data table 114 (step 506), and builds an entry for the message state table 110 (step 508).
- Queue controller 116 then inquires whether the destination computer for that message is "on-line" (step 510). If not, it builds an entry for the queued message table 112 (step 512) and assigns the message a "queued" state in the message state table (step 514). If, however, the destination machine is "on-line," queue controller 116 assigns the message a "ready for send" state (step 516). Queue controller 116 then inquires whether any "off-line" destination machines have come back "on-line" (step 518). If not, processing is complete. If so, however, queue controller 116 changes the state of those messages intended for the destination computers that are back "on-line" to "ready for send" (step 520), before returning to process a new message.
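The controller flow of Fig. 5 might look roughly like the following Python sketch. It is not the patented implementation: `QueueController`, `enqueue`, and `mark_online` are hypothetical names, plain dictionaries stand in for tables 110, 112, and 114, the modulo partition rule is an assumed even-distribution policy, and removing a row from the queued message table once its destination returns is an assumption the text does not spell out.

```python
import time
from itertools import count


class QueueController:
    """Illustrative stand-in for queue controller 116 (all names hypothetical)."""

    def __init__(self, num_processors, online):
        self.num_processors = num_processors
        self.online = set(online)   # destinations currently believed reachable
        self.state_table = {}       # message state table 110
        self.queued_table = {}      # queued message table 112
        self.data_table = {}        # message data table 114
        self._ids = count(1)

    def enqueue(self, destination, data):
        message_id = next(self._ids)                  # step 502: I.D. and time-stamp
        now = time.time()
        partition = message_id % self.num_processors  # step 504: assumed even split
        self.data_table[message_id] = data            # step 506
        entry = {"destination": destination, "partition": partition,
                 "state": None, "time_stamp": now}
        self.state_table[message_id] = entry          # step 508
        if destination not in self.online:            # step 510
            self.queued_table[message_id] = {         # step 512
                "destination": destination,
                "partition": partition,
                "time_stamp": now,
            }
            entry["state"] = "queued"                 # step 514
        else:
            entry["state"] = "ready for send"         # step 516
        return message_id

    def mark_online(self, destination):
        """Steps 518-520: release messages held for a destination that returned."""
        self.online.add(destination)
        held = [m for m, row in self.queued_table.items()
                if row["destination"] == destination]
        for message_id in held:
            del self.queued_table[message_id]         # assumption: row is dropped
            entry = self.state_table[message_id]
            entry["state"] = "ready for send"
            entry["time_stamp"] = time.time()         # a state change is re-stamped


controller = QueueController(num_processors=4, online={"scp-a"})
first_id = controller.enqueue("scp-a", b"update 1")
second_id = controller.enqueue("scp-b", b"update 2")
state_before_release = controller.state_table[second_id]["state"]
controller.mark_online("scp-b")
```

Here the message for the reachable destination is immediately ready for send, while the message for the unreachable one waits in the queued state until `mark_online` is called.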
- Fig. 6 is a process flow diagram of the operation of any one of message processors 104a-104n.
- a message processor 104 scans the message state table 110 and selects the first message with a state "ready for send" (step 600). Message processor 104 then sends that message to the destination machine (step 602) and changes the message state to "processed" (step 604). Message processor 104 then waits for a response from the destination machine (step 606). If it receives a response within a predetermined time period, message processor 104 changes the message state to "ready for delete" (step 608). The message can then be removed from the queue 102.
- the predetermined time period is configurable and is preferably five seconds. If the message processor 104 does not receive a response from the destination machine within the predetermined time period, it executes process steps like those described in steps 510-514 in Fig. 5.
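As a rough illustration of the processor loop in Fig. 6, the sketch below handles one message per call. The function name `process_next` and the `send` and `wait_for_response` callables are hypothetical placeholders for whatever transport is actually used, and the timeout branch simply marks the message as queued, which is a simplification of steps 510-514.

```python
RESPONSE_TIMEOUT_SECONDS = 5.0  # the preferred, configurable value from the text


def process_next(partition, state_table, send, wait_for_response,
                 timeout=RESPONSE_TIMEOUT_SECONDS):
    """Handle the first ready message in this processor's partition, if any."""
    for message_id in sorted(state_table):            # step 600: scan in I.D. order
        entry = state_table[message_id]
        if entry["partition"] == partition and entry["state"] == "ready for send":
            break
    else:
        return None                                   # nothing to do for this partition

    send(message_id, entry["destination"])            # step 602
    entry["state"] = "processed"                      # step 604
    if wait_for_response(message_id, timeout):        # step 606
        entry["state"] = "ready for delete"           # step 608
    else:
        entry["state"] = "queued"                     # simplified steps 510-514
    return message_id


table = {
    1: {"destination": "scp-a", "partition": 0, "state": "ready for send"},
    2: {"destination": "scp-b", "partition": 0, "state": "ready for send"},
    3: {"destination": "scp-a", "partition": 1, "state": "ready for send"},
}
sent = []


def record_send(message_id, destination):
    sent.append(message_id)


def only_scp_a_answers(message_id, timeout):
    # Stand-in for a real wait: scp-a responds in time, scp-b never does.
    return table[message_id]["destination"] == "scp-a"


first = process_next(0, table, record_send, only_scp_a_answers)
second = process_next(0, table, record_send, only_scp_a_answers)
third = process_next(0, table, record_send, only_scp_a_answers)
```

Message 3 is untouched because it belongs to partition 1, which is the behavior the partition scheme is meant to guarantee.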
- queue controller 116 and message processors 104a-104n comprise software applications programmed to execute the corresponding functions described herein.
- these elements may comprise any form of conventional hardware processor or controller, independent or otherwise, or any combination of hardware and software.
- Fig. 7 is a block diagram of a processing environment requiring a table to be replicated in accordance with one embodiment of the present invention.
- Processing environment 700 may correspond to any computer, processor, or network component that processes information, builds tables, and replicates those tables to a database, or any other form of data replication environment.
- Such a processing environment may exist, for example, in a reliable message passing system of the telephone network, including, e.g. communications between a service management system and other network elements for "800 services.”
- processing environment 700 may correspond to the computer 100 and table 706 may correspond to any of tables 110, 112, and 114 shown in Fig. 1.
- processing environment 700 includes a processor 702, a replication database 704, and a table 706.
- Replication database 704 corresponds to any suitable storage configuration for storing a replication table 705 corresponding to table 706.
- processor 702 builds a table 706 in processing environment 700 by adding, updating, and deleting data as necessary.
- table 706 includes an additional "time-stamp" column 708.
- Each time processor 702 writes an entry to table 706 or performs an operation affecting a table entry in table 706, processor 702 time-stamps that operation by inserting the time of the operation under the time-stamp column 708.
- Fig. 8 provides a processing flow diagram of a replication process in accordance with one embodiment of the present invention.
- processor 702 determines whether a replication cycle has expired (step 800). If not, it waits for a predetermined period (step 802) and returns to the initial step 800.
- the replication cycle preferably corresponds to a preselected time period for writing to the replication table in replication database 704. In a preferred embodiment, this replication cycle is chosen to be ten seconds; however, the replication cycle is configurable and can be selected to be any predetermined time period depending on the application.
- When processor 702 determines that the replication cycle has expired in step 800, it next checks the time of the last replication (step 804). Against that information, processor 702 determines whether the time-stamp of the first table entry is later than the last replication time (step 806). If so, processor 702 marks that table entry for replication (step 808). If the time-stamp of the table entry being checked is not later than the last replication time, processor 702 determines whether any table entries are left (step 810). Likewise, after processor 702 marks a particular table entry for replication, it checks for any table entries left (step 810). If a table entry remains, processor 702 returns to step 806 and repeats the check for each remaining table entry.
- processor 702 replicates each marked table entry by copying those marked table entries to the replication table 705 in replication database 704 (step 812).
- Processor 702 then updates the stored replication time (step 814) and returns to the beginning of the process. In this manner, the present invention need not replicate an entire table of data.
- the process replicates only table entries that have been modified in some way since a last replication cycle.
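The time-stamp comparison at the heart of Fig. 8 can be illustrated with a short Python sketch. The function name `replicate_changed` and the dictionary layout are assumptions made for the example, and deleted rows are not handled because the text does not describe how deletions are propagated.

```python
def replicate_changed(table, replica, last_replication_time, now):
    """Copy only rows stamped after the last replication (steps 804-814).

    `now` should be captured before the scan starts so that rows written
    during the scan are picked up by the next cycle rather than missed.
    """
    marked = [
        key for key, row in table.items()
        if row["time_stamp"] > last_replication_time   # steps 806-810
    ]
    for key in marked:
        replica[key] = dict(table[key])                # step 812: copy marked rows
    return now, marked                                 # step 814: new replication time


table = {
    "m1": {"state": "processed", "time_stamp": 100.0},
    "m2": {"state": "queued", "time_stamp": 112.0},
}
replica = {"m1": {"state": "processed", "time_stamp": 100.0}}

last_time, copied = replicate_changed(
    table, replica, last_replication_time=105.0, now=115.0
)
```

Only `m2` is copied in this run, since `m1` has not changed since the previous cycle; the cost of a cycle therefore scales with the number of changed rows written rather than with the size of the table copied.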
- Fig. 9 is a block diagram of a process control monitoring system in accordance with one embodiment of the present invention.
- a process control monitoring system in accordance with one embodiment of the present invention is distributed across a primary processing environment 902 and a secondary processing environment 904.
- Processing environments 902 and 904 may correspond to a computer, processor, or network component that processes information and replicates that information.
- primary processing environment 902 may correspond to the computer or processing environments shown in Figs. 1 or 7 and described above.
- Primary processing environment 902 includes a plurality of processes or applications 906a-906n, a primary process control monitor ("PCM") 908 connected to each of the processes 906, and a shared memory 910.
- PCM 908 monitors processes 906 and determines whether each should be running, brought up, and/or shut down. It further detects which processes are "hung up." For example, at predetermined time intervals, each process 906 writes status information to primary PCM 908, which then stores the status information in shared memory 910.
- the information in shared memory 910 can be used to monitor processes 906 and other applications (not shown).
- the content of shared memory 910 is periodically replicated to primary database 912. This replication process may be performed as described above with regard to Figs. 7 and 8.
- secondary processing environment 904 includes secondary PCM 914 for monitoring individual processes 916a-916n. Secondary processing environment 904 also includes a shared memory 918 for storing any process data from processes 916a-916n. In addition, secondary processing environment 904 is connected to a secondary database 920 to replicate data for efficiency and fault tolerance.
- Secondary PCM 914 differs from primary PCM 908, however, in accordance with one embodiment of the invention, in that secondary PCM 914 monitors primary processing environment 902 at the process or application level. As described in more detail below, in accordance with the present invention, process data stored in primary database 912 is periodically replicated to secondary database 920. Secondary PCM 914 uses this replicated data to monitor the process performance of primary processing environment 902 and take over processing where necessary.
- Fig. 10 is a process flow diagram of the operation of primary processing environment 902 to provide process control monitoring in accordance with one embodiment of the present invention.
- primary PCM 908 monitors the operation of processes 906 at a predetermined interval. Thus, primary PCM 908 initially determines whether monitoring interval T1 has expired (step 1000). If not, primary PCM 908 continues to monitor. If time T1 has expired, primary PCM 908 checks each process (step 1002) and determines whether it is malfunctioning (step 1004). If a malfunction exists, primary PCM 908 preferably restarts or corrects the process (step 1006). If no malfunctions exist, primary PCM 908 determines whether it is time to replicate the shared memory 910 data to primary database 912.
- primary PCM 908 determines whether a preselected replication time interval T2 has expired (step 1008). If not, primary PCM 908 continues to wait. Once T2 has expired, primary PCM 908 writes the process data from shared memory 910 to primary database 912 (step 1010).
- In one embodiment, this replication process writes all data to primary database 912.
- Alternatively, primary PCM 908 only writes predetermined portions of the data.
- For example, the replication process may be based on time-stamped data as described above.
- primary PCM 908 determines whether it is time to replicate data from primary database 912 to secondary database 920. Specifically, primary PCM 908 determines whether a third preselected time interval T3 has expired.
- T3 is configurable and is preferably selected to ensure accuracy in a fault-tolerant design depending on the system or network configuration and corresponding application. For example, in a telecommunication environment such as the telephone network where certain standards require very strict fault tolerance, this time period T3 would be relatively short, for example twenty seconds.
- If primary PCM 908 determines that T3 has not expired, it continues other steps of its normal processing.
- If primary PCM 908 determines that time T3 has expired, it replicates the primary database information to the secondary database (step 1014).
- this data replication process is also performed based on the time-stamped data, much like the data replication between shared memory 910 and primary database 912. In other words, only data whose status has been changed or updated since the beginning of the replication interval T3 is replicated from primary database 912 to secondary database 920.
- secondary PCM 914 uses the replicated data stored in secondary database 920 to monitor the processes of the primary processing environment 902. For example, secondary PCM 914 may monitor the time-stamp information corresponding to each operation. If the replicated data includes time-stamps corresponding to the most recent replication interval T3, then processes 906 of primary processing environment 902 are functioning properly and the replication process is functioning properly. However, if the time-stamps are old, then a problem exists in either the processes 906 or the replication process.
- Fig. 11 is a process flow diagram of a process executed by a secondary PCM 914 to provide process control monitoring in accordance with one embodiment of the invention.
- Secondary PCM 914 periodically monitors the time-stamps of data replicated to secondary database 920.
- the time interval T4 for this monitoring step is also configurable, and again, in networks or systems requiring strict fault tolerance, this interval would be shortened. For example, in telephone networks, the interval might be twenty seconds.
- Secondary PCM 914 initially determines whether the time interval T4 has expired (step 1100). If not, it continues to monitor. If time interval T4 has expired, secondary PCM 914 checks the time-stamps on the replicated data in secondary database 920 (step 1102). Secondary PCM 914 then determines whether any time-stamps correspond to the most recent replication interval T3 (step 1104). If so, the processes 906 and the replication process are working properly and secondary PCM 914 continues normal monitoring until the next interval T4. However, if the secondary PCM 914 determines that none of the time-stamps correspond to the most recent replication interval T3, then a problem exists in either the processes 906 or the replication process.
- secondary PCM 914 determines whether the primary processing environment 902 is still "alive" (functioning properly and/or on-line) (step 1106). If it is, secondary PCM 914 preferably requests help (step 1108). Because primary processing environment 902 is still alive but the replicated information is not current, certain problems can be presumed. For example, certain processes 906 may be down. In one embodiment, secondary PCM 914 may take steps to automatically reinitialize various processes 906 based on these presumptions or based on various data from additional diagnostic tools, or it may initialize certain automatic diagnostic procedures. Alternatively, secondary PCM 914 may provide an alarm indication and request manual intervention to further diagnose problems at the primary processing environment 902.
- If primary processing environment 902 is not alive, secondary processing environment 904 takes over the processes of primary processing environment 902 (step 1110).
- Secondary processing environment 904 is preferably configured to include the same processes and functionality as primary processing environment 902 for fault tolerance and backup purposes.
- When primary processing environment 902 goes down, secondary processing environment 904 initializes the processes or applications necessary and substitutes itself and the corresponding processes for those of the downed primary processing environment 902.
- secondary processing environment 904 continues performing the processes of primary processing environment 902 until primary processing environment 902 comes back on line. In this manner, fault tolerance is highly improved as both processes and machine state are ultimately monitored.
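The decision logic of Fig. 11 can be condensed into a small Python sketch. It rests on assumptions: the function name `check_primary` is hypothetical, a time-stamp is treated as "corresponding to the most recent replication interval" when it is no older than one replication interval (T3 in the text), and `primary_alive` stands in for whatever liveness probe a real deployment would use.

```python
def check_primary(replica_time_stamps, now, replication_interval, primary_alive):
    """Return the action the secondary PCM would take after one check."""
    fresh = any(
        now - stamp <= replication_interval          # steps 1102-1104
        for stamp in replica_time_stamps
    )
    if fresh:
        return "ok"                                  # keep monitoring
    if primary_alive():                              # step 1106
        return "request help"                        # step 1108
    return "take over"                               # step 1110


T3_SECONDS = 20.0  # example interval from the text

healthy = check_primary([100.0, 95.0], now=110.0,
                        replication_interval=T3_SECONDS,
                        primary_alive=lambda: True)
stale_but_alive = check_primary([50.0], now=110.0,
                                replication_interval=T3_SECONDS,
                                primary_alive=lambda: True)
primary_down = check_primary([50.0], now=110.0,
                             replication_interval=T3_SECONDS,
                             primary_alive=lambda: False)
```

The liveness probe is only consulted when the replicated time-stamps are stale, which matches the order of steps 1104 and 1106 in the flow.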
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention relates to a processing environment comprising a queuing system (102) and a method for communicating messages between computers, processors, or network elements. A configuration of tables (110, 112, 114) and methods within the queue allow the table data to be replicated based on time-stamp information (908). A process control monitoring system and method (902) allow the table data to be further replicated to a second database (920) and the processing environment (700) to be monitored based on the further replicated data.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US90775297A | 1997-08-08 | 1997-08-08 | |
US08/907,752 | 1997-08-08 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1999008181A1 (fr) | 1999-02-18 |
Family
ID=25424583
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1998/015860 WO1999008181A1 (fr) | 1997-08-08 | 1998-07-30 | Fault-tolerant distributed client/server data communication system and method |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO1999008181A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4365158A1 (fr) | 2022-11-04 | 2024-05-08 | PCC ROKITA Spolka Akcyjna | Process for the selective preparation of para-dichlorobenzene with improved recovery of the catalytic system |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5608720A (en) * | 1993-03-09 | 1997-03-04 | Hubbell Incorporated | Control system and operations system interface for a network element in an access system |
-
1998
- 1998-07-30 WO PCT/US1998/015860 patent/WO1999008181A1/fr active Application Filing
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): CA JP |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
NENP | Non-entry into the national phase |
Ref country code: JP Ref document number: 1999512190 Format of ref document f/p: F |
|
NENP | Non-entry into the national phase |
Ref country code: CA |
|
122 | Ep: pct application non-entry in european phase |