US8195613B2 - Transactional archiving of an electronic document - Google Patents
- Publication number
- US8195613B2 (US 12/536,823)
- Authority
- US
- United States
- Prior art keywords
- electronic document
- transaction
- archive
- archival
- units
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/113—Details of archiving
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1474—Saving, restoring, recovering or retrying in transactions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1865—Transactional file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2379—Updates performed during online database operations; commit processing
Definitions
- Embodiments of the invention generally relate to archiving of an electronic document. More particularly, an aspect of an embodiment of the invention relates to archiving of an electronic document in archival units in different locations.
- a set of transaction manager applications cooperating on archival units in different locations may perform a two-phase commit protocol to archive electronic documents.
- the electronic document may be archived between multiple interconnected archive units of a distributed server network in geographically-dispersed locations in order to store identical copies of the electronic document at the same time in two or more of the archive units.
- The transaction manager application causes an archiving portal in the distributed server network to send a five-step, two-phase commit protocol to a selected set of transaction manager instances resident on the remote archive units (smart cells).
- the transaction manager application monitors a persistent state of the archiving of the electronic document.
- a transaction manager application assigns a unique ID to each electronic document and to each transaction in the temporary memory data storage location based on receiving a begin method of the two-phase commit protocol from a transaction manager resident on the archiving portal.
- The transaction manager application on the archiving portal sends the same document in parallel to the transaction manager instances on the archival units (smart cells).
- Transaction managers running on the remote archive units validate that the document is uncompromised and complete, and store the electronic document in a temporary location.
- the transaction manager application reconciles the archival system if an error occurs between a start of a transmission of the electronic document and a permanent archiving of that electronic document.
- the transaction manager application then stores the electronic document in a permanent data storage location in each of the smart cells at an end of the two-phase commit protocol. This approach ensures atomic transactions and auto-recovery in case of a failure.
- FIG. 1 illustrates a block diagram of the cloud that shows data flow through archiving portals and archiving of the electronic documents in multiple interconnected archive units, each implementing an instance of the server transaction manager application;
- FIG. 2 illustrates a diagram of the physical organization of the archive unit including the permanent storage location
- FIG. 3 is a flow diagram depicting an example electronic document storage operation integrated with the two-phase commit protocol
- FIG. 4 illustrates a sequence diagram for an embodiment of the two-phase commit protocol
- FIG. 5 illustrates a Client class diagram of an embodiment of the transaction manager.
- FIG. 6 illustrates a table for an embodiment of the transaction manager and its abandoned transaction resolution matrix.
- a set of transaction manager applications cooperating on archiving portals and archival units in different locations may perform a two-phase commit protocol to archive electronic documents.
- the electronic document may be archived between multiple interconnected archive units of a distributed server network in geographically-dispersed locations in order to store identical copies of the electronic document at the same time.
- the transaction manager application causes an archiving portal in the distributed server network to send a five-step, two-phase commit protocol to a selected set of transaction manager instances resident on remote archive units (smart cells).
- the transaction manager application monitors a persistent state of the archiving of the electronic document.
- the archiving transaction of the electronic document is initiated with a begin method invocation on the archiving portal.
- Implementation of the begin method creates a new, globally unique document ID and transaction ID. Once the archiving portal has successfully allocated a document ID and transaction ID, the transaction manager resident on the archiving portal will invoke a method to archive electronic documents and associated meta-data on the set of remote archival units. Implementation of the archive method on the archiving portal will invoke the remote archive method on all of the selected archival units in parallel.
- the transaction manager resident on the archiving portal uses a custom remote method invocation protocol that allows invocation of the remote command while attaching the data from the electronic document.
- the transaction manager application stores the electronic document in a temporary data storage location in each of the selected archive units (smart cells) during the two-phase commit protocol.
- the transaction manager application reconciles the archival system if an error occurs between a start of a transmission of the electronic document and a permanent archiving of that electronic document.
- the transaction manager application then stores the electronic document in a permanent storage location at an end of the two-phase commit protocol.
- the archival system allows uninterrupted access from a client machine to at least one of the copies of the archived electronic document in the archive units of the distributed server network even if one archive unit location is not accessible.
- the transactional archiving of the electronic document conveys archiving of the same data at the same time in two or more eventual permanent storage locations that are located in geographically different locations.
- Transactional archiving of the electronic document differs from data replication, which may only be generating a single copy of the data to be stored in a back up permanent storage location.
- The permanent storage location may be a file system including a data storage device where the data 1) remains stored even if the power is lost (similar to non-volatile systems) and 2) cannot be overwritten (similar to one-time programmable and FLASH memories).
- FIG. 1 illustrates a block diagram of multiple interconnected archives, each implementing an instance of the server transaction manager application.
- An archive portal server 101 may receive a new electronic document from a customer's server.
- the transaction manager 150 on the archive portal server 101 may transactionally archive the incoming electronic document on two separate archive units 102 a and 102 b at the same time.
- Each document is sent to a pair of archival units 102 a and 102 b, called smart cells, running instances of transaction managers 150 a and 150 b.
- the electronic document is always archived to both cells 102 a and 102 b in a pair or to none at all.
- The electronic document is permanently archived to two or more smart cells or none at all. Once the document is archived, it can be independently retrieved from either cell 102 a or 102 b.
- The transaction manager application, such as transaction manager 150 resident on the archiving portal 101, initiates the archiving transaction.
- Each server transaction manager application 150 a and 150 b is configured for transactional archiving of an electronic document between the multiple interconnected archive units 102 a , 102 b of the distributed server network in geographically-dispersed locations in order to store identical copies of the electronic document at the same time, monitor the persistent state of the archiving of the electronic document, and reconcile the archival system if an error occurs between a start of a transmission of the electronic document and a permanent archiving of that electronic document.
- the archive units 102 a , 102 b of the distributed server network allow uninterrupted access from a client machine to at least one of the copies of the archived electronic document even if one archive unit location is not accessible.
- the transaction manager application 150 resident on the archiving portal 101 is coded to broadcast a five-step, two-phase commit protocol to a selected set of the other transaction manager instances resident on remote archive units, such as a first archive unit 102 a and a second archive unit 102 b .
- the server transaction manager application is also coded to reconcile the multiple interconnected archive units 102 a and 102 b if an error occurs between the start of the transmission of the electronic document and the permanent archiving of that electronic document in any of the multiple interconnected archive units 102 a and 102 b.
- an instance of the transaction manager resident on the archival server 102 a is coded to monitor and resolve archival transaction state and to restore consistency to geographically dispersed archival units in the event an error occurs during the archiving transaction.
- The transaction state always provides enough information for the transaction manager to resolve a transaction that was still in progress once the server and/or transaction manager process has been restarted. See FIG. 6 for an example table maintained by a transaction manager and its abandoned transaction resolution matrix. Referring to FIG. 1, once the transaction manager on archive unit 102 a starts, it polls for abandoned transactions. A transaction is considered abandoned if its state has not changed for longer than a configured period.
- The exact amount of time required to mark a transaction as abandoned is a configuration parameter of the transaction manager.
- The status of an abandoned transaction is resolved by getting the state of the transaction from all archiving units that participated in the transaction.
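The abandonment check and the state lookup described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `TransactionRecord`, `find_abandoned`, and `collect_states` are hypothetical, and each archiving unit's transaction log is modeled as a plain dictionary. The decision to commit or roll back is deliberately left out, because it depends on the resolution matrix of FIG. 6.

```python
import time
from dataclasses import dataclass


@dataclass
class TransactionRecord:
    """Hypothetical log entry kept by a transaction manager."""
    transaction_id: str
    state: str
    last_changed: float  # time of the last state change, in seconds


def find_abandoned(records, timeout_seconds, now=None):
    """Return records whose state has not changed for longer than the timeout.

    timeout_seconds stands in for the configuration parameter mentioned
    in the text; its actual value is not given in the source.
    """
    if now is None:
        now = time.time()
    return [r for r in records if now - r.last_changed > timeout_seconds]


def collect_states(transaction_id, unit_logs):
    """Gather the state of one transaction from every participating unit.

    unit_logs maps a unit name to that unit's {transaction_id: state} log.
    A unit that has no entry for the transaction reports None.
    """
    return {unit: log.get(transaction_id) for unit, log in unit_logs.items()}
```

Passing `now` explicitly keeps the check deterministic, which is convenient when the polling loop is exercised without a real clock.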
- A transaction can be in one of the following states: INITIATED, ACTIVE, PREPARED, ROLLBACK_ONLY.
- When the transaction begins, it has the state INITIATED.
- Invocation of the archive method on the archiving units 102 a and 102 b moves the transaction to the ACTIVE state if the invocation was successful or to ROLLBACK_ONLY if the invocation fails.
- Invocation of the prepare method on the archival units 102 a and 102 b changes the state of the transaction to PREPARED if the invocation was successful or to ROLLBACK_ONLY if the invocation fails.
- A transaction can be committed only if it is in the PREPARED state. If the transaction is in the ROLLBACK_ONLY state, it can only be rolled back.
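The state transitions described above can be written down as a small state machine. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes that a transaction which has reached ROLLBACK_ONLY stays there regardless of later calls, a detail the text implies but does not spell out.

```python
from enum import Enum


class TxState(Enum):
    INITIATED = "INITIATED"
    ACTIVE = "ACTIVE"
    PREPARED = "PREPARED"
    ROLLBACK_ONLY = "ROLLBACK_ONLY"


class Transaction:
    """Tracks the state of one archiving transaction."""

    def __init__(self):
        # A transaction has the state INITIATED when it begins.
        self.state = TxState.INITIATED

    def on_archive(self, succeeded):
        """Record the outcome of the archive method."""
        if self.state is TxState.ROLLBACK_ONLY:
            return  # assumption: a failed transaction stays failed
        self.state = TxState.ACTIVE if succeeded else TxState.ROLLBACK_ONLY

    def on_prepare(self, succeeded):
        """Record the outcome of the prepare method."""
        if self.state is TxState.ROLLBACK_ONLY:
            return  # assumption: a failed transaction stays failed
        self.state = TxState.PREPARED if succeeded else TxState.ROLLBACK_ONLY

    def can_commit(self):
        """A transaction can be committed only in the PREPARED state."""
        return self.state is TxState.PREPARED

    def must_roll_back(self):
        """A ROLLBACK_ONLY transaction can only be rolled back."""
        return self.state is TxState.ROLLBACK_ONLY
```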
- The table in FIG. 6 shows an example matrix used to determine whether an abandoned transaction should be committed or rolled back.
- Each archiving portal and archival unit communicates with the other remote archival units via the network protocol connection, such as a TCP/IP network connection 112 , at a data link layer and executes one stream between the two instances of the transaction managers during the archiving of the copies of the electronic document.
- The stream includes commands and their attachments: a metadata attachment and the data of the electronic document as the other attachment.
- A leader, the transaction manager 150 on the archiving portal server 101, exists for each transaction to coordinate the two-phase commit protocol for this process and is commonly referred to as the transaction coordinator.
- This transaction manager selects a set of transaction managers on the archiving units to receive the electronic document. Transaction managers in this set are called transaction participants.
- The relevant transaction managers communicate among themselves to execute the two-phase commit protocol schema below and archive the electronic document.
- the transaction managers ensure reduced archiving cost at unparalleled data fidelity by use of the two-phase commit protocol with no impact on archiving performance.
- the transaction managers ensure that the archiving system is always consistent and that an exact copy of an electronic document is permanently archived at two or more archival units in geographically-dispersed locations in the multiple interconnected archive units.
- The transaction manager is coded to leave the archival system in a consistent state by always knowing the state of the archival transaction of the electronic document.
- The transaction manager implements 1) a commit method of the two-phase commit protocol to write the electronic document to permanent storage at the end of the archival transaction process and 2) a rollback method to roll back the archive units, remove the electronic document from the temporary location, and communicate that an error occurred in the archival transaction process to the transaction manager instance initiating the two-phase commit protocol, in order to restore consistency among the geographically dispersed archival units.
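On a single archive unit, the commit and rollback methods described above amount to moving a file out of the temporary location or deleting it. The sketch below is one possible reading, with hypothetical function names; it assumes the temporary and permanent locations are on the same file system, since `os.replace` does not move files across file systems. Reporting the error back to the coordinator is left out.

```python
import os
import tempfile


def commit(temp_path, permanent_path):
    """Move a prepared document from temporary to permanent storage."""
    os.replace(temp_path, permanent_path)


def rollback(temp_path):
    """Remove the document from temporary storage.

    A missing file is ignored so that rollback can be repeated safely,
    for example when an abandoned transaction is resolved after a restart.
    """
    try:
        os.remove(temp_path)
    except FileNotFoundError:
        pass


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        temp_path = os.path.join(root, "doc.tmp")
        with open(temp_path, "wb") as handle:
            handle.write(b"example bitfile")
        commit(temp_path, os.path.join(root, "doc.bitfile"))
        print(sorted(os.listdir(root)))
```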
- The transaction managers give the ability to have identical instances of electronic document data copies in geographically-distributed locations at no additional hardware cost, while still always leaving the archiving of the electronic document across each of the archival units in a consistent state.
- the transaction managers give the ability to have data copies in multiple geographically-distributed locations atomically.
- An atomic commit may be an operation in which a set of distinct changes is applied as a single operation.
- the electronic document with its associated metadata archived in the multiple locations must be an exact mirror copy of the original data. This way, the transaction manager gives the ability to preserve the data and allow seamless access to data even in the event that not only one disk, or one node fails, but when the whole data-center is destroyed.
- the transactional archiving with the transaction manager provides at least four times better protection for customer data than simply implementing a RAID system on NAS (RAID 6) because the multiple archival servers are in geographically-dispersed locations and the transaction manager works in the background to ensure that the archiving of the electronic document across each of the archival units is an atomic operation.
- the transaction manager gives the ability to have reliable transactional electronic document archiving in a distributed server network that is not mirrored, therefore simplifying infrastructure and greatly reducing the cost of archiving, while maintaining the fidelity of the data.
- the transaction manager works in the background between the archiving sites to maintain the consistency of the stored and replicated documents.
- the archive units 102 a and 102 b may employ a split-cell architecture for permanent storage of the electronic documents.
- the split-cell architecture for permanent storage is a grid based paradigm that is infinitely scalable, maintaining performance under any load.
- the two or more fully synchronized archive units, geographically separated, provide complete data and system redundancy and parallel processing of all tasks.
- the design is uniquely qualified to support the performance and volume requirements that are necessary for processing the rapidly expanding number of electronic documents including all forms of textual, audio and video mail and attachments.
- FIG. 3 is a flow diagram depicting an example electronic document storage operation integrated with the two-phase commit protocol.
- The process begins with a message setup exchange 302 between a customer client machine, such as a customer SMTP server, and the local archiving portal server.
- Next, the electronic document to be archived is transferred from the customer SMTP server to the archiving portal server as an email 304.
- the archive server parses the electronic document to extract metadata for the electronic document.
- a structure called “bitfile” is created for the electronic document itself.
- The bitfile represents the compressed electronic document and a signature used to verify the authenticity of the electronic document.
- Another file containing the associated metadata is also created at 208.
- the transaction manager on the archiving portal server sends the electronic document bitfile and metadata file as attachments to the archive method to two or more archive units at the same time for permanent storage at 306 .
- The transaction manager resident on this archival portal server sends this command 306 to other instances of transaction managers resident on remote hosts via the two-phase commit protocol.
- the two-phase commit sequence is described in more detail in FIG. 4 below.
- the transaction managers on the archive units send an acknowledge message to the archive SMTP server at 308 .
- the archive SMTP server sends an OK message to the customer SMTP server at 312 .
- the transaction manager resident on the archiving portal using the two-phase commit method gives the ability to attach an arbitrary amount of data to the invocation of the two-phase commit method on the remote archival host with no/minimal memory overhead (Command Attachments) by streaming the data in frames through the same socket connection during the archiving process.
- The electronic document transferred between the archival server units is not stored in a temporary memory buffer in its entirety before the remote host starts acting on the received data. Rather, the bytes representing the electronic document are streamed from the transaction manager on the archiving portal to the other transaction managers on remote archival servers as a command with attachments.
- The command with attachments is streamed as a sequence of data elements (frames).
- The server transaction manager application listening at the non-blocking socket of the server asks the object in the streaming session whether it has an attachment. If so, all data following the object is forwarded to its attachment as long as the frames in which data arrives over the socket are full.
- When the streaming process encounters a partially full frame or a zero-length frame, it initiates a method on the attached object that indicates the end of the attachment.
- the electronic document attachment is broken into frames to carry that data.
- the command typically has two attachments but could indicate how many attachments in its header to the server application.
- the non-blocking socket of the server is always kept open as well as the client socket on the archiving portal.
- The non-blocking server socket receives all of the frames full of data, and the server transaction manager application listening at the non-blocking socket of the server notes when an end frame in that sequence of streamed data is not full. This indicates to the server transaction manager application that the transmission of the attachment is complete. A checksum is then computed to verify the integrity of the data in the attachment.
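The end-of-attachment rule described above (full frames continue the attachment, a partially full or zero-length frame ends it) can be sketched without any networking. The function names and the 4-byte frame size are illustrative assumptions; the source only says that the frame size is pre-determined. The stream is modeled as a Python iterator of byte strings rather than a socket.

```python
FRAME_SIZE = 4  # illustrative only; the real frame size is not given in the source


def to_frames(data, frame_size=FRAME_SIZE):
    """Split an attachment into frames, ending with a frame that is not full.

    If the data fills every frame exactly (or is empty), a zero-length
    frame is appended so the receiver can still detect the end.
    """
    frames = [data[i:i + frame_size] for i in range(0, len(data), frame_size)]
    if not frames or len(frames[-1]) == frame_size:
        frames.append(b"")
    return frames


def read_attachment(stream, frame_size=FRAME_SIZE):
    """Read frames from an iterator until a frame that is not full arrives.

    The iterator is left positioned at the next attachment, so several
    attachments can share one stream without closing it.
    """
    chunks = []
    for frame in stream:
        chunks.append(frame)
        if len(frame) < frame_size:
            return b"".join(chunks)
    raise ValueError("stream ended before a terminating frame")


if __name__ == "__main__":
    # Two attachments (for example bitfile and metadata) on one stream.
    stream = iter(to_frames(b"abcdefgh") + to_frames(b"xy"))
    print(read_attachment(stream), read_attachment(stream))
```

Because the terminating frame carries the end-of-attachment signal, the same connection can stay open for the next command, which matches the always-open socket described in the text.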
- FIG. 4 illustrates a sequence diagram for an embodiment of the two-phase commit protocol.
- Each instance of the transactional manager resident on its local archival server is coded to implement the following five-step, two-phase commit protocol for a new received electronic document.
- the following five methods make up an example two-phase commit protocol consisting of: the Begin method 460 , the Archive method 462 , the Prepare method 464 , the Commit method 466 , and the Rollback method 468 .
- The distributed nature of trying to keep the archival states consistent among the archival sites requires an implementation of the remote command execution framework (a framework for executing the commands/methods of the two-phase commit protocol in parallel on multiple remote machines/archival units) as the lowest layer in the application stack, with the aim of satisfying performance requirements on a commodity hardware platform.
- This remote command execution framework is based on the use of non-blocking sockets of the server and allows transfer of objects (commands and commands with attachments) across the network and invocation of the remote command on the remote server. Objects are serialized as they are transferred across the network, but there is no additional overhead, like additional client threads, required to execute a method on the remote host.
- the new Remote Command Execution framework supports the unique ability to attach the data (electronic document and associated metadata) to an invocation of the method on the remote host (Command Attachments).
- Data transfer is an atomic operation as far as the remote command execution is concerned.
- the command has an attachment
- Command attachments in the archive method of the two-phase commit protocol are streamed across the network since streaming allows data transfer with a very low memory footprint.
- Remote commands with attachments are the basis of the transactional archiving.
- The non-blocking socket does not close when the transaction is complete; instead, the socket is kept open.
- The non-blocking socket does not need to cycle through closing and reopening for the next command or command with attachment.
- The transaction manager application listening at the non-blocking socket recognizes that the transmission of an attachment is complete when the last frame is not full of data or has zero length. Other systems sometimes need the non-blocking socket to close to tell when the transaction is complete.
- One stream is used to communicate both commands and the attached data (i.e. electronic document and associated metadata attachments).
- the archiving transaction of the electronic document is implemented using the two-phase commit process modified for the purpose of electronic document archiving.
- One additional thread is started to execute remote commands on two different remote hosts in parallel. Every archiving command is automatically split into multiple remote commands inside the framework to satisfy the distributed nature of the system.
- Transactional archiving requires each instance of the transaction manager resident on the archival server to implement the following unique five-step, two-phase commit methods: Begin method 460; Archive method 462; Prepare method 464; Commit method 466; and Rollback method 468.
- The transaction manager keeps track of the state of archiving transactions of the electronic documents and understands the current persistent state of the archiving transaction. The transaction manager can therefore either commit the archiving of the electronic document to a permanent storage location or, if an error occurs in the archiving process, roll back the state of the archiving system to remove the electronic document from the temporary storage location and communicate that an error occurred in the archival process to the local transaction manager instance initiating the two-phase commit protocol, in order to restore consistency among the geographically dispersed archival units.
- the transactional manager resident on archiving portal is coded to implement the following five-step, two-phase commit protocol for a newly received electronic document and the distributed archival transaction across remote archival units occurs in parallel.
- the transactional manager resident on its archiving portal receiving the new electronic document cooperates with the other instances of the transactional manager resident on remote archival units to implement the following five methods of the two-phase commit protocol.
- The two-phase commit protocol starts with a begin method 460 to assign an ID to the archival transaction. Then changes are performed: the document and metadata are transferred to the remote archival units (smart cells), and the archival transaction is prepared for permanent storage in each of the remote archival units. Once all transaction participants confirm that transaction preparation was successful, the archival transaction of the electronic document is committed in the last method executed in the two-phase commit protocol.
- The distributed archival transaction across remote archival hosts occurs in parallel. The transaction starts with “begin”, then changes are performed and the transaction is prepared. Once all transaction participants confirm that transaction preparation was successful, the transaction is committed in the last method executed. Once the transaction is committed, the archiving of the electronic document cannot be rolled back.
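The begin, archive, prepare, commit, and rollback sequence can be sketched from the coordinator's point of view. Everything here is a simplified stand-in: `InMemoryParticipant` replaces a remote transaction manager, dictionaries replace temporary and permanent storage, and a thread pool replaces the remote command execution framework. Failures during commit are not handled; in the described system they would be picked up later as abandoned transactions.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class InMemoryParticipant:
    """Hypothetical stand-in for a transaction manager on a smart cell."""

    def __init__(self, fail_on=None):
        self.fail_on = fail_on  # name of a step that should fail, for demos
        self.temporary = {}
        self.permanent = {}

    def archive(self, tx_id, document):
        if self.fail_on == "archive":
            raise RuntimeError("archive failed")
        self.temporary[tx_id] = document

    def prepare(self, tx_id):
        if self.fail_on == "prepare" or tx_id not in self.temporary:
            raise RuntimeError("prepare failed")

    def commit(self, tx_id):
        self.permanent[tx_id] = self.temporary.pop(tx_id)

    def rollback(self, tx_id):
        self.temporary.pop(tx_id, None)


def call_all(pool, participants, method, *args):
    """Invoke one method on every participant in parallel.

    Returns True only if no participant raised an exception.
    """
    futures = [pool.submit(getattr(p, method), *args) for p in participants]
    succeeded = True
    for future in futures:
        try:
            future.result()
        except Exception:
            succeeded = False
    return succeeded


def archive_document(document, participants):
    """Run the five-step sequence; return the transaction ID or None."""
    tx_id = uuid.uuid4().hex  # begin: allocate a globally unique ID
    with ThreadPoolExecutor(max_workers=len(participants)) as pool:
        archived = call_all(pool, participants, "archive", tx_id, document)
        prepared = archived and call_all(pool, participants, "prepare", tx_id)
        if prepared:
            call_all(pool, participants, "commit", tx_id)
            return tx_id
        call_all(pool, participants, "rollback", tx_id)
        return None
```

The all-or-nothing behavior comes from committing only after every participant has confirmed both archive and prepare; any earlier failure sends rollback to all participants.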
- Archiving transactions that are left in progress after a system crash are automatically resolved upon system restart.
- The transaction manager running on the server is capable not only of executing remote commands but also of transferring files, and is responsible for resolving archival transactions left in progress after a crash or restart.
- the ‘begin’ method 460 of the two-phase commit protocol starts/initiates the archiving transaction and assigns a globally-unique transaction ID to the new archiving transaction.
- the ‘archive’ method 462 is a command that carries attachments, such as two attachments.
- the first attachment represents a “bitfile” containing a compressed copy of the original electronic document and a signature.
- the second attachment is a compressed copy of the additional meta-data associated with the electronic document.
- the electronic document attachment and its meta-data attachment are placed/written into the temporary directory on the remote hosts, while the remote command initiates an archiving transaction by creating a transaction log entry and validating the integrity, i.e. md5sum, of the transferred attachments.
- the md5sum may be a computer program that calculates and verifies 128-bit MD5 hashes, as described in RFC 1321.
- the MD5 hash (or checksum) functions as a compact digital fingerprint of a file. Because almost any change to a file will cause its MD5 hash to also change, the MD5 hash is commonly used to verify the integrity of files (i.e., to verify that a file has not changed as a result of file transfer, disk error, meddling, etc.).
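Because the attachment arrives as a sequence of frames, its MD5 hash can be computed incrementally without holding the whole document in memory. The sketch below uses Python's standard `hashlib` module; the function name `md5_of_frames` is hypothetical, and the source does not say whether the hash is computed incrementally or after the transfer completes.

```python
import hashlib


def md5_of_frames(frames):
    """Compute the MD5 hex digest of an attachment delivered as frames."""
    digest = hashlib.md5()
    for frame in frames:
        digest.update(frame)  # feeding chunks is equivalent to hashing the whole
    return digest.hexdigest()


if __name__ == "__main__":
    # The digest of b"abc" is one of the test vectors listed in RFC 1321.
    print(md5_of_frames([b"a", b"bc"]))
```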
- the transactional manager is coded to listen at a non-blocking socket for a command and attachment carried by frames.
- the frame is an array of bytes that has pre-determined size.
- the frame not being full of data indicates that the transmission of the attachment to the command is complete.
- the electronic document attachment can be broken into frames to carry that data.
- the command typically has two attachments but could indicate how many attachments in its header to the server application.
- the non-blocking socket is always kept open.
- The server non-blocking socket receives the frames full of data and notes when an end frame in that sequence of streamed data is not full. This indicates to the server transactional manager application listening at the non-blocking socket that the transmission of the attachment is complete. A checksum can then be computed to verify the integrity of the data in the attachment.
- One of the arguments of the remote command is the checksum originally calculated on the archival portal that is sending the document. This value is then compared to the newly calculated value on the receiving side on each smart cell to ensure that the document arrived unchanged.
- The server assumes that streaming of an attachment is complete when the last frame in the sequence associated with the command is not full of data.
- The electronic document attachment and the metadata attachment are stored in a temporary location.
- the transaction status at this point is “active” and the transaction is added to the list of transactions in progress.
- This Archive method 462 must be successful on all remote hosts/archival units for the transaction to proceed.
- the client calls the archive method 462 to archive electronic document.
- The implementation of this archive method 462 first picks a set of archiving units (a smart cell pair) as the target for the archive method. The cell pair is picked using a Weighted Round Robin mechanism from the list of registered objects in JNDI. Once the cell pair is picked, the archive method transfers the bitfile and associated metadata to both cells in the cell pair and initiates the archive transaction on the server. The implementation of the archive method 462 passes the checksum of the bitfile and metadata as arguments to the invocation of the remote archive method. The remote method is invoked through the custom Remote Method Execution framework, and the bitfile and metadata are transferred as command attachments.
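The text does not say which Weighted Round Robin variant is used, so the following is only a sketch of the simplest one: each cell pair appears in the rotation as many times as its weight. The pair names, the weights, and the function name are hypothetical, and a plain dictionary stands in for the JNDI registry.

```python
import itertools


def weighted_round_robin(weights: dict):
    """Return an endless iterator in which each key appears `weight` times per cycle."""
    schedule = [name for name, weight in weights.items() for _ in range(weight)]
    if not schedule:
        raise ValueError("at least one cell pair needs a positive weight")
    return itertools.cycle(schedule)


# Hypothetical registry contents: pair-A should receive three times the load of pair-B.
registered_pairs = {"pair-A": 3, "pair-B": 1}
picker = weighted_round_robin(registered_pairs)
first_cycle = [next(picker) for _ in range(4)]
second_cycle = [next(picker) for _ in range(4)]
```

A real scheduler would more likely interleave the picks so that a heavily weighted pair is not chosen several times in a row; the proportions per cycle would be the same.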
- When the archive method 462 is invoked on the server, it validates the checksum of the received files, ensuring that they are unchanged as they are stored in the temporary location. Next, the primary cell allocates a permanent location for the transferred bitfile and metadata. All this information is recorded in the persistent location on the archival units (smart cells). The transaction state at this point is ACTIVE. If for whatever reason the checksum calculated on the sending side does not match the checksum calculated on the receiving side, the transaction manager on the archival units raises an exception. This causes the whole archiving transaction to be rolled back.
- Invocation of the archive method 462 happens in parallel on all selected archival units. This allows for the best performance, since the time spent in network interaction with the two servers overlaps. The method returns when both threads have completed.
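The parallel invocation and the receiving-side checksum validation can be sketched with a thread pool. `FakeCell`, `ChecksumMismatch`, and `archive_on_pair` are hypothetical names, the cell is an in-memory stand-in rather than a remote host, and the `corrupt` flag exists only to simulate data damaged in transit.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


class ChecksumMismatch(Exception):
    """Raised by a cell when the received bytes do not match the sender's checksum."""


class FakeCell:
    """In-memory stand-in for one archival unit (smart cell)."""

    def __init__(self, name, corrupt=False):
        self.name = name
        self.corrupt = corrupt
        self.temporary = {}

    def archive(self, tx_id, bitfile, checksum):
        received = bitfile + b"!" if self.corrupt else bitfile  # simulate damage in transit
        if hashlib.md5(received).hexdigest() != checksum:
            raise ChecksumMismatch(self.name)
        self.temporary[tx_id] = received
        return "ACTIVE"


def archive_on_pair(cells, tx_id, bitfile):
    """Call archive on every cell in parallel and return once all calls have finished."""
    checksum = hashlib.md5(bitfile).hexdigest()
    with ThreadPoolExecutor(max_workers=len(cells)) as pool:
        futures = [pool.submit(cell.archive, tx_id, bitfile, checksum) for cell in cells]
        # result() re-raises any exception raised in the worker thread.
        return [future.result() for future in futures]


statuses = archive_on_pair([FakeCell("primary"), FakeCell("secondary")], "tx-1", b"doc")
```

Leaving the `with` block waits for both worker threads, which mirrors the statement that the method returns only when both threads have completed.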
- the next call in the sequence is a ‘prepare’ method 464 , which prepares the archiving transaction of the electronic document for the final commit on all remote hosts/archival units.
- the prepare method 464 prepares the move/propagation of the electronic document that has passed its check sum along with its metadata from the temporary storage location into a permanent archival storage location.
- The prepare method 464 assigns the final path in the archive for the electronic document and metadata, which arrived as attachments, and changes the transaction status to “prepared”. This prepare method 464 must be successful on all remote hosts for the transaction to proceed.
- the final method in the archive transaction is the ‘commit’ method 466 .
- Remote invocation of this commit method 466 moves the electronic document and its metadata on all remote hosts to their permanent location inside the archive, and deletes the transaction from the list of transactions in progress. Once committed, an archival transaction cannot be rolled back.
- When the commit method 466 is invoked on the server, it causes the bitfile and metadata to be transferred from the temporary location into a permanent location, the archive counts to be updated, the transaction to be removed from the list of transactions in progress, and the document to be queued for indexing.
- the fifth method is the rollback method 468 .
- The transaction manager on the archiving portal is responsible for issuing a rollback if an exception is encountered during the archiving process.
- the archival transaction of the electronic document can be rolled back with the rollback method 468 at any point prior to execution of the commit method 466 .
- the rollback method 468 removes the streamed electronic document and metadata from the remote host's temporary location and from the list of active transactions.
- An advantage of implementing the archiving transactions is the ability to handle errors safely and transparently. If an error occurs at any point during the archiving process, then the archiving portal can issue a rollback command 468 that causes the server to remove the files from the temporary location and delete the transaction from the list of active transactions.
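The archive, prepare, commit, and rollback sequence can be sketched as a small coordinator over in-memory cells. All class, method, and attribute names here are hypothetical, the `fail_on` switch exists only to simulate an error, and the calls run sequentially for brevity even though the text describes them as running in parallel.

```python
class Cell:
    """In-memory stand-in for one archival unit; all names are hypothetical."""

    def __init__(self, fail_on=None):
        self.fail_on = fail_on   # name of the step that should fail, if any
        self.in_progress = {}    # tx_id -> status
        self.temporary = {}      # tx_id -> document held in temporary storage
        self.permanent = {}      # tx_id -> document in the archive

    def _step(self, name):
        if self.fail_on == name:
            raise RuntimeError(f"{name} failed")

    def archive(self, tx_id, document):
        self._step("archive")
        self.temporary[tx_id] = document
        self.in_progress[tx_id] = "active"

    def prepare(self, tx_id):
        self._step("prepare")
        self.in_progress[tx_id] = "prepared"

    def commit(self, tx_id):
        self.permanent[tx_id] = self.temporary.pop(tx_id)
        del self.in_progress[tx_id]

    def rollback(self, tx_id):
        self.temporary.pop(tx_id, None)
        self.in_progress.pop(tx_id, None)


def archive_document(cells, tx_id, document):
    """Run archive and prepare on every cell; roll back all cells on any error."""
    try:
        for cell in cells:
            cell.archive(tx_id, document)
        for cell in cells:
            cell.prepare(tx_id)
    except Exception:
        for cell in cells:
            cell.rollback(tx_id)
        return False
    # Commit sits outside the try block: once committed, nothing is rolled back.
    for cell in cells:
        cell.commit(tx_id)
    return True
```

With two healthy cells, `archive_document` leaves the document in each cell's `permanent` store and clears `in_progress`; if either cell fails during archive or prepare, both cells end up with empty temporary storage and no permanent copy.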
- An XAArchiveRecovery object runs on the archival units. This object is implemented as a singleton. It verifies that no transaction remains in progress for a very long time. The transaction timeout is configurable, with a default value of two hours to accommodate the archiving of extremely large files on a stressed network. If XAArchiveRecovery finds a transaction whose status has not been updated for more than two hours, it examines the status of the transaction on all archiving units that participated in the transaction. If any of the transaction participants is not accessible, it reports an error and waits another two hours to check again.
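The timeout check performed by the recovery object can be sketched as a scan over last-update timestamps. The function name, the dictionary layout, and the sample transaction ids are assumptions; only the two-hour default comes from the text.

```python
from datetime import datetime, timedelta

DEFAULT_TIMEOUT = timedelta(hours=2)  # default stated in the text; configurable


def find_stale(last_updated: dict, now: datetime,
               timeout: timedelta = DEFAULT_TIMEOUT) -> list:
    """Return ids of transactions whose status has not changed for longer than timeout."""
    return sorted(tx_id for tx_id, updated in last_updated.items()
                  if now - updated > timeout)


# Hypothetical snapshot of the list of transactions in progress.
now = datetime(2009, 8, 6, 12, 0)
in_progress = {
    "tx-1": now - timedelta(hours=3),     # stuck: last update three hours ago
    "tx-2": now - timedelta(minutes=30),  # still within the timeout
}
stale = find_stale(in_progress, now)
```

Only `tx-1` is reported; the recovery object would then query every participating archival unit for that transaction's status before deciding what to do.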
- Each transaction manager is coded to implement the distributed transaction for archiving of the electronic documents built on top of the remote method execution framework that allows data attachments for method executions ensuring atomic data transfers.
- The set of cooperating transaction managers allows a mirror copy of the electronic document to be kept without the need for expensive mirroring technologies.
- FIG. 5 illustrates a Client class diagram of an embodiment of the transaction manager.
- The transaction manager, implemented in the object called XAArchiver, is coded to archive documents to multiple geographically distributed locations atomically.
- the transaction manager 500 is coded to utilize a remote command execution framework based on use of one or more non-blocking sockets of a server.
- Remote Command Execution framework implements a protocol to attach an arbitrary amount of data to the invocation of the remote method with no/minimal memory overhead (Command Attachments).
- The set of cooperating transaction managers implements a protocol to process a large number of transactions simultaneously, since one transaction need not be complete before another transaction manager launches another electronic document archiving process.
- the archival system gives the ability to archive more data than any other competitor archive by use of multiple archival servers in combination with the multi-dimensional permanent grid cell storage technique.
- The multi-dimensional permanent split-cell architecture overcomes shortcomings in Storage Area Networks (SANs) and other network storage devices.
- the customers' data is stored in two geographically-dispersed, ultra-secure, SAS 70-compliant data centers, ensuring extra protection and Continuity of Business in the event of a natural or man-made disaster.
- FIG. 2 illustrates a physical organization of the archive unit including the permanent data storage location.
- The archive unit is organized as a tree where the top level represents an RT directory. This is a directory created based on the time the electronic document, in the form of a bitfile and metadata, is received in the archive unit.
- One archive unit has many RT directories. Each RT directory has subdirectories called “level zero” directories. There can be up to one hundred “level zero” directories under one RT. Each “level zero” directory contains up to one hundred “level one” directories. Each “level one” directory contains up to one hundred “level two” directories. Each “level two” directory contains approximately fifty electronic documents in the form of “bitfile+metadata”.
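The fan-out figures above imply a capacity of roughly 100 × 100 × 100 × 50 = 50,000,000 documents per RT directory. The sketch below works through that arithmetic and shows one way a document's position could map to a directory path. The sequential fill order, the two-digit directory names, and the function names are assumptions; the text does not say how documents are assigned to directories.

```python
FANOUT = 100         # "up to one hundred" directories per level
DOCS_PER_LEAF = 50   # "approximately fifty" documents per level-two directory


def rt_capacity() -> int:
    """Approximate number of documents that fit under one RT directory."""
    return FANOUT ** 3 * DOCS_PER_LEAF


def leaf_path(sequence: int) -> str:
    """Map a document's sequence number within an RT directory to level0/level1/level2."""
    if not 0 <= sequence < rt_capacity():
        raise ValueError("sequence number does not fit in one RT directory")
    leaf = sequence // DOCS_PER_LEAF              # index of the level-two directory
    level0, rest = divmod(leaf, FANOUT * FANOUT)
    level1, level2 = divmod(rest, FANOUT)
    return f"{level0:02d}/{level1:02d}/{level2:02d}"
```

Under these assumptions the first fifty documents land in `00/00/00`, the fifty-first opens `00/00/01`, and the last slot in the RT directory is in `99/99/99`.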
- the permanent storage location is a file system implementing the above tree physical organization.
- The file system is layered to organize and access stored data, and uses a data storage device, such as a hard disk or CD-ROM, to maintain the physical location of the data files. Data in the permanent storage location remains even if power is removed and cannot be written over.
- the temporary storage location is buffer space or storage locations in the file system that can have their contents written over.
- the computing system environment where a server hosts the archiving transaction manager is only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention.
- the transaction manager is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
- the transaction manager may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
- program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
- Those skilled in the art can implement the description and/or figures herein as computer-executable instructions, which can be embodied on any form of computer readable media discussed below.
- The program modules may be implemented as software instructions, logic blocks of electronic hardware, or a combination of both.
- the software portion may be stored on a machine-readable medium and written in any number of programming languages such as Java, C++, C, etc. Therefore, the component parts, such as the transaction manager, etc. may be fabricated exclusively of hardware logic, hardware logic interacting with software, or solely software.
- A memory to store the transaction manager may include read only memory (ROM); a hard drive; a CD-ROM; random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; Digital Video Discs (DVDs); EPROMs; EEPROMs; magnetic or optical cards; or any type of media suitable for storing electronic instructions.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/536,823 US8195613B2 (en) | 2009-08-06 | 2009-08-06 | Transactional archiving of an electronic document |
GB1010369.5A GB2472484B (en) | 2009-08-06 | 2010-06-21 | Improvements for transactional archiving of an electronic document |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/536,823 US8195613B2 (en) | 2009-08-06 | 2009-08-06 | Transactional archiving of an electronic document |
Publications (2)
Publication Number | Publication Date |
---|---|
US20110035356A1 US20110035356A1 (en) | 2011-02-10 |
US8195613B2 true US8195613B2 (en) | 2012-06-05 |
Family ID: 42582720
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/536,823 Expired - Fee Related US8195613B2 (en) | 2009-08-06 | 2009-08-06 | Transactional archiving of an electronic document |
Country Status (2)
Country | Link |
---|---|
US (1) | US8195613B2 (en) |
GB (1) | GB2472484B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11113262B2 (en) * | 2019-04-01 | 2021-09-07 | Sap Se | Time-efficient lock release in database systems |
US11863615B2 (en) | 2022-03-18 | 2024-01-02 | T-Mobile Usa, Inc. | Content management systems providing zero recovery time objective |
US12250267B2 (en) | 2014-06-24 | 2025-03-11 | Oracle International Corporation | System and method for supporting partitions in a multitenant application server environment |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2472620B (en) * | 2009-08-12 | 2016-05-18 | Cloudtran Inc | Distributed transaction processing |
US20110289046A1 (en) * | 2009-10-01 | 2011-11-24 | Leach R Wey | Systems and Methods for Archiving Business Objects |
US9710344B1 (en) | 2010-12-13 | 2017-07-18 | Amazon Technologies, Inc. | Locality based quorum eligibility |
US8473775B1 (en) | 2010-12-14 | 2013-06-25 | Amazon Technologies, Inc. | Locality based quorums |
US8892845B2 (en) * | 2010-12-22 | 2014-11-18 | Cleversafe, Inc. | Segmenting data for storage in a dispersed storage network |
WO2015065450A1 (en) * | 2013-10-31 | 2015-05-07 | Hewlett-Packard Development Company, L.P. | Non-blocking registration in distributed transactions |
US9961011B2 (en) | 2014-01-21 | 2018-05-01 | Oracle International Corporation | System and method for supporting multi-tenancy in an application server, cloud, or other environment |
US11188427B2 (en) * | 2014-09-26 | 2021-11-30 | Oracle International Corporation | System and method for transaction recovery in a multitenant application server environment |
US20160162991A1 (en) * | 2014-12-04 | 2016-06-09 | Hartford Fire Insurance Company | System for accessing and certifying data in a client server environment |
EP3166028B1 (en) * | 2015-11-06 | 2020-03-25 | Open Text SA ULC | Archive center for content management |
US10846139B2 (en) * | 2018-11-15 | 2020-11-24 | Bank Of America Corporation | Self-purgative electronic resources |
US11386080B2 (en) * | 2019-08-23 | 2022-07-12 | Capital One Services, Llc | Transaction processing failover |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6192365B1 (en) * | 1995-07-20 | 2001-02-20 | Novell, Inc. | Transaction log management in a disconnectable computer and network |
US6965904B2 (en) | 2001-03-02 | 2005-11-15 | Zantaz, Inc. | Query Service for electronic documents archived in a multi-dimensional storage space |
US20050278274A1 (en) * | 2004-05-25 | 2005-12-15 | Kovachka-Dimitrova Monika M | Transaction model for deployment operations |
US20060080316A1 (en) * | 2004-10-08 | 2006-04-13 | Meridio Ltd | Multiple indexing of an electronic document to selectively permit access to the content and metadata thereof |
US20060259468A1 (en) * | 2005-05-10 | 2006-11-16 | Michael Brooks | Methods for electronic records management |
US7272594B1 (en) | 2001-05-31 | 2007-09-18 | Autonomy Corporation Ltd. | Method and apparatus to link to a related document |
US20080077423A1 (en) | 2006-06-30 | 2008-03-27 | Gilmore Alan R | Systems, methods, and media for providing rights protected electronic records |
US20080250074A1 (en) * | 2007-04-04 | 2008-10-09 | Oracle International Corporation | Recoverable last resource commit |
US7900085B2 (en) * | 2008-05-29 | 2011-03-01 | Red Hat, Inc. | Backup coordinator for distributed transactions |
- 2009-08-06: US application US12/536,823, patent US8195613B2 (en), not active, Expired - Fee Related
- 2010-06-21: GB application GB1010369.5A, patent GB2472484B (en), not active, Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
United Kingdom Intellectual Property Office, Search Report for UK Patent Application GB 1010369.5, dated Oct. 4, 2010, 3 pages. |
Also Published As
Publication number | Publication date |
---|---|
GB2472484B (en) | 2013-09-18 |
US20110035356A1 (en) | 2011-02-10 |
GB2472484A (en) | 2011-02-09 |
GB201010369D0 (en) | 2010-08-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8195613B2 (en) | Transactional archiving of an electronic document | |
US10956601B2 (en) | Fully managed account level blob data encryption in a distributed storage environment | |
US10860547B2 (en) | Data mobility, accessibility, and consistency in a data storage system | |
US11016696B2 (en) | Redundant distributed data storage system | |
JP5254611B2 (en) | Metadata management for fixed content distributed data storage | |
US11086734B2 (en) | Accelerated recovery after a data disaster | |
CN103838646B (en) | A kind of system and method for Ground Application big data disaster-tolerant backup | |
US9110837B2 (en) | System and method for creating and maintaining secondary server sites | |
JP5260536B2 (en) | Primary cluster fast recovery | |
US11609827B2 (en) | Distributed architecture for tracking content indexing | |
US8055937B2 (en) | High availability and disaster recovery using virtualization | |
US20210034473A1 (en) | Distributed framework for task splitting and task assignments in a content indexing system | |
JP2020514902A (en) | Synchronous replication of datasets and other managed objects to cloud-based storage systems | |
WO2022174735A1 (en) | Data processing method and apparatus based on distributed storage, device, and medium | |
US11269927B2 (en) | Transactional replicator | |
JP2017531250A (en) | Granular / semi-synchronous architecture | |
US8930751B2 (en) | Initializing replication in a virtual machine | |
AU2019380380B2 (en) | Taking snapshots of blockchain data | |
US20220318199A1 (en) | Seamless migration of stubs and data between different filer types | |
CN116389233A (en) | Container cloud management platform active-standby switching system, method and device and computer equipment | |
US10216746B1 (en) | Managing file system access to remote snapshots | |
CN109995808A (en) | A kind of enterprise data storage system |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: AUTONOMY CORPORATION LTD., UNITED KINGDOM Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:VUKOJEVIC, BOJAN;REEL/FRAME:023062/0833 Effective date: 20090804 |
ZAAA | Notice of allowance and fees due | Free format text: ORIGINAL CODE: NOA |
ZAAB | Notice of allowance mailed | Free format text: ORIGINAL CODE: MN/=. |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
AS | Assignment | Owner name: LONGSAND LIMITED, UNITED KINGDOM Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AUTONOMY CORPORATION LIMITED;REEL/FRAME:030009/0469 Effective date: 20110928 |
FPAY | Fee payment | Year of fee payment: 4 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |
FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
FP | Lapsed due to failure to pay maintenance fee | Effective date: 20240605 |