US20120173596A1 - Relational objects for the optimized management of fixed-content storage systems - Google Patents
Relational objects for the optimized management of fixed-content storage systems
- Publication number
- US20120173596A1 (US 13/421,042)
- Authority
- US
- United States
- Prior art keywords
- data
- fixed
- storage system
- content storage
- differenced
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F16/164—File meta data generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
Definitions
- the present invention relates to fixed-content storage systems.
- the present invention relates to managing data objects in a fixed-content storage system.
- a fixed-content object is a container of digital information that, once created, remains fixed. Examples of objects that could be fixed include medical images, PDF documents, photographs, document images, static documents, financial records, e-mail, audio, and video. Altering a fixed-content object results in the creation of a new fixed-content object. A fixed-content object once stored becomes immutable.
- Fixed-content digital data is often subject to regulatory requirements for availability, confidentiality, integrity, and retention over a period of many years. As such, fixed-content data stores grow without bounds and storage of these digital assets over long periods of time presents significant logistical and economic challenges.
- ILM: Information Lifecycle Management
- a multi-site, multi-tier, large-scale distributed fixed-content storage system is needed, for example, to address the requirement for storing multiple billions of fixed-content data objects.
- These systems ensure the integrity, availability, and authenticity of stored objects while ensuring the enforcement of Information Lifecycle Management and regulatory policies. Examples of regulatory policies include retention times and version control.
- Fixed-content storage systems grow as new objects are stored. This growth is accelerated by providing redundant copies of fixed-content objects in order to reduce the probability of data loss. As the size and complexity of the fixed-content storage system grow, the resources necessary to manage the storage system also increase. Improved data management techniques are therefore needed as the system scales to more efficiently store, organize, and manage data in a fixed-content storage system, while also fulfilling applicable regulations.
- a data object to be stored in a distributed fixed-content storage system is intelligently decomposed along the data object's logical boundaries. Intelligently decomposed objects are compared with other reference objects and, where they are identical, one reference object is stored and referenced by a reference content block.
- a medical study archive contains thousands of instances of a template form with minor variations. For each instance, the template is stored separately from the additional data. Intelligent decomposition of the template data and the additional data when storing the archive allows for one instance of the template data to be referenced by other objects containing reference content blocks.
- storage resources may be used efficiently where identical data is stored in only as many places as required by regulatory or other requirements.
- multiple external data objects are consolidated into a single data object.
- the external data objects are accessed by reference to metadata that indicates an offset and size of the external data object.
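The consolidation described above can be illustrated with a minimal Python sketch. The names `Extent`, `consolidate`, and `read_external` are hypothetical and do not come from the patent; the sketch only assumes that the metadata records an offset and a size for each external data object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical metadata entry locating one external object."""
    offset: int  # byte position inside the consolidated payload
    size: int    # length in bytes


def consolidate(objects: dict[str, bytes]) -> tuple[bytes, dict[str, Extent]]:
    """Concatenate the payloads and record where each one landed."""
    payload = bytearray()
    index: dict[str, Extent] = {}
    for name, data in objects.items():
        index[name] = Extent(offset=len(payload), size=len(data))
        payload += data
    return bytes(payload), index


def read_external(payload: bytes, index: dict[str, Extent], name: str) -> bytes:
    """Recover one external object using only its offset and size."""
    extent = index[name]
    return payload[extent.offset:extent.offset + extent.size]
```

Only one stored object has to be managed, while each original remains individually addressable through the index.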
- differenced objects are created when an object stored in a fixed-content storage system is edited.
- the edits to the original object may represent a small change in the original object, but because the stored original object is immutable it is not possible to simply overwrite the small portion that is edited.
- a new object is created that references both the original object and the edited data.
- the metadata of the new object includes information relating to the offset and the size of the edited data so that the edited data is accessed instead of the corresponding portion of the original object.
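A differenced object can be sketched as follows. This is an illustrative model under simple assumptions, not the patented implementation: `Edit` and `read_differenced` are hypothetical names, and the edited data is assumed to be held alongside its offset, with its size taken from the length of the edited bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """Hypothetical metadata for one edited region."""
    offset: int  # where the edit starts in the original object
    data: bytes  # edited bytes; the size is len(data)


def read_differenced(original: bytes, edits: list[Edit]) -> bytes:
    """Return the edited view; the stored original is never modified."""
    view = bytearray(original)  # work on a copy, the original stays immutable
    for edit in edits:
        view[edit.offset:edit.offset + len(edit.data)] = edit.data
    return bytes(view)
```

A reader of the new object sees the edited bytes in place of the corresponding portion of the original, while the original object remains available unchanged.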
- composite objects are provided that reference multiple objects.
- a manifest data object is created that references each object, and accessing the manifest data object allows for the identification, access, and management of objects joined in the composite object.
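As a rough sketch of the manifest idea, the following Python models a manifest as a list of object identifiers. `Manifest`, `read_composite`, and `apply_to_members` are hypothetical names, and the dictionary-based stores stand in for the real storage and management subsystems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    """Hypothetical manifest object: it holds references, not the data itself."""
    members: tuple[str, ...]


def read_composite(manifest: Manifest, store: dict[str, bytes]) -> list[bytes]:
    """Fetch every member object through the manifest."""
    return [store[object_id] for object_id in manifest.members]


def apply_to_members(manifest: Manifest, management: dict, key: str, value) -> None:
    """Apply one management setting to all objects joined in the composite."""
    for object_id in manifest.members:
        management.setdefault(object_id, {})[key] = value
```

A single call against the manifest then reaches every member, which is the simplification in object management that the composite object is meant to provide.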
- FIG. 1 illustrates various nodes in a distributed storage system.
- FIG. 2 illustrates an embodiment of a fixed-content storage subsystem that comprises multiple data objects.
- FIGS. 2A-E illustrate a method of intelligent decomposition and storage of content.
- FIGS. 3A-C illustrate a method of object consolidation and storage of content.
- FIGS. 4A-C illustrate a method of storing content as a differenced object.
- FIGS. 5A-C illustrate a method of storing content as a composite object.
- FIG. 6 illustrates a composite object utilizing various storage methods.
- Fixed-content storage involves the storage and management of data such that once stored, the data is immutable—it cannot be changed. Thus, locks are not required for alterations to the contents of the object.
- additional objects may be stored that consist of minor variations of an existing object and many objects may have large amounts of identical data. Efficiency is provided according to certain embodiments by recognizing where these minor variations and duplicate data exist. Rather than providing more copies of any particular data than necessary, metadata is configured to provide references to data objects containing the data. Additionally, object management may be simplified by reducing the total number of objects or providing a single object that allows access to and management of additional objects.
- a typical fixed-content storage system deployment may involve multiple nodes, often spanning multiple geographically separated sites.
- the storage grid 200 may serve that request based on the location of the data, the location of the user, the load on the system, and the state of the network. This balances the load on the network, storage and servers in order to minimize bandwidth usage and increase performance.
- the storage grid 200 is a unified structure, but there may be multiple servers or repositories of content or metadata.
- Nodes may be grouped based on the services they provide. For example, storage nodes 232 , 236 may provide for secure data storage and transmission.
- a storage node may consist of a service running on a computing resource that manages storage and archival media such as a spinning media resource or tape.
- the storage resource 224 , 242 on a storage node can be based on any storage technology, such as RAID, NAS, SAN, or JBOD. Furthermore, this resource may be based on any grade of disk such as a high performance fiber channel or ATA disk. Storage nodes may be linked together over, for example, LAN and WAN network links of differing bandwidth.
- Storage nodes can accept data and process retrieval requests, and information input into a storage node can be retrieved from other storage nodes.
- Storage nodes may process client protocol requests and include support for DICOM, HTTP and RTP/RTSP.
- Support for NFS/CIFS may be provided, for example, through gateway nodes.
- Storage nodes may replicate and cache data across multiple sites and multiple nodes.
- Data replication is based on a set of configurable rules that are applied to the object metadata and may take into account geographic separation of nodes as well as the bandwidth between nodes.
- the logic that governs replication and distribution may be enforced by control nodes.
- Gateway nodes 228 provide an interface through which external applications 220 may communicate with the storage grid. Gateway nodes 228 route incoming requests to storage nodes based on, for example, the available CPU, bandwidth, storage and geographic proximity. For applications that require direct file system access, the gateway nodes 228 may provide an NFS/CIFS interface to the storage grid.
- Control nodes 238 may consist of separate software services, such as the Content Metadata Service (CMS) and the Administrative Domain Controller (ADC). Although these services can run on separate computing resources, they may also share a single server.
- CMS: Content Metadata Service
- ADC: Administrative Domain Controller
- the Content Metadata Service constitutes a distributed business rules engine that provides for content metadata storage, metadata synchronization, metadata query and enforcement of replication and information lifecycle management business logic. Replication and information lifecycle management policies may be based on metadata that is associated with stored objects. This allows the creation of business rules that determine where content is stored, how many copies are stored, and on what media it is stored throughout its lifecycle.
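One way to picture metadata-driven business rules is a first-match rule list. The sketch below is an assumption made for illustration only: `Placement`, `Rule`, and `evaluate` are hypothetical names, and the patent does not specify how rules are represented or ordered.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Placement:
    """Hypothetical outcome of a rule: how many copies, on what media, where."""
    copies: int
    media: str
    sites: tuple[str, ...]


@dataclass(frozen=True)
class Rule:
    """A predicate over object metadata paired with the placement it selects."""
    matches: Callable[[dict], bool]
    placement: Placement


def evaluate(rules: list[Rule], metadata: dict, default: Placement) -> Placement:
    """Return the placement of the first rule whose predicate matches."""
    for rule in rules:
        if rule.matches(metadata):
            return rule.placement
    return default
```

For instance, a rule matching `metadata.get("modality") == "CT"` could select three disk copies across two sites, while everything else falls back to the default placement.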
- a Content Metadata Service may interface, for example, with a local SQL database through a database abstraction layer.
- the Administrative Domain Controller acts as a trusted authentication repository for node-to-node communication. It also provides knowledge of system topology and information to optimize real-time usage of bandwidth, CPU and storage resources. This allows automated management of computational resources and dynamic load balancing of requests based on the available CPU, storage and bandwidth resources.
- the Administration Node 234 may consist of software components such as the Network Management Service and the Audit Service. These services may share a common computing resource, or they may be run on separate computing resources.
- a management interface 226 may be used to monitor and manage the operational status of the grid and associated services.
- the Audit Service provides for the secure and reliable delivery and storage of audited events corresponding to content transactions across the entire storage grid. Audit events are generated, in real-time, by Storage Nodes and Control Nodes. Events are then relayed through the storage grid using a reliable transport mechanism and delivered to the Administration Nodes. Audit messages are processed by the Audit Service and may be directed to an external database or file.
- the Network Management Service collects and processes real-time metrics on utilization of computing, storage and bandwidth resources. It provides real-time and historical usage reports. In addition it is responsible for fault reporting and configuration management.
- the Archive Node 230 , 240 may manage a locally attached tape drive or library 246 for the archiving and retrieval of grid managed objects. Archive nodes may be added to diversify archive pools and to provide archival storage at multiple sites.
- the storage grid 200 may also utilize external storage resources, such as a managed tape library 222 or an enterprise SAN 224 .
- Storage Nodes and Control Nodes in the storage grid can be upgraded, decommissioned, replaced or temporarily disconnected without any disruption. Nodes do not need to run on the same hardware or have the same storage capacity. Nodes replicate and cache data across multiple sites and multiple nodes. In addition to bandwidth savings, the intelligent distribution of information provides for real-time backup, automated disaster recovery and increased reliability.
- Capacity, performance and geographic footprint of the storage grid can be increased by adding nodes as needed, when needed, without impacting end-users. This enables the storage grid to accommodate thousands of terabytes of data across hundreds of locations.
- the storage grid combines the power of multiple computers to achieve extremely high levels of scalability and throughput. As nodes are added to the storage grid, they contribute to the available computational and storage resources. These resources are seamlessly utilized based on bandwidth availability and geographical suitability.
- An object can be one file or a collection of files with relationships that are defined by object metadata.
- Object metadata constitutes application specific information that is associated with a data object. This information can be attached to or extracted from the object at the time of input into the storage grid. Object metadata can be queried and the storage grid can enforce business rules based on this information. This allows for efficient utilization of storage/bandwidth resources, and enforcement of storage management policies.
- the storage grid is fault tolerant, resilient and self-healing. Transactions continue to be processed even after multiple hardware, storage and network failures.
- the design philosophy is that hardware, network, and catastrophic failures will occur, and the system should be able to deal with faults in an automated manner without impacting the stored data or end-users.
- Reliability is achieved through replicas, which are identical copies of objects (both data and metadata) that are stored on multiple nodes and kept synchronized.
- Increasing reliability involves adding nodes to the storage grid and increasing the number of replicas for each object.
- the location and number of the replicas is based on a set of rules that can be configured to ensure geographical separation and the desired level of redundancy.
- the storage grid will automatically enforce this logic across all nodes. If a failure is detected, the system is self-healing in that additional replicas are automatically created to restore the level of resiliency.
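The self-healing behavior can be sketched as a function that restores the target replica count after node failures. This is a simplified model: `heal` is a hypothetical name, node selection here is alphabetical rather than rule-driven, and an object with no surviving replica is treated as unrecoverable.

```python
def heal(replicas: dict[str, set[str]], healthy: set[str], target: int) -> dict[str, set[str]]:
    """Return a new replica map with lost copies re-created on healthy nodes."""
    healed: dict[str, set[str]] = {}
    for object_id, nodes in replicas.items():
        alive = nodes & healthy
        if not alive:
            # No surviving source copy to replicate from.
            healed[object_id] = set()
            continue
        needed = max(0, target - len(alive))
        candidates = sorted(healthy - alive)  # placeholder for rule-based selection
        healed[object_id] = alive | set(candidates[:needed])
    return healed
```

In a real grid, the candidate selection would follow the configured placement rules, for example to preserve geographical separation.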
- the system manages the available storage. Incoming data is transparently redirected to take advantage of the newly added storage capacity.
- objects are redistributed, purged, or replicated based on metadata and policies that are applied to the metadata.
- Objects can also migrate from one storage grade (e.g., disk) to another (e.g., tape) based not simply on time and date stamps, but also on external metadata that indicates the importance of the object to the specific business application. For example, in medical applications, certain imaging exams may be immediately committed to deep storage. In applications for the financial sector, retention policies may be set up to facilitate compliance with regulatory requirements for data retention.
- Users may input and retrieve data from the location within the storage grid that is closest to them, thereby efficiently utilizing bandwidth and reducing latency.
- it may be cached at the requesting Storage Node to enable improved bandwidth efficiency.
- a Storage Node may be decommissioned through the administrative console. When this takes place, the storage grid may automatically redirect requests to alternate nodes. Furthermore, the storage grid may transparently re-distribute the stored data on other suitable Storage Nodes. This allows for seamless removal of obsolete hardware without any disruptions to storage grid operations. This is in contrast to disruptive data migration procedures that are common in many fixed content applications. Operators can eliminate support for obsolete hardware while taking advantage of the economic benefits of decreasing costs of storage and increases in processing power. Each newly added node costs less and provides more processing power and storage capacity.
- Objects consist of data and associated metadata that are managed as an unalterable and atomic entity. Once stored, these objects are actively managed throughout their information lifecycle. When an object is retrieved, the original data and associated metadata are presented for use. This provides a transparent storage service to external entities.
- Each object stored may have a unique identifier that acts as the primary identifier for the object. This identifier may be assigned at the time the object is created. Objects can be moved from one object store to another.
- Objects stored within the grid may contain metadata, which is used to manage the objects over their lifecycle and facilitate access to the objects.
- Object metadata may include, for example, Content Block metadata, Protocol metadata, Content metadata, User metadata, or Management metadata.
- Content Block metadata may be metadata associated with the object creation process itself, and provides information about the packaging and protection of the user provided data and metadata.
- An example of this type of metadata is the size of the data stored in a given object.
- Protocol metadata may be metadata associated with the protocol used to store the object, but not intrinsic to the data within the object. This includes metadata required to perform protocol specific transactions.
- For data stored through the DICOM protocol, an example of this type of metadata is the DICOM AE title of the entity that stored the data.
- Content metadata may include metadata contained within recognized types of content. If so processed, metadata specific to each recognized type of content is extracted from the content. For content of type PDF, an example of this type of metadata is the number of pages in a document.
- User metadata may include arbitrary metadata specified by the entity storing content into the grid. This ability to attach user metadata is limited by the protocol used to store the objects.
- An example of this type of metadata is a private identifier assigned by the user.
- Management metadata consists of metadata generated and modified over time as objects are managed within the grid. Unlike the previous four classes of metadata, this metadata is not immutable, and is not present as part of the object itself.
- An example of this type of metadata is the time when an object was last accessed.
- the metadata associated with the object is also stored in a separate subsystem that maintains a repository of metadata.
- the metadata store can be queried to return the metadata associated with a given object. Queries can also be performed to return a list of objects and requested metadata for all objects that have metadata that matches a specific query.
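The two kinds of query described above (metadata for a given object, and objects matching given metadata) can be sketched with Python's built-in `sqlite3` module. The single-table schema and the function names are assumptions made for illustration; the patent only states that a metadata repository exists and can be queried.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory metadata repository (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metadata (object_id TEXT, key TEXT, value TEXT)")
    return conn


def put(conn: sqlite3.Connection, object_id: str, metadata: dict) -> None:
    """Record each metadata item of an object as one row."""
    conn.executemany(
        "INSERT INTO metadata VALUES (?, ?, ?)",
        [(object_id, key, str(value)) for key, value in metadata.items()],
    )


def get(conn: sqlite3.Connection, object_id: str) -> dict:
    """Return the metadata associated with a given object."""
    rows = conn.execute(
        "SELECT key, value FROM metadata WHERE object_id = ?", (object_id,)
    )
    return dict(rows.fetchall())


def find(conn: sqlite3.Connection, key: str, value) -> list:
    """Return the identifiers of all objects whose metadata matches."""
    rows = conn.execute(
        "SELECT object_id FROM metadata WHERE key = ? AND value = ? ORDER BY object_id",
        (key, str(value)),
    )
    return [row[0] for row in rows]
```

Values are stored as text in this sketch, so numeric metadata comes back as strings.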
- Placement of objects may be based on the capabilities of the storage grid computing resources. Different computing resources have different capacity to perform work. While this is primarily measured based on the clock frequency of the processor, the number of processors and relative efficiencies of different processor families may also be taken into account. In addition, the amount of CPU resources that are currently in use provides a mechanism to determine how “busy” a given resource is. These characteristics are monitored and measured to allow decisions to be made within the grid about which computing resource is best suited to use to perform a given task.
- Placement of objects may also be based on the characteristics of the storage resources, such as storage latency, reliability, and cost.
- Storage capacity provides information for calculating risk in the event of rebuild.
- a measurement of the amount of storage capacity that is currently in use provides a mechanism to determine how full a given storage resource is, and determine which locations are more able to handle the storage or migration of new content. Different storage resources have different throughput. For example, high performance Fiber-Channel RAID systems will deliver better performance than a lower performance software RAID on IDE drives.
- a measurement of the amount of I/O bandwidth that is currently in use provides a mechanism to determine the extent to which a given storage resource is able to handle additional transactions, and how much it will slow down current transactions.
- Storage resources can be read-only, and thus not a candidate for the storage of new objects. These characteristics may be monitored and measured to allow decisions to be made within the grid about which storage resource is best suited to use to retain objects over time, and influence the rules that determine where objects should be stored.
- Placement of objects may also consider the characteristics of network paths, such as latency, reliability and cost. Different network paths have different amounts of bandwidth available. This directly maps into the time required to transfer objects from one storage repository to another. The amount of the network bandwidth that is currently in use may also be considered. This provides a mechanism to determine how “busy” a given network link is, and to compare the expected performance with the theoretical performance. These characteristics may be monitored and measured to allow decisions to be made within the grid about which network path is best suited to use to transfer objects through the grid.
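A placement decision based on measured characteristics can be sketched as a scoring function over candidate storage resources. Everything specific here is an assumption: `StorageResource`, `choose_storage`, the linear score, and the equal default weights are hypothetical, and the same pattern could be applied to computing resources or network paths.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageResource:
    """Hypothetical snapshot of the monitored state of one storage resource."""
    name: str
    capacity_used: float  # fraction of capacity in use, 0.0 to 1.0
    io_used: float        # fraction of I/O bandwidth in use, 0.0 to 1.0
    read_only: bool = False


def choose_storage(resources: list[StorageResource],
                   w_capacity: float = 0.5,
                   w_io: float = 0.5) -> Optional[StorageResource]:
    """Pick the writable resource with the lowest weighted utilization."""
    candidates = [r for r in resources if not r.read_only and r.capacity_used < 1.0]
    if not candidates:
        return None
    return min(candidates, key=lambda r: w_capacity * r.capacity_used + w_io * r.io_used)
```

Read-only and full resources are excluded up front, which mirrors the statement that a read-only resource is not a candidate for new objects.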
- the probability of data loss is reduced.
- the probability of data loss and data inaccessibility for a given placement of objects can be quantified and reduced to manageable levels based on the value of the data in question.
- replicas of objects can be placed in separate failure zones. For example, the placement of two replicas within a single server room can take into account that storage on nodes that do not share a single UPS has a higher probability of accessibility than two replicas stored on two nodes that share the same UPS. On a larger scale, two replicas created in geographically distant locations have a lower probability of loss than two replicas on nodes within the same facility.
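The UPS example can be made concrete with a small probability model. The sketch assumes that UPS failures and node failures are independent events with fixed probabilities over some period; the function names and the numbers used in any call are illustrative and are not taken from the patent.

```python
def p_unavailable_shared_ups(p_ups: float, p_node: float) -> float:
    """Both replicas are unavailable if the shared UPS fails, or both nodes fail."""
    return p_ups + (1 - p_ups) * p_node ** 2


def p_unavailable_separate_ups(p_ups: float, p_node: float) -> float:
    """Each replica is unavailable if its own UPS or its own node fails."""
    p_single = p_ups + (1 - p_ups) * p_node
    return p_single ** 2
```

Under this model, whenever both probabilities lie strictly between 0 and 1, the separate-UPS placement gives the lower probability that both replicas are inaccessible at once, because the shared UPS is a common-mode failure.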
- Because replica placement rules are metadata driven, they can be influenced by external systems and can change over time. Changes to existing replicas and changes to the topology of the grid can also influence replica placement rules.
- Replica placement can reflect the instantaneous, historical and predictive information associated with a given resource. For example, monitoring of server and storage health can dynamically influence the degree of reliability attributed to a given resource. Different types of storage resources, such as IDE vs. SCSI, have different reliability characteristics. In addition, archival and offline storage often have a distinct media lifetime, which needs to be managed to preserve archive integrity. These are both examples of how information about available resources is used to determine the best solution for a given set of constraints.
- Implementation of configuration information based on formal risk analysis can further optimize the resource tradeoff by providing information about common mode failures that cannot be automatically discovered by the grid. For example, the placement of two replicas on nodes situated along the same fault line may be considered to be within a common failure mode, and thus suboptimal when compared to the placement of one of the replicas in a facility not located on the fault.
- a live feed from the weather monitoring system can provide advance notice of extreme weather events, which could allow the grid to dynamically rebalance content to reduce the risks associated with the loss of connectivity to a given facility.
- Content stored in a fixed-content storage system can be, but is not limited to, audio, video, data, graphics, text and multimedia information.
- the content is preferably transmitted via a distribution system which can be a communications network including, but not limited to, direct network connections, server-based environments, telephone networks, the Internet, intranets, local area networks (LAN), wide area networks (WAN), the WWW or other webs, transfers of content via storage devices, coaxial cable, power distribution lines (e.g., either residential or commercial power lines), fiber optics, among other paths (e.g., physical paths and wireless paths).
- content can be sent via satellite or other wireless path, as well as wireline communications networks, or on the same path as a unit of power provided by a utility company.
- novel data structures are utilized in order to allow certain features described herein.
- Objects stored within the storage system are stored as one or more packets.
- Each packet includes a certain non-zero amount of packet metadata and zero or more bytes of payload data.
- the quantity of packet metadata and the quantity of payload data vary among different packets.
- a maximum packet size or quantity of payload data may be utilized.
- the maximum quantity of payload data in a variable size packet may be configured to be 16 KB.
- Each packet may include a predetermined identical amount of packet metadata and payload data in some embodiments.
- the packet metadata may contain information allowing for the processing of variable sized packets when the amount of packet metadata and payload data is not predefined.
- Types of packet metadata include offset data, packet size data, and the like. This packet metadata may allow for the arbitrary retrieval of data in an object by identifying a specific packet or bytes within or across one or more packets.
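Arbitrary retrieval by packet metadata can be sketched as follows. The `Packet` layout, the function names, and the assumption that packets are held in offset order are hypothetical; the 16 KB maximum payload follows the example value given above.

```python
from dataclasses import dataclass

MAX_PAYLOAD = 16 * 1024  # example maximum payload size from the text


@dataclass(frozen=True)
class Packet:
    """Hypothetical packet: metadata (offset) plus zero or more payload bytes."""
    offset: int     # position of this payload within the object
    payload: bytes  # the packet size is len(payload)


def packetize(data: bytes, max_payload: int = MAX_PAYLOAD) -> list[Packet]:
    """Split an object into variable-size packets, in offset order."""
    return [Packet(i, data[i:i + max_payload]) for i in range(0, len(data), max_payload)]


def read_range(packets: list[Packet], start: int, length: int) -> bytes:
    """Read bytes [start, start + length), within or across packets."""
    end = start + length
    out = bytearray()
    for packet in packets:  # assumes packets are sorted by offset
        packet_end = packet.offset + len(packet.payload)
        if packet_end <= start or packet.offset >= end:
            continue  # this packet does not overlap the requested range
        lo = max(start, packet.offset) - packet.offset
        hi = min(end, packet_end) - packet.offset
        out += packet.payload[lo:hi]
    return bytes(out)
```

Only the packets overlapping the requested range contribute to the result, so a reader does not need to reassemble the whole object to fetch a few bytes.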
- FIG. 2 shows an embodiment of a fixed-content storage subsystem 700 that comprises multiple data objects.
- the data objects comprise metadata 701 and payload data 702 .
- the fixed-content storage system 700 is accessible by a remote server 720 .
- one or more packets may comprise reference content blocks 710 and/or floating reference content blocks 705 according to some embodiments.
- a reference content block 710 preferably has only packet metadata that refers to a different packet or content block, and does not contain any payload data.
- the packet metadata reference may cause an application accessing the reference content block to access some other packet(s) in place of the reference content block.
- a reference content block may be stored rather than another short video (such as a geographically specific clip).
- the reference content block may refer to that short clip stored separately, either in the fixed-content system or in another storage system.
- a floating reference content block 705 is a reference content block that does not yet point to a packet or reference content block. Unlike reference content blocks 710 , which are resolved at the storage system 700 (for example, by referring to a logical or physical memory address, or by referring to a particular object or instance), floating reference content blocks 705 are resolved at a server 720 or computing system outside the fixed-content storage system when the data is accessed.
- the packet metadata associated with the floating reference content block 705 specifies the size, duration, and/or other information that enables the server 720 to resolve the floating reference content block 705 . Accordingly, an object comprising one or more packets may reference other objects or portions of other objects within the storage system 700 . According to some embodiments and as shown in FIG. 2 , a server 720 resolving a floating reference content block 705 may also resolve the storage location to an external storage system 730 .
- With floating reference content blocks, an object may reference variable data within the storage system. Though the data written to the fixed-content storage system 700 is not altered, floating reference content blocks 705 allow for the modification of an object as seen by an external user accessing the storage system 700 . Floating reference content blocks may therefore be a powerful tool when used with a fixed-content storage system as described herein.
- For example, when a medical report/form template is stored in a fixed-content storage system, there may be a number of blank fields. For each patient having a report stored, the values of these fields may be different, but the template is largely the same. If these fields are stored as floating reference content blocks, then the patient data may be stored separately for each patient, without duplicating the template data. When the data is accessed, for example by a medical professional, they may request information on one of the patients. The template would be loaded, and based on the patient information requested, the medical professional's computing system can resolve the floating reference content blocks in order to access the specific patient data requested along with the report form.
- Floating reference content blocks may be resolved according to any criteria appropriate to the particular file. For example, a floating reference content block may be resolved based on the geographic location of the computing system accessing the data, an IP address, data submitted by the computing system, or the like.
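The template example above may be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `FloatingRef`, `resolve`, `TEMPLATE`, `PATIENT_DATA`, and `render_report` are hypothetical, and an in-memory dictionary stands in for the separately stored patient data. The stored template is never modified; only the view assembled outside the storage system changes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass(frozen=True)
class FloatingRef:
    """Placeholder stored in the object; resolved outside the storage system."""
    field: str  # hypothetical name of the blank template field
    size: int   # maximum number of bytes the resolved value may occupy


Part = Union[bytes, FloatingRef]


def resolve(parts: List[Part], lookup: Callable[[str], bytes]) -> bytes:
    """Assemble the view seen by the client; the stored parts are not modified."""
    out = bytearray()
    for part in parts:
        if isinstance(part, FloatingRef):
            value = lookup(part.field)
            if len(value) > part.size:
                raise ValueError(f"value for {part.field!r} exceeds {part.size} bytes")
            out += value
        else:
            out += part
    return bytes(out)


# The template is stored once; per-patient values are stored separately.
TEMPLATE: List[Part] = [
    b"Patient: ", FloatingRef("name", 32),
    b"\nFinding: ", FloatingRef("finding", 64),
]
PATIENT_DATA: Dict[str, Dict[str, bytes]] = {
    "p1": {"name": b"A. Smith", "finding": b"normal"},
    "p2": {"name": b"B. Jones", "finding": b"follow-up"},
}


def render_report(patient_id: str) -> bytes:
    """Resolve the floating references using the requested patient's record."""
    record = PATIENT_DATA[patient_id]
    return resolve(TEMPLATE, lambda field: record[field])
```

The `lookup` callable is where other resolution criteria, such as geographic location or an IP address, could be substituted.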
- the metadata in a reference content block or a floating reference content block can override some of the metadata in a packet (or group of packets) that is pointed to. This may allow certain data stored in the fixed-content storage system to be treated differently according to how it is accessed. This in turn may allow for objects to be stored once rather than requiring near identical copies, as the data is immutable. By changing the management rules of the fixed-content storage system, more flexibility is obtained without modifying the protected data.
- FIGS. 2A-E demonstrate a method for intelligently decomposing data stored in a fixed-content storage system according to one embodiment.
- Intelligent decomposition stores data objects according to their logical boundaries and allows for single instance storage of objects or portions of objects that may be identical. For example, in some systems multiple instances of similar data are stored, where the difference is the payload within a well-known structure, such as a TAR archive.
- a TAR archive is the concatenation of one or more files.
- FIG. 2A shows one embodiment of an implementation of intelligent decomposition data management techniques with reference to a TAR archive 10 for a medical system storing, for example, cardiology and radiology images.
- the TAR archive includes two archived files 12 , 14 .
- Each archived file 12 , 14 is preceded by a header block 16 , 18 .
- the archived file data is written unaltered except that its length is rounded up to a multiple of 512 bytes and the extra space is zero filled.
- the TAR headers 16 , 18 may comprise 512 byte blocks of data indicating the size of each data file, the owner and group ID, the last modification time, and other data.
- FIG. 2B illustrates partitioning of the TAR archive 10 into five packets 20 , 22 , 24 , 26 , 28 .
- the partitioning of the packets 20 , 22 , 24 , 26 , 28 was done without regard for the file boundaries within the TAR archive.
- the packets 20 , 22 , 24 , 26 , 28 contain data from various sources that may not be logically related.
- the packet 24 contains data corresponding to file 12 , header block 18 , and file 14 . There is no alignment of the TAR headers, and no references to data in external objects.
- FIG. 2C illustrates the partitioning of the TAR archive 10 by using the file boundaries and the alignment of TAR headers.
- TAR header 16 is placed in packet 30
- archived file 12 is placed in packets 32 , 34
- TAR header 18 is placed in packet 36
- archived file 14 is placed in packets 38 , 40 . Because the TAR archive 10 was partitioned along the TAR archive header and file boundaries, each of the TAR archive headers and files can be handled separately.
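The boundary-aligned partitioning of FIG. 2C may be sketched as follows. This is a simplified illustration that assumes a plain ustar archive containing only regular files; `build_archive` and `partition` are hypothetical helper names, and each archived file is emitted as a single packet rather than being split further into several packets as in the figure.

```python
import io
import tarfile
from typing import List, Tuple

BLOCK = 512  # TAR header size and data alignment unit


def build_archive(files: List[Tuple[str, bytes]]) -> bytes:
    """Create a small in-memory TAR archive for the demonstration."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def partition(archive: bytes) -> List[Tuple[str, bytes]]:
    """Split an archive into ("header", block) and ("file", padded data) packets."""
    packets = []
    pos = 0
    while pos + BLOCK <= len(archive):
        header = archive[pos:pos + BLOCK]
        if header == bytes(BLOCK):  # an all-zero block marks the end of the archive
            break
        # The size field is 12 bytes of octal ASCII at offset 124 of the header.
        size = int(header[124:136].rstrip(b"\0 ").decode("ascii") or "0", 8)
        padded = -(-size // BLOCK) * BLOCK  # round up to a multiple of 512 bytes
        packets.append(("header", header))
        pos += BLOCK
        packets.append(("file", archive[pos:pos + padded]))
        pos += padded
    return packets
```

Because every packet starts on a header or file boundary, each header and each file can be stored, compared, and referenced independently.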
- FIG. 2D illustrates an exemplary embodiment for storing the partitions from FIG. 2C as multiple objects.
- a master object 42 corresponds to the TAR archive 10 .
- the master object 42 includes a component for each of the two files in the TAR archive.
- the first component includes metadata packet 25 A, TAR file header packet 30 (from FIG. 2C ), and reference block 27 A.
- the second component includes metadata packet 25 B, TAR file header packet 36 (from FIG. 2C ), and reference block 27 B.
- Reference block 27 A provides a reference to a reference object 46 .
- Reference object 46 includes partitions 32 , 34 corresponding to the first file 12 in the TAR archive 10 , and packet metadata 25 D and 25 E.
- Reference block 27 B provides a reference to a reference object 48 .
- Reference object 48 includes partitions 38 , 40 corresponding to the second file 14 in the TAR archive 10 , and packet metadata 25 F and 25 G.
- each archived file 12 , 14 is stored as a unique object and referenced by a master object.
- FIG. 2D also includes a second master object 44 .
- Master object 44 includes a packet 31 corresponding to a third header.
- the third header is found in a TAR archive that also contains the first data file 12 .
- the reference content block 27 C references the existing stored reference object 46 .
- a media file may contain a series of media clips, and each media clip could be treated as an object.
- a pdf file may contain pages or other content that could be treated as separate objects.
- One embodiment of a process for intelligently decomposing objects stored to a fixed-content storage system is shown in FIG. 2E.
- the process begins at state 201 where an object to be stored is received.
- the object received is preferably of a type having a well known file structure so that it can be decomposed or packetized at state 202 along its logical boundaries. For example, header data may be separated from payload data.
- the decomposed object is thus broken into separate portions, each of which may comprise one or more packets.
- One of the portions is selected at state 203 , and at decision state 204 it is determined if the selected portion is identical to an existing stored reference object.
- the existing object may comprise any other object, but is likely to be a reference object related to the current object being stored. For example, if the current object being stored is an instance of a medical study, then existing instances of the study may be identified based on metadata or additional data from the external system providing the object. If the portion already exists as a reference object, then the existing object is referenced by a reference content block at state 205 . If the portion does not already exist in the storage system, then the decomposed object portion is stored at state 206 .
- At decision state 207, it is determined whether the entire received object has been stored or referenced. If any portion remains, then the process returns to state 203. When all portions have been handled, a master object exists in the storage system for the received object that references existing data as well as any new data. Thus, this process may advantageously be used in a fixed-content storage system in order to allow greater flexibility and reduce the need for increased storage space.
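The loop of states 203 through 207 may be sketched as follows. This is a minimal illustration under stated assumptions: the dictionaries `reference_objects` and `master_objects` are hypothetical in-memory stand-ins for the storage system, and equality of SHA-256 digests is used here as the test for "identical" content, which the text does not specify.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins for the storage system.
reference_objects: Dict[str, bytes] = {}                 # object id -> immutable content
master_objects: Dict[str, List[Tuple[str, str]]] = {}    # master id -> [(label, reference id)]


def store_decomposed(master_id: str, portions: List[Tuple[str, bytes]]) -> List[str]:
    """Store each portion at most once; return ids of the portions newly stored."""
    newly_stored = []
    refs = []
    for label, data in portions:                  # state 203: select a portion
        ref_id = hashlib.sha256(data).hexdigest() # assumption: digest identifies content
        if ref_id not in reference_objects:       # state 204: identical object stored?
            reference_objects[ref_id] = data      # state 206: store the new portion
            newly_stored.append(ref_id)
        refs.append((label, ref_id))              # state 205: reference the stored object
    master_objects[master_id] = refs              # state 207: every portion handled
    return newly_stored
```

Storing a second archive that contains the same first file, as with master object 44, adds only its new header; the file portion resolves to the existing reference object.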
- the decomposed object portion is stored prior to identification of existing instances of the object. After it is determined that equivalent content to the decomposed object portion is stored in another object, the identifier for the decomposed object portion may be repointed to the other object. The stored decomposed object portion may then be removed.
- FIGS. 3A-C show a method of object consolidation for a fixed-content storage system.
- a data object representing an advertisement is created for distribution and display in a variety of geographical areas.
- the advertisement data object may be configured to reference a large number of additional data objects (e.g., endings), with each of the additional data objects corresponding to one of the geographical areas.
- a single object may be created with each of the additional data objects stored back-to-back.
- a floating reference content block resolves to a different offset based on the geographic location.
- the endings are stored back-to-back so that a single object is created including the advertisement and all of the endings.
- the cost of managing many small objects for different applications, sometimes having tens of thousands or more individual instances, can be quite large. Storing the small objects as a single object allows for random access retrieval while reducing the number of objects required, thus making storage management more cost effective.
- a data object representing a medical study may include thousands of individual cases or instances.
- the cost of managing many small objects can be large from a licensing or hardware standpoint. Consolidating the cases or instances reduces the number of objects required. The individual cases or instances would still be accessible using offsets for random-access.
- FIG. 3A shows an example of object consolidation of two external data objects 51 and 52 according to one embodiment.
- the external data objects 51 and 52 may be any type of data object, such as media files, medical storage files, or the like.
- external data object 51 may represent a first file of a medical study to be stored, and external data object 52 may represent an additional instance of the study.
- the external data objects 51 and 52 are files that were originally stored in the same folder.
- Data object 50 comprises metadata 54 , 55 and external data objects 51 and 52 .
- Metadata 54, 55 may indicate, for example, an offset and size of a particular section of an object. While the example shown in FIGS. 3A and 3B shows only two external data objects consolidated to form data object 50, in some embodiments a different number of external data objects are consolidated. As the number of external objects increases, object consolidation as described herein provides additional efficiency in managing the objects in a fixed-content storage system.
- FIG. 3C shows a process for creating a consolidated data object.
- multiple objects are received or accessed. In some embodiments, these objects are accessed and consolidated from within a storage system. In some embodiments, multiple objects are received from an external computing system to be stored, and every object to be consolidated is received in a single data transfer. In some embodiments, one or more new objects to be consolidated with existing stored data are received.
- metadata is generated for the consolidated object that indicates an offset and size for the received data objects.
- the metadata may indicate that a first data object stored in a consolidated data object may have no offset and be 64 KB, while the second data object may have a 64 KB offset and be 32 KB.
- the multiple received objects are stored back-to-back as a single object. Any reference to the multiple received objects can be handled by the consolidated object that will reference each of the received objects by offset. Accordingly, management of many related objects may be simplified and costs reduced because a smaller number of objects are stored in the storage system.
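The offset-and-size bookkeeping described above may be sketched as follows. The function names `consolidate` and `read_member` are hypothetical, and a plain dictionary stands in for the consolidated object's metadata.

```python
from typing import Dict, List, Tuple


def consolidate(objects: List[Tuple[str, bytes]]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Store objects back-to-back; return the payload and a name -> (offset, size) index."""
    payload = bytearray()
    index = {}
    for name, data in objects:
        index[name] = (len(payload), len(data))  # offset is the current payload length
        payload += data
    return bytes(payload), index


def read_member(payload: bytes, index: Dict[str, Tuple[int, int]], name: str) -> bytes:
    """Random-access retrieval of one consolidated object by its offset and size."""
    offset, size = index[name]
    return payload[offset:offset + size]
```

With a 64 KB first object and a 32 KB second object, the index records offset 0 and size 65536 for the first, and offset 65536 and size 32768 for the second, matching the example in the text.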
- Medical data may include an image and corresponding demographic data.
- the size of the image is much larger than the corresponding demographic data.
- a 50 MB image may be updated to write 32 bytes worth of patient name information.
- FIGS. 4A-C show an example of a method for generating and storing a differenced object in a fixed-content storage system to more efficiently handle such changes according to one embodiment.
- FIG. 4A shows an original data object 60 and an edited data object 65 as stored in a traditional fixed-content storage system.
- Original object 60 comprises metadata 71 and payload data 61 A-C.
- the original data object 60 may be a 50 MB radiology image along with a relatively small amount of associated data 61 B that represents patient name, demographic data, and the like.
- the associated data 61 B may represent, for example 32 bytes of 50 MB data object 60 .
- a typical fixed-content system may store the edited object as a new data object 65 that includes most of the data from the original data object 60 , but has replaced the associated data 61 B with the edited data 66 .
- FIG. 4B shows a method for storing a differenced object including essentially only the changes.
- FIG. 4B shows original object 60 comprising packet metadata 71 and payload data 61 A-C.
- An edit represented by data 66 has again been made to the associated data 61 B representing a small portion of the original object 60 .
- a differenced object 70 is created as the edited object.
- Differenced object 70 comprises reference content block 72 A.
- Reference content block 72 A references the original object 60 so that the data shared by the edited object 65 and the original object 60 may be accessed by differenced object 70 without storing additional copies of the data.
- Reference content block 72 A further references an object including metadata 71 , edited data 66 , and reference content block 72 B.
- the reference content block 72 A and the reference content block 72 B may indicate the location or offset where associated data 61 B of the original object 60 is to be replaced by edited data 66 when the edited and differenced object 70 is accessed, the size of the edited data 66 , the size of the associated data 61 B, and the like. Referencing the identical data from the original object 60 allows original object 60 to be maintained as a fixed-content object, while small changes are efficiently stored to create additional instances of edited objects.
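Reading a differenced object may be sketched as follows. This is an illustration only: `DifferencedObject` and `read_differenced` are hypothetical names, a dictionary stands in for the storage system, and a single contiguous edit is assumed. The original object is read but never altered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DifferencedObject:
    original_id: str    # reference to the unmodified original object
    offset: int         # where the replaced section starts in the original
    replaced_size: int  # size of the section of the original being replaced
    edited_data: bytes  # the new data stored with the differenced object


def read_differenced(diff: DifferencedObject, store: dict) -> bytes:
    """Present the edited view: original data with the edited section substituted."""
    original = store[diff.original_id]
    end = diff.offset + diff.replaced_size
    return original[:diff.offset] + diff.edited_data + original[end:]
```

Because the offset, the replaced size, and the edited data are recorded separately, the edited data may be longer or shorter than the section it replaces.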
- FIG. 4C is a flowchart indicating one embodiment of a process for generating a differenced object.
- an edited object is received.
- the edited object is compared to the original object.
- associated data 61 B is shown as the payload data from one packet.
- edits may comprise only a portion of the payload data from a packet or may comprise multiple packets or portions thereof.
- Although edited data 66 is shown in FIGS. 4A and 4B as containing the same quantity of data as the associated data 61 B, this need not be the case.
- the edited data may contain more or less data than the section of the original object it replaces.
- the fixed-content storage system is configured to determine whether to store a new object or create a differenced object based on the magnitude of the changes to the original object relative to the object's size.
- If the changes exceed a threshold determined, for example, based on the size of the original object, the edited object is stored as a new object.
- If the changes are less than the determined threshold, then the edited object may be stored as a differenced object.
- For example, the threshold may be that the size of the edited data must not be larger than 50% of the size of the original file.
- a reference is stored to the original data object that may include metadata indicating which portions and how much of the original object is utilized by the edited object.
- a reference is stored to the edited data. Metadata may also be stored that indicates the positioning of the edited data within the original object.
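The comparison and threshold decision may be sketched as follows. This sketch assumes a single contiguous edit located by trimming the common prefix and suffix, which is one simple comparison strategy and not necessarily the one used by the described system; `changed_span` and `plan_storage` are hypothetical names, and the 50% default follows the example threshold given in the text.

```python
from typing import Tuple


def changed_span(original: bytes, edited: bytes) -> Tuple[int, int, bytes]:
    """Return (offset, replaced_size, edited_data) for one contiguous edit."""
    limit = min(len(original), len(edited))
    prefix = 0
    while prefix < limit and original[prefix] == edited[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and original[-1 - suffix] == edited[-1 - suffix]:
        suffix += 1
    replaced_size = len(original) - prefix - suffix
    return prefix, replaced_size, edited[prefix:len(edited) - suffix]


def plan_storage(original: bytes, edited: bytes, max_fraction: float = 0.5) -> dict:
    """Choose between a full new object and a differenced object."""
    offset, replaced_size, data = changed_span(original, edited)
    if len(data) > max_fraction * len(original):
        return {"kind": "new_object", "data": edited}
    return {
        "kind": "differenced",
        "offset": offset,                # position of the edit within the original
        "replaced_size": replaced_size,  # how much of the original is superseded
        "edited_data": data,             # only the changed bytes are stored
    }
```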
- differenced objects may additionally be ‘flattened’ when the original object they reference is no longer necessary.
- the referenced data from the original object may be copied and stored in the differenced object with all of the changes, creating a new object.
- the original object may then be deleted.
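Flattening may be sketched as follows. The `flatten` function, the dictionary-based store, and the field names are hypothetical. Two choices here are assumptions that the text does not state: the differenced object is retired along with the original, and flattening is refused while any other differenced object still references the original.

```python
def flatten(store: dict, diff_id: str, new_id: str) -> None:
    """Materialize a differenced object as a standalone object, then delete the original."""
    diff = store[diff_id]
    original_id = diff["original_id"]
    # Assumption: refuse to flatten while another differenced object needs the original.
    for other_id, obj in store.items():
        if other_id != diff_id and isinstance(obj, dict) and obj.get("original_id") == original_id:
            raise ValueError(f"{other_id} still references {original_id}")
    original = store[original_id]
    end = diff["offset"] + diff["replaced_size"]
    # Copy the referenced data and apply the changes, creating a new object.
    store[new_id] = original[:diff["offset"]] + diff["edited_data"] + original[end:]
    del store[original_id]  # the original is no longer necessary
    del store[diff_id]      # assumption: the differenced object is retired as well
```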
- a medical study may contain a number of instances representing, for example, images captured as part of an examination.
- a user accessing the stored images may want to retrieve only one image of more than 500. If the user were forced to retrieve each image, a great deal of time and resources may be wasted. This may be accomplished using composite objects. For medical systems though, this is usually done using proprietary container files that are application-specific, or accomplished by using file-system directories as containers.
- FIGS. 5A-C show an example of a method for storing composite objects in an object-oriented fixed-content storage system.
- FIG. 5A includes data objects 80 , 85 , and 90 .
- the contents of the data objects 80 , 85 , and 90 are related, but the objects represent different file types.
- each data object used to form a composite object is of the same file type.
- a manifest data object 100 is created in order to simplify the management of data objects 80 , 85 , and 90 .
- Manifest data object 100 includes reference data 101 , which references each sub-object 80 , 85 , and 90 in the composite object 100 .
- manifest data object 100 is compliant with certain standards such as XAM so that updated API commands access the manifest object. If data is changed, only the manifest and changed data need to be updated.
- composite objects described here provide a large degree of flexibility and increase data management capabilities.
- composite objects may be managed by a single set of rules, for example stored in the metadata 102 of manifest data object 100 .
- sub-objects referenced by the manifest data object 100 include a “managed as” field within the sub-object metadata that instructs the fixed-content storage system how to manage the given sub-object when it is desired that the object not be managed according to the manifest data object 100 .
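The relationship between the manifest rules and a sub-object's "managed as" field may be sketched as follows. The dictionary layout, the rule names `copies` and `retention_years`, and the function `effective_rules` are all hypothetical. The sketch also assumes that a "managed as" entry overrides the manifest rule by rule; the text leaves the exact precedence open.

```python
from typing import Dict

# Hypothetical sub-objects; only obj85 carries a "managed as" override.
objects: Dict[str, dict] = {
    "obj80": {"metadata": {}, "content": b"image"},
    "obj85": {"metadata": {"managed_as": {"copies": 3}}, "content": b"report"},
    "obj90": {"metadata": {}, "content": b"audio"},
}

# Hypothetical manifest: one set of rules plus reference data for each sub-object.
manifest = {
    "metadata": {"rules": {"copies": 2, "retention_years": 7}},
    "references": ["obj80", "obj85", "obj90"],
}


def effective_rules(manifest: dict, objects: Dict[str, dict], object_id: str) -> dict:
    """Return the management rules that apply to one sub-object of the composite."""
    if object_id not in manifest["references"]:
        raise KeyError(f"{object_id} is not part of this composite object")
    rules = dict(manifest["metadata"]["rules"])  # start from the manifest's rules
    rules.update(objects[object_id]["metadata"].get("managed_as", {}))
    return rules
```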
- FIG. 5C shows an embodiment of a process for generating a composite object.
- multiple objects that are to be related by the composite object are received or accessed.
- multiple objects are received from an external computing system to be stored substantially simultaneously as a composite object.
- multiple objects already stored in the fixed-content storage system are accessed in order to generate a composite object.
- a manifest object is generated.
- reference data indicating the multiple objects received or accessed at state 501 is stored in the manifest object.
- the reference data is stored as content data, rather than a metadata reference content block, in order to prevent the alteration of the manifest object in the storage system.
- one or more reference content blocks are utilized.
- FIG. 6 demonstrates a composite object referencing several data objects using many of the data management techniques discussed herein.
- manifest data object 110 references consolidated object 120 , differenced object 140 , and intelligently decomposed object 130 .
- a skilled artisan will realize that these storage management systems and methods may be combined in a variety of ways without departing from the scope of the invention.
- modules may operate as a single unit.
- a single module may comprise one or more subcomponents that are distributed throughout one or more locations.
- the communication between the modules may occur in a variety of ways, such as hardware implementations, software implementation, or a combination of hardware and software.
- the modules may be realized using state machines, microcode, microprocessors, digital signal processors, or any other appropriate digital or analog technology.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A system and method is described for managing data objects in a fixed-content storage system. In one embodiment, differenced objects are created when an object stored in a fixed-content storage system is edited. The edits to the original object may represent a small change in the original object, but because the stored original object is immutable it is not possible to simply overwrite the small portion that is edited. In order to store the edited data without requiring duplication of existing data, a new object is created that references both the original object and the edited data. The metadata of the new object includes information relating to the offset and the size of the edited data so that the edited data is accessed instead of the corresponding portion of the original object.
Description
- This application is a continuation of U.S. patent application Ser. No. 13/014,659, filed Jan. 26, 2011, which is a continuation of U.S. patent application Ser. No. 12/036,162, filed Feb. 22, 2008, now U.S. Pat. No. 7,899,850, all of which are hereby incorporated by reference.
- The present invention relates to fixed-content storage systems. In particular, the present invention relates to managing data objects in a fixed-content storage system.
- A fixed-content object is a container of digital information that, once created, remains fixed. Examples of objects that could be fixed include medical images, PDF documents, photographs, document images, static documents, financial records, e-mail, audio, and video. Altering a fixed-content object results in the creation of a new fixed-content object. A fixed-content object once stored becomes immutable.
- Fixed-content digital data is often subject to regulatory requirements for availability, confidentiality, integrity, and retention over a period of many years. As such, fixed-content data stores grow without bounds and storage of these digital assets over long periods of time presents significant logistical and economic challenges.
- To address the economic and logistical challenges associated with storing an ever growing volume of information for long periods of time, fixed-content storage systems implement a multi-tier storage hierarchy and apply Information Lifecycle Management (ILM) policies that determine the number of copies of each object, the location of each object, and the storage tier for each object. These policies will vary based on the content of each object, age of each object, and the relevance of the object to the business processes.
- A multi-site, multi-tier, large-scale distributed fixed-content storage system is needed, for example, to address the requirement for storing multiple billions of fixed-content data objects. These systems ensure the integrity, availability, and authenticity of stored objects while ensuring the enforcement of Information Lifecycle Management and regulatory policies. Examples of regulatory policies include retention times and version control.
- Fixed-content storage systems grow as new objects are stored. This growth is accelerated by providing redundant copies of fixed-content objects in order to reduce the probability of data loss. As the size and complexity of the fixed-content storage system grow, the resources necessary to manage the storage system also increase. Improved data management techniques are therefore needed as the system scales to more efficiently store, organize, and manage data in a fixed-content storage system, while also fulfilling applicable regulations.
- In one embodiment, a data object to be stored in a distributed fixed-content storage system is intelligently decomposed along the data object's logical boundaries. Intelligently decomposed objects are compared with other reference objects and, where they are identical, one reference object is stored and referenced by a reference content block. For example, a medical study archive contains thousands of instances of a template form with minor variations. For each instance, the template is stored separately from the additional data. Intelligent decomposition of the template data and the additional data when storing the archive allows for one instance of the template data to be referenced by other objects containing reference content blocks. Thus, storage resources may be used efficiently where identical data is stored in only as many places as required by regulatory or other requirements.
- In another embodiment, multiple external data objects are consolidated into a single data object. The external data objects are accessed by reference to metadata that indicates an offset and size of the external data object. By consolidating many objects into a single object, the total number of data objects is reduced. This allows for the simplified management of the data stored in the fixed-content storage system.
- In another embodiment, differenced objects are created when an object stored in a fixed-content storage system is edited. The edits to the original object may represent a small change in the original object, but because the stored original object is immutable it is not possible to simply overwrite the small portion that is edited. In order to store the edited data without requiring duplication of existing data, a new object is created that references both the original object and the edited data. The metadata of the new object includes information relating to the offset and the size of the edited data so that the edited data is accessed instead of the corresponding portion of the original object.
- In yet another embodiment, composite objects are provided that reference multiple objects. A manifest data object is created that references each object, and accessing the manifest data object allows for the identification, access, and management of objects joined in the composite object.
- FIG. 1 illustrates various nodes in a distributed storage system.
- FIG. 2 illustrates an embodiment of a fixed-content storage subsystem that comprises multiple data objects.
- FIGS. 2A-E illustrate a method of intelligent decomposition and storage of content.
- FIGS. 3A-C illustrate a method of object consolidation and storage of content.
- FIGS. 4A-C illustrate a method of storing content as a differenced object.
- FIGS. 5A-C illustrate a method of storing content as a composite object.
- FIG. 6 illustrates a composite object utilizing various storage methods.
- Continued adoption of digital technology in nearly all sectors including healthcare, media, government, and financial services is accelerating the creation of fixed-content data. Regulatory and business requirements for retention are resulting in the continued growth of data that must be stored and managed. In many sectors, the retention times exceed the practical lifetime of the storage media, and long term data archiving is an ongoing business challenge. As the archives grow, scaling limitations arise due to the size of the stored data as well as the number of fixed content objects that need to be stored and managed. There is a market demand for fixed-content storage systems that can intelligently manage fixed-content data to provide for more efficient scaling.
- Fixed-content storage involves the storage and management of data such that once stored, the data is immutable—it cannot be changed. Thus, locks are not required for alterations to the contents of the object. However, despite the object itself being immutable, additional objects may be stored that consist of minor variations of an existing object and many objects may have large amounts of identical data. Efficiency is provided according to certain embodiments by recognizing where these minor variations and duplicate data exist. Rather than providing more copies of any particular data than necessary, metadata is configured to provide references to data objects containing the data. Additionally, object management may be simplified by reducing the total number of objects or providing a single object that allows access to and management of additional objects.
- As illustrated in FIG. 1, a typical fixed-content storage system deployment may involve multiple nodes, often spanning multiple geographically separated sites. When a request for information is made, the storage grid 200 may serve that request based on the location of the data, the location of the user, the load on the system, and the state of the network. This balances the load on the network, storage and servers in order to minimize bandwidth usage and increase performance. The storage grid 200 is a unified structure, but there may be multiple servers or repositories of content or metadata.
- Nodes may be grouped based on the services they provide. For example, storage nodes
- The storage resource
- Storage nodes can accept data and process retrieval requests, and information input into a storage node can be retrieved from other storage nodes. Storage nodes may process client protocol requests and include support for DICOM, HTTP and RTP/RTSP. Support for NFS/CIFS may be provided, for example, through gateway nodes.
- Storage nodes may replicate and cache data across multiple sites and multiple nodes. Data replication is based on a set of configurable rules that are applied to the object metadata and may take into account geographic separation of nodes as well as the bandwidth between nodes. The logic that governs replication and distribution may be enforced by control nodes.
- Gateway nodes 228 provide an interface through which external applications 220 may communicate with the storage grid. Gateway nodes 228 route incoming requests to storage nodes based on, for example, the available CPU, bandwidth, storage and geographic proximity. For applications that require direct file system access, the gateway nodes 228 may provide an NFS/CIFS interface to the storage grid.
- Control nodes 238 may consist of separate software services, such as the Content Metadata Service (CMS) and the Administrative Domain Controller (ADC). Although these services can run on separate computing resources, they may also share a single server. The Content Metadata Service constitutes a distributed business rules engine that provides for content metadata storage, metadata synchronization, metadata query and enforcement of replication and information lifecycle management business logic. Replication and information lifecycle management policies may be based on metadata that is associated with stored objects. This allows the creation of business rules that determine where content is stored, how many copies are stored, and on what media it is stored throughout its lifecycle. A Content Metadata Service may interface, for example, with a local SQL database through a database abstraction layer.
- The Administrative Domain Controller acts as a trusted authentication repository for node-to-node communication. It also provides knowledge of system topology and information to optimize real-time usage of bandwidth, CPU and storage resources. This allows automated management of computational resources and dynamic load balancing of requests based on the available CPU, storage and bandwidth resources.
- The Administration Node 234 may consist of software components such as the Network Management Service and the Audit Service. These services may share a common computing resource, or they may be run on separate computing resources. A management interface 226 may be used to monitor and manage the operational status of the grid and associated services.
- The Audit Service provides for the secure and reliable delivery and storage of audited events corresponding to content transactions across the entire storage grid. Audit events are generated, in real-time, by Storage Nodes and Control Nodes. Events are then relayed through the storage grid using a reliable transport mechanism and delivered to the Administration Nodes. Audit messages are processed by the Audit Service and may be directed to an external database or file.
- The Network Management Service collects and processes real-time metrics on utilization of computing, storage and bandwidth resources. It provides real-time and historical usage reports. In addition it is responsible for fault reporting and configuration management.
- The Archive Node library 246 for the archiving and retrieval of grid managed objects. Archive nodes may be added to diversify archive pools and to provide archival storage at multiple sites. The storage grid 200 may also utilize external storage resources, such as a managed tape library 222 or an enterprise SAN 224.
- Storage Nodes and Control Nodes in the storage grid can be upgraded, decommissioned, replaced or temporarily disconnected without any disruption. Nodes do not need to run on the same hardware or have the same storage capacity. Nodes replicate and cache data across multiple sites and multiple nodes. In addition to bandwidth savings, the intelligent distribution of information provides for real-time backup, automated disaster recovery and increased reliability.
- Capacity, performance and geographic footprint of the storage grid can be increased by adding nodes as needed, when needed, without impacting end-users. This enables the storage grid to accommodate thousands of terabytes of data across hundreds of locations. The storage grid combines the power of multiple computers to achieve extremely high levels of scalability and throughput. As nodes are added to the storage grid, they contribute to the available computational and storage resources. These resources are seamlessly utilized based on bandwidth availability and geographical suitability.
- In traditional archives, information is stored as files, and access to data is gained through a path pointer stored in an external database. When storage scales, old storage is replaced, or is offline, this results in broken pointers and unavailable data. In order to scale, costly and disruptive migration procedures are required. Furthermore, it is difficult to operate in heterogeneous environments and multi-site deployments. This is because the approach relies on the underlying file system and network file system protocols.
- Within the storage grid, data are stored and referenced as objects. An object can be one file or a collection of files with relationships that are defined by object metadata. Object metadata constitutes application specific information that is associated with a data object. This information can be attached to or extracted from the object at the time of input into the storage grid. Object metadata can be queried and the storage grid can enforce business rules based on this information. This allows for efficient utilization of storage/bandwidth resources, and enforcement of storage management policies.
- In this object oriented architecture, external applications no longer use pointers to a path, but a universal handle to an object. This enables high levels of reliability, scalability and efficient data management without the need for disruptive migration processes. Multiple object classes can be defined and for each object class, there are specific business rules that determine the storage management strategy.
- In this embodiment, the storage grid is fault tolerant, resilient and self-healing. Transactions continue to be processed even after multiple hardware, storage and network failures. The design philosophy is that hardware, network, and catastrophic failures will occur, and the system should be able to deal with faults in an automated manner without impacting the stored data or end-users.
- Reliability is achieved through replicas, which are identical copies of objects (both data and metadata) that are stored on multiple nodes and kept synchronized. Increasing reliability involves adding nodes to the storage grid and increasing the number of replicas for each object. The location and number of the replicas is based on a set of rules that can be configured to ensure geographical separation and the desired level of redundancy. The storage grid will automatically enforce this logic across all nodes. If a failure is detected, the system is self-healing in that additional replicas are automatically created to restore the level of resiliency.
- As nodes are added, removed or replaced, the system manages the available storage. Incoming data is transparently re-directed to take advantage of the newly added storage capacity. Within the storage grid, objects are redistributed, purged, or replicated based on metadata and policies that are applied to the metadata. Objects can also migrate from one storage grade (e.g., disk) to another (e.g., tape) not simply based on time and date stamps, but on external metadata that indicates the importance of the object to the specific business application. For example, in medical applications, certain imaging exams may be immediately committed to deep storage. In applications for the financial sector, retention policies may be set up to facilitate compliance with regulatory requirements for data retention.
- Users may input and retrieve data from the location within the storage grid that is closest to them, thereby efficiently utilizing bandwidth and reducing latency. In addition, as information is requested, it may be cached at the requesting Storage Node to enable improved bandwidth efficiency.
- Obsolete components can be removed without impacting services or endangering stability and reliability. A Storage Node may be decommissioned through the administrative console. When this takes place, the storage grid may automatically redirect requests to alternate nodes. Furthermore, the storage grid may transparently re-distribute the stored data on other suitable Storage Nodes. This allows for seamless removal of obsolete hardware without any disruptions to storage grid operations. This is in contrast to disruptive data migration procedures that are common in many fixed content applications. Operators can eliminate support for obsolete hardware while taking advantage of the economic benefits of decreasing costs of storage and increases in processing power. Each newly added node costs less and provides more processing power and storage capacity.
- When data and metadata are stored into the storage grid, they are packaged into an object. Objects consist of data and associated metadata that are managed as an unalterable and atomic entity. Once stored, these objects are actively managed throughout their information lifecycle. When an object is retrieved, the original data and associated metadata are presented for use. This provides a transparent storage service to external entities.
- Each object stored may have a unique identifier that acts as the primary identifier for the object. This identifier may be assigned at the time the object is created. Objects can be moved from one object store to another.
- Objects stored within the grid may contain metadata, which is used to manage the objects over their lifecycle and facilitate access to the objects. Object metadata may include, for example, Content Block metadata, Protocol metadata, Content metadata, User metadata, or Management metadata.
- Content Block metadata may be metadata associated with the object creation process itself, and provides information about the packaging and protection of the user provided data and metadata. An example of this type of metadata is the size of the data stored in a given object.
- Protocol metadata may be metadata associated with the protocol used to store the object, but not intrinsic to the data within the object. This includes metadata required to perform protocol specific transactions. For data stored through the DICOM protocol, an example of this type of metadata is the DICOM AE title of the entity that stored the data.
- Content metadata may include metadata contained within recognized types of content. If so processed, metadata specific to each recognized type of content is extracted from the content. For content of type PDF, an example of this type of metadata is the number of pages in a document.
- User metadata may include arbitrary metadata specified by the entity storing content into the grid. This ability to attach user metadata is limited by the protocol used to store the objects. An example of this type of metadata is a private identifier assigned by the user.
- Management metadata consists of metadata generated and modified over time as objects are managed within the grid. Unlike the previous four classes of metadata, this metadata is not immutable, and is not present as part of the object itself. An example of this type of metadata is the time when an object was last accessed.
- Each time a new object is stored, the metadata associated with the object is also stored in a separate subsystem that maintains a repository of metadata. The metadata store can be queried to return the metadata associated with a given object. Queries can also be performed to return a list of objects and requested metadata for all objects that have metadata that matches a specific query.
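- The two kinds of query described above (metadata for one object, and a list of objects matching a query) can be illustrated with a minimal in-memory sketch. The class `MetadataStore`, its method names, and the sample keys are hypothetical and chosen only for illustration; the source does not specify the repository's interface.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Hypothetical in-memory stand-in for the metadata repository."""
    _records: dict = field(default_factory=dict)

    def put(self, object_id, metadata):
        # Record the metadata alongside the (separately stored) object.
        self._records[object_id] = dict(metadata)

    def get(self, object_id):
        # Return the metadata associated with a given object.
        return dict(self._records[object_id])

    def query(self, criteria, fields):
        # Return (object_id, requested metadata) for every object whose
        # metadata matches all key/value pairs in `criteria`.
        results = []
        for object_id, metadata in self._records.items():
            if all(metadata.get(k) == v for k, v in criteria.items()):
                results.append((object_id, {f: metadata.get(f) for f in fields}))
        return results


store = MetadataStore()
store.put("obj-1", {"content_type": "application/pdf", "pages": 12})
store.put("obj-2", {"content_type": "application/dicom", "ae_title": "CT_SCANNER"})
store.put("obj-3", {"content_type": "application/pdf", "pages": 3})
pdfs = store.query({"content_type": "application/pdf"}, ["pages"])
```

- Here `pdfs` lists the two PDF objects together with only the requested `pages` value, while `store.get("obj-2")` returns everything recorded for a single object.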
- Placement of objects may be based on the capabilities of the storage grid computing resources. Different computing resources have different capacity to perform work. While this is primarily measured based on the clock frequency of the processor, the number of processors and relative efficiencies of different processor families may also be taken into account. In addition, the amount of CPU resources that are currently in use provides a mechanism to determine how “busy” a given resource is. These characteristics are monitored and measured to allow decisions to be made within the grid about which computing resource is best suited to use to perform a given task.
- Placement of objects may also be based on the characteristics of the storage resources, such as storage latency, reliability, and cost. Storage capacity provides information for calculating risk in the event of rebuild. A measurement of the amount of storage capacity that is currently in use provides a mechanism to determine how full a given storage resource is, and determine which locations are more able to handle the storage or migration of new content. Different storage resources have different throughput. For example, high performance Fiber-Channel RAID systems will deliver better performance than a lower performance software RAID on IDE drives. A measurement of the amount of I/O bandwidth that is currently in use provides a mechanism to determine the extent to which a given storage resource is able to handle additional transactions, and how much it will slow down current transactions. Storage resources can be read-only, and thus not a candidate for the storage of new objects. These characteristics may be monitored and measured to allow decisions to be made within the grid about which storage resource is best suited to use to retain objects over time, and influence the rules that determine where objects should be stored.
- Placement of objects may also consider the characteristics of network paths, such as latency, reliability and cost. Different network paths have different amounts of bandwidth available. This directly maps into the time required to transfer objects from one storage repository to another. The amount of the network bandwidth that is currently in use may also be considered. This provides a mechanism to determine how “busy” a given network link is, and to compare the expected performance with the theoretical performance. These characteristics may be monitored and measured to allow decisions to be made within the grid about which network path is best suited to use to transfer objects through the grid.
- When objects are stored in multiple different locations, the probability of data loss is reduced. By taking common-mode failure relationships and fault probability information into account, the probability of data loss and data inaccessibility for a given placement of objects can be quantified and reduced to manageable levels based on the value of the data in question.
- To avoid common mode failures, replicas of objects can be placed in separate failure zones. For example, placement of two replicas within a single server room can take into account that storing them on nodes that do not share a single UPS gives a higher probability of accessibility than storing them on two nodes that share the same UPS. On a larger scale, two replicas created in geographically distant locations have a lower probability of loss than two replicas within the same facility.
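- The UPS example can be made concrete with a small calculation. This is only a sketch under a simplified model that is not taken from the source: each node fails independently with probability `p_node`, each UPS fails with probability `p_ups`, and a UPS failure takes down every node attached to it. The function names and the numeric values are hypothetical.

```python
def both_unavailable_shared_ups(p_node, p_ups):
    # Both replicas are lost if the shared UPS fails, or if the UPS
    # survives and both nodes fail independently.
    return p_ups + (1 - p_ups) * p_node ** 2


def both_unavailable_separate_ups(p_node, p_ups):
    # Each replica is unavailable if its own UPS fails or its node fails;
    # the two replicas are then independent of each other.
    q = p_ups + (1 - p_ups) * p_node
    return q ** 2


# Hypothetical failure probabilities for some fixed time window.
shared = both_unavailable_shared_ups(p_node=0.01, p_ups=0.001)
separate = both_unavailable_separate_ups(p_node=0.01, p_ups=0.001)
```

- Under these assumed numbers the shared UPS dominates the risk: the common-mode term `p_ups` appears once in `shared` but only squared in `separate`, which is why placement in separate failure zones lowers the probability that both replicas are inaccessible.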
- As replica placement rules are metadata driven, they can be influenced by external systems and can change over time. Changes to existing replicas and changes to the topology of the grid can also influence replica placement rules.
- Replica placement can reflect the instantaneous, historical and predictive information associated with a given resource. For example, monitoring of server and storage health can dynamically influence the degree of reliability attributed to a given resource. Different types of storage resources, such as IDE vs. SCSI, have different reliability characteristics. In addition, archival and offline storage often have a distinct media lifetime, which needs to be managed to preserve archive integrity. These are both examples of how information about available resources is used to determine the best solution for a given set of constraints.
- Implementation of configuration information based on formal risk analysis can further optimize the resource tradeoff by providing information about common mode failures that cannot be automatically discovered by the grid. For example, the placement of two replicas on nodes situated along the same fault line may be considered to be within a common failure mode, and thus suboptimal when compared to the placement of one of the replicas in a facility not located on the fault.
- The use of external data feeds can provide valuable information about changes in the reliability of a given failure zone. In one scenario, a live feed from the weather monitoring system can provide advance notice of extreme weather events, which could allow the grid to dynamically rebalance content to reduce the risks associated with the loss of connectivity to a given facility.
- Content stored in a fixed-content storage system can be, but is not limited to, audio, video, data, graphics, text and multimedia information. The content is preferably transmitted via a distribution system which can be a communications network including, but not limited to, direct network connections, server-based environments, telephone networks, the Internet, intranets, local area networks (LAN), wide area networks (WAN), the WWW or other webs, transfers of content via storage devices, coaxial cable, power distribution lines (e.g., either residential or commercial power lines), fiber optics, among other paths (e.g., physical paths and wireless paths). For example, content can be sent via satellite or other wireless path, as well as wireline communications networks, or on the same path as a unit of power provided by a utility company.
- According to some embodiments, novel data structures are utilized in order to allow certain features described herein. Objects stored within the storage system are stored as one or more packets. Each packet includes a certain non-zero amount of packet metadata and zero or more bytes of payload data. In a preferred embodiment, the quantity of packet metadata and the quantity of payload data vary among different packets. A maximum packet size or quantity of payload data may be utilized. For example, the maximum quantity of payload data in a variable size packet may be configured to be 16 KB. Each packet may include a predetermined identical amount of packet metadata and payload data in some embodiments.
- The packet metadata may contain information allowing for the processing of variable sized packets when the amount of packet metadata and payload data is not predefined. Types of packet metadata include offset data, packet size data, and the like. This packet metadata may allow for the arbitrary retrieval of data in an object by identifying a specific packet or bytes within or across one or more packets.
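- As an illustration of how offset and size metadata allow arbitrary retrieval across variable-size packets, the following sketch splits a byte string into packets of at most 16 KB and reads back an arbitrary byte range. The `Packet` layout and the functions `packetize` and `read_range` are hypothetical; the source does not define a concrete packet format.

```python
from dataclasses import dataclass

MAX_PAYLOAD = 16 * 1024  # example maximum payload size from the text (16 KB)


@dataclass(frozen=True)
class Packet:
    offset: int     # packet metadata: byte offset of the payload in the object
    size: int       # packet metadata: payload length in bytes
    payload: bytes  # zero or more bytes of payload data


def packetize(data, boundaries=()):
    """Split data into variable-size packets, cutting at each requested
    boundary and whenever a packet would exceed MAX_PAYLOAD."""
    cuts = sorted({b for b in boundaries if 0 < b < len(data)} | {len(data)})
    packets, start = [], 0
    for cut in cuts:
        while start < cut:
            end = min(cut, start + MAX_PAYLOAD)
            packets.append(Packet(start, end - start, data[start:end]))
            start = end
    return packets


def read_range(packets, offset, length):
    """Return object bytes [offset, offset + length) using only the
    offset/size metadata to locate the relevant packets."""
    end = offset + length
    out = bytearray()
    for p in packets:
        lo = max(offset, p.offset)
        hi = min(end, p.offset + p.size)
        if lo < hi:
            out += p.payload[lo - p.offset:hi - p.offset]
    return bytes(out)


data = bytes(range(256)) * 200            # 51,200 bytes of sample data
packets = packetize(data, boundaries=[512])
chunk = read_range(packets, 500, 40000)   # spans several packets
```

- The linear scan in `read_range` keeps the sketch short; because packets are ordered by offset, a real implementation could locate the first relevant packet with a binary search instead.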
- FIG. 2 shows an embodiment of a fixed-content storage subsystem 700 that comprises multiple data objects. The data objects comprise metadata 701 and payload data 702. Furthermore, the fixed-content storage system 700 is accessible by a remote server 720.
- As shown in FIG. 2, one or more packets may comprise reference content blocks 710 and/or floating reference content blocks 705 according to some embodiments. A reference content block 710 preferably has only packet metadata that refers to a different packet or content block, and does not contain any payload data. The packet metadata reference may cause an application accessing the reference content block to access some other packet(s) in place of the reference content block. For example, with a video file stored in a fixed-content storage system, a reference content block may be stored rather than another short video (such as a geographically specific clip). The reference content block may refer to that short clip stored separately, either in the fixed-content system or in another storage system.
- A floating reference content block 705 is a reference content block that does not yet point to a packet or reference content block. Unlike reference content blocks 710, which are resolved at the storage system 700 (for example, by referring to a logical or physical memory address, or by referring to a particular object or instance), floating reference content blocks 705 are resolved at a server 720 or computing system outside the fixed-content storage system when the data is accessed. The packet metadata associated with the floating reference content block 705 specifies the size, duration, and/or other information that enables the server 720 to resolve the floating reference content block 705. Accordingly, an object comprising one or more packets may reference other objects or portions of other objects within the storage system 700. According to some embodiments and as shown in FIG. 2, a server 720 resolving a floating reference content block 705 may also resolve the storage location to an external storage system 730.
- With floating reference content blocks, an object may reference variable data within the storage system. Though the data written to the fixed-content storage system 700 is not altered, floating reference content blocks 705 allow for the modification of an object as seen by an external user accessing the storage system 700. Floating reference content blocks may therefore be a powerful tool when used with a fixed-content storage system as described herein.
- For example, if a medical report/form template is stored in a fixed-content storage system, there may be a number of blank fields. For each patient having a report stored, the values of these fields may be different, but the template is largely the same. If these fields are stored as floating reference content blocks, then the patient data may be stored separately for each patient, without duplicating the template data. When the data is accessed, for example by a medical professional, they may request information on one of the patients. The template would be loaded, and based on the patient information requested, the medical professional's computing system can resolve the floating reference content blocks in order to access the specific patient data requested along with the report form.
- Floating reference content blocks may be resolved according to any criteria appropriate to the particular file. For example, a floating reference content block may be resolved based on the geographic location of the computing system accessing the data, an IP address, data submitted by the computing system, or the like.
- The metadata in a reference content block or a floating reference content block can override some of the metadata in a packet (or group of packets) that is pointed to. This may allow certain data stored in the fixed-content storage system to be treated differently according to how it is accessed. This in turn may allow for objects to be stored once rather than requiring near identical copies, as the data is immutable. By changing the management rules of the fixed-content storage system, more flexibility is obtained without modifying the protected data. Several embodiments of operations performed using reference content blocks and floating reference content blocks will be described in more detail below.
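- The division of labour between reference content blocks (resolved inside the storage system) and floating reference content blocks (resolved by the accessing server) can be sketched as follows, using the report-template example. All class and function names (`Payload`, `Reference`, `FloatingReference`, `Store`, `render`) are hypothetical, and metadata overriding is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payload:
    data: bytes


@dataclass(frozen=True)
class Reference:
    target_id: str   # points at another stored object


@dataclass(frozen=True)
class FloatingReference:
    field_name: str  # not bound to any stored object yet


class Store:
    """Hypothetical write-once object store."""

    def __init__(self):
        self._objects = {}

    def put(self, object_id, blocks):
        if object_id in self._objects:
            raise ValueError("fixed content: stored objects are never rewritten")
        self._objects[object_id] = tuple(blocks)

    def read(self, object_id):
        # Reference blocks are resolved here, inside the store; floating
        # references are passed through unresolved to the caller.
        out = []
        for block in self._objects[object_id]:
            if isinstance(block, Reference):
                out.extend(self.read(block.target_id))
            else:
                out.append(block)
        return out


def render(store, object_id, resolve):
    # Runs on the accessing server: `resolve` supplies the bytes for each
    # floating reference (for example, per-patient field values).
    parts = []
    for block in store.read(object_id):
        if isinstance(block, FloatingReference):
            parts.append(resolve(block))
        else:
            parts.append(block.data)
    return b"".join(parts)


store = Store()
store.put("header", [Payload(b"REPORT\n")])
store.put("template", [Reference("header"), Payload(b"Name: "),
                       FloatingReference("name"), Payload(b"\n")])
patients = {"p1": {"name": b"Alice"}, "p2": {"name": b"Bob"}}
report_p1 = render(store, "template", lambda ref: patients["p1"][ref.field_name])
report_p2 = render(store, "template", lambda ref: patients["p2"][ref.field_name])
```

- The template is stored once and never modified, yet each caller sees a different report, because the floating reference is bound only at access time.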
- FIGS. 2A-E demonstrate a method for intelligently decomposing data stored in a fixed-content storage system according to one embodiment. Intelligent decomposition stores data objects according to their logical boundaries and allows for single instance storage of objects or portions of objects that may be identical. For example, in some systems multiple instances of similar data are stored, where the difference is the payload within a well-known structure, such as a TAR archive. A TAR archive is the concatenation of one or more files.
- FIG. 2A shows one embodiment of an implementation of intelligent decomposition data management techniques with reference to a TAR archive 10 for a medical system storing, for example, cardiology and radiology images. Other embodiments utilize other data file types having known boundaries. The TAR archive includes two archived files 12 and 14, each preceded by a header block (TAR headers 16 and 18).
- As discussed previously, objects such as a TAR archive may be stored in one or more packets. For example, FIG. 2B illustrates partitioning of the TAR archive 10 into five packets. Packet 24 contains data corresponding to file 12, header block 18, and file 14. There is no alignment of the TAR headers, and no references to data in external objects.
- FIG. 2C illustrates the partitioning of the TAR archive 10 by using the file boundaries and the alignment of TAR headers. TAR header 16 is placed in packet 30, archived file 12 is placed in its own packets, TAR header 18 is placed in packet 36, and archived file 14 is placed in its own packets. Because the TAR archive 10 was partitioned along the TAR archive header and file boundaries, each of the TAR archive headers and files can be handled separately.
- FIG. 2D illustrates an exemplary embodiment for storing the partitions from FIG. 2C as multiple objects. A master object 42 corresponds to the TAR archive 10. The master object 42 includes a component for each of the two files in the TAR archive. The first component includes metadata packet 25A, TAR file header packet 30 (from FIG. 2C), and reference block 27A. The second component includes metadata packet 25B, TAR file header packet 36 (from FIG. 2C), and reference block 27B.
- Reference block 27A provides a reference to a reference object 46. Reference object 46 includes the partitions of the first file 12 in the TAR archive 10, and packet metadata 25D and 25E. Reference block 27B provides a reference to a reference object 48. Reference object 48 includes the partitions of the second file 14 in the TAR archive 10, and associated packet metadata.
- FIG. 2D also includes a second master object 44. Master object 44 includes a packet 31 corresponding to a third header. In this example, the third header is found in a TAR archive that also contains the first data file 12. Rather than storing an additional reference object representing a duplicate copy of the reference object 46, the reference content block 27C references the existing stored reference object 46. By reducing the required storage of duplicate objects, the total amount of storage resources required by the fixed-content storage subsystem may be reduced.
- Although the example shown in FIGS. 2A-2D relates to a TAR file, a similar procedure could be applied to other file types. In one example, a media file may contain a series of media clips, and each media clip could be treated as an object. In another example, a PDF file may contain pages or other content that could be treated as separate objects.
- One embodiment of a process for intelligently decomposing objects stored to a fixed-content storage system is shown in FIG. 2E. The process begins at state 201 where an object to be stored is received. The object received is preferably of a type having a well known file structure so that it can be decomposed or packetized at state 202 along its logical boundaries. For example, header data may be separated from payload data.
- The decomposed object is thus broken into separate portions, each of which may comprise one or more packets. One of the portions is selected at state 203, and at decision state 204 it is determined if the selected portion is identical to an existing stored reference object. The existing object may comprise any other object, but is likely to be a reference object related to the current object being stored. For example, if the current object being stored is an instance of a medical study, then existing instances of the study may be identified based on metadata or additional data from the external system providing the object. If the portion already exists as a reference object, then the existing object is referenced by a reference content block at state 205. If the portion does not already exist in the storage system, then the decomposed object portion is stored at state 206. At decision state 207 it is determined whether the entire received object has been stored or referenced. If any portion remains, then the process returns to state 203. When all portions have been handled, then a master object exists in the storage system for the received object that references existing data as well as any new data. Thus, this process may advantageously be used in a fixed-content storage system in order to allow greater flexibility and reduce the need for increased storage space.
- In one embodiment, the decomposed object portion is stored prior to identification of existing instances of the object. After it is determined that equivalent content to the decomposed object portion is stored in another object, the identifier for the decomposed object portion may be repointed to the other object. The stored decomposed object portion may then be removed.
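- The loop of FIG. 2E can be sketched for a TAR archive using Python's standard `tarfile` module. Two simplifications are assumptions of this sketch and not taken from the source: identical portions are detected by comparing SHA-256 digests (the text mentions identification based on metadata or additional data from the external system), and each TAR header is reduced to the member name. The helper names `make_tar` and `store_archive` are hypothetical.

```python
import hashlib
import io
import tarfile


def make_tar(files):
    """Build an in-memory TAR archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def store_archive(archive_bytes, reference_objects):
    """Decompose an archive along file boundaries and return a master
    object: a list of (header name, reference to stored file data)."""
    master = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        for member in tar:                            # select a portion (state 203)
            if not member.isfile():
                continue
            data = tar.extractfile(member).read()
            digest = hashlib.sha256(data).hexdigest()
            if digest not in reference_objects:       # already stored? (state 204)
                reference_objects[digest] = data      # store new portion (state 206)
            master.append((member.name, digest))      # reference it (state 205)
    return master


reference_objects = {}
archive_a = make_tar({"study/cardio.img": b"C" * 2000,
                      "study/radio.img": b"R" * 3000})
archive_b = make_tar({"other/copy.img": b"C" * 2000})   # same payload, new header
master_a = store_archive(archive_a, reference_objects)
master_b = store_archive(archive_b, reference_objects)
```

- After both archives are stored, `reference_objects` holds only two payloads: the second master object points at the data already stored for the first, which mirrors master object 44 referencing the existing reference object 46.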
- FIGS. 3A-C show a method of object consolidation for a fixed-content storage system. For multiple data objects representing individual instances of a particular group, it may be inefficient to store each instance as a separate object. Even when identical data is handled efficiently, the management of a large number of objects may create inefficiencies in object management.
- As an example, a data object representing an advertisement is created for distribution and display in a variety of geographical areas. The advertisement data object may be configured to reference a large number of additional data objects (e.g., endings), with each of the additional data objects corresponding to one of the geographical areas. Rather than storing a separate data object including the advertisement data object for each additional data object or storing the advertisement data object once and storing each of the additional data objects separately, a single object may be created with each of the additional data objects stored back-to-back. When the advertisement object is accessed, a floating reference content block resolves to a different offset based on the geographic location. Thus, for 200 different regions, rather than storing a relatively large advertisement and 200 relatively short endings as 201 objects, the endings are stored back-to-back so that a single object is created including the advertisement and all of the endings. The cost of managing many small objects for different applications, sometimes having tens of thousands or more individual instances, can be quite large. Storing the small objects as a single object allows for random access retrieval while reducing the number of objects required, thus making storage management more cost effective.
- As another example, a data object representing a medical study may include thousands of individual cases or instances. The cost of managing many small objects can be large from a licensing or hardware standpoint. Consolidating the cases or instances reduces the number of objects required. The individual cases or instances would still be accessible using offsets for random-access.
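- Offset-based consolidation of this kind can be sketched in a few lines. The functions `consolidate` and `read_member`, the index layout, and the sample sizes are hypothetical and serve only to illustrate storing instances back-to-back while keeping random access.

```python
def consolidate(objects):
    """Store the given {name: bytes} objects back-to-back and return
    (index, blob), where index maps each name to (offset, size)."""
    index, parts, offset = {}, [], 0
    for name, data in objects.items():
        index[name] = (offset, len(data))
        parts.append(data)
        offset += len(data)
    return index, b"".join(parts)


def read_member(index, blob, name):
    """Random-access retrieval of one consolidated instance."""
    offset, size = index[name]
    return blob[offset:offset + size]


first = b"\x01" * (64 * 1024)    # a 64 KB instance
second = b"\x02" * (32 * 1024)   # a 32 KB instance
index, blob = consolidate({"instance-1": first, "instance-2": second})
```

- One stored object now replaces two, and `read_member(index, blob, "instance-2")` still returns the second instance directly from its 64 KB offset without reading the first.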
- FIG. 3A shows an example of object consolidation of two external data objects 51 and 52 according to one embodiment. The external data objects 51 and 52 may be any type of data object, such as media files, medical storage files, or the like. For example, external data object 51 may represent a first file of a medical study to be stored, and external data object 52 may represent an additional instance of the study. In another embodiment, the external data objects 51 and 52 are files that were originally stored in the same folder.
- Rather than store external data objects 51 and 52 as separate objects, they may be stored as a single consolidated data object 50 as shown in FIG. 3B. Data object 50 comprises metadata together with the data of external data objects 51 and 52. Although FIGS. 3A and 3B show only two external data objects consolidated to form data object 50, in some embodiments a different number of external data objects are consolidated. As the number of external objects increases, object consolidation as described herein provides additional efficiency in managing the objects in a fixed-content storage system.
- FIG. 3C shows a process for creating a consolidated data object. At state 301 multiple objects are received or accessed. In some embodiments, these objects are accessed and consolidated from within a storage system. In some embodiments, multiple objects are received from an external computing system to be stored, and every object to be consolidated is received in a single data transfer. In some embodiments, one or more new objects to be consolidated with existing stored data are received.
- At state 302, metadata is generated for the consolidated object that indicates an offset and size for the received data objects. For example, the metadata may indicate that a first data object stored in a consolidated data object may have no offset and be 64 KB, while the second data object may have a 64 KB offset and be 32 KB.
- At state 303, the multiple received objects are stored back-to-back as a single object. Any reference to the multiple received objects can be handled by the consolidated object that will reference each of the received objects by offset. Accordingly, management of many related objects may be simplified and costs reduced because a smaller number of objects are stored in the storage system.
- Because data in fixed-content storage systems is immutable, small changes made to large files may be handled inefficiently by traditional systems. For example, a large database containing approximately 50 GB of data is stored as an object in a fixed-content storage system. A user then makes an edit to that database comprising approximately 100 KB of changed data. The originally stored object cannot be modified with these changes in the fixed-content storage system, as the stored data may not be edited. In traditional fixed-content storage systems, even though the vast majority of the data from the original object has not been changed, a new object must be stored including the more than 49 GB that remains identical.
- Medical data may include an image and corresponding demographic data. The size of the image is much larger than the corresponding demographic data. Thus, a 50 MB image may be updated to write 32 bytes worth of patient name information.
-
FIGS. 4A-C show an example of a method for generating and storing a differenced object in a fixed-content storage system to more efficiently handle such changes according to one embodiment.FIG. 4A shows anoriginal data object 60 and an editeddata object 65 as stored in a traditional fixed-content storage system.Original object 60 comprisesmetadata 71 andpayload data 61A-C. For example, the original data object 60 may be a 50 MB radiology image along with a relatively small amount of associateddata 61B that represents patient name, demographic data, and the like. The associateddata 61B may represent, for example 32 bytes of 50 MB data object 60. When a change is made to the associateddata 61B, a typical fixed-content system may store the edited object as a new data object 65 that includes most of the data from theoriginal data object 60, but has replaced the associateddata 61B with the editeddata 66. - Rather than storing, as shown in
FIG. 4A , theoriginal object 60 and aseparate object 65 containing the entire original object with the editeddata 66, FIG. 4B shows a method for storing a differenced object including essentially only the changes.FIG. 5B showsoriginal object 60 comprisingpacket metadata 71 andpayload data 61A-C. An edit represented bydata 66 has again been made to the associateddata 61B representing a small portion of theoriginal object 60. A differencedobject 70 is created as the edited object.Differenced object 70 comprisesreference content block 72A.Reference content block 72A references theoriginal object 60 so that the data shared by the editedobject 65 and theoriginal object 60 may be accessed by differencedobject 70 without storing additional copies of the data.Reference content block 72A further references anobject including metadata 71, editeddata 66, andreference content block 72B. Thereference content block 72A and thereference content block 72B may indicate the location or offset where associateddata 61B of theoriginal object 60 is to be replaced by editeddata 66 when the edited and differencedobject 70 is accessed, the size of the editeddata 66, the size of the associateddata 61B, and the like. Referencing the identical data from theoriginal object 60 allowsoriginal object 60 to be maintained as a fixed-content object, while small changes are efficiently stored to create additional instances of edited objects. -
FIG. 4C is a flowchart indicating one embodiment of a process for generating a differenced object. At state 401, an edited object is received. Next, at state 402, the edited object is compared to the original object. In the example shown in FIGS. 4A and 4B, associated data 61B is shown as the payload data from one packet. However, in some embodiments edits may comprise only a portion of the payload data from a packet or may comprise multiple packets or portions thereof. Furthermore, although edited data 66 is shown in FIGS. 4A and 4B as containing the same quantity of data as the associated data 61B, this need not be the case. In some embodiments, the edited data may contain more or less data than the section of the original object it replaces.
In some embodiments, the fixed-content storage system is configured to determine whether to store a new object or create a differenced object based on the magnitude of the changes to the original object relative to the object's size. When the changes are larger than a threshold determined, for example, based on the size of the original object, the edited object is stored as a new object. When the changes are less than the determined threshold, then the edited object may be stored as a differenced object. For example, the threshold may be that the size of the edited data must not be larger than 50% of the size of the original file.
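The comparison at state 402 and the threshold decision can be sketched as follows. This is an illustrative Python sketch only: the function names are hypothetical, the comparison assumes a single contiguous edit found by trimming the common prefix and suffix, and the 50% figure is the example threshold mentioned above rather than a required value.

```python
def changed_region(original: bytes, edited: bytes):
    """Return (offset, replaced_length, edited_data) for one contiguous edit.

    Assumption: the edit is a single region; the common prefix and suffix
    of the two payloads are treated as shared data.
    """
    limit = min(len(original), len(edited))
    prefix = 0
    while prefix < limit and original[prefix] == edited[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and original[len(original) - 1 - suffix] == edited[len(edited) - 1 - suffix]):
        suffix += 1
    replaced_length = len(original) - prefix - suffix
    edited_data = edited[prefix:len(edited) - suffix]
    return prefix, replaced_length, edited_data


def should_store_as_differenced(original: bytes, edited: bytes,
                                max_fraction: float = 0.5) -> bool:
    """True when the edited data is no larger than max_fraction of the original."""
    _, _, edited_data = changed_region(original, edited)
    return len(edited_data) <= max_fraction * len(original)


original = b"A" * 100 + b"name=Jane" + b"B" * 100
edited = b"A" * 100 + b"name=John Smith" + b"B" * 100
region = changed_region(original, edited)
small_edit = should_store_as_differenced(original, edited)
full_rewrite = should_store_as_differenced(b"abcd", b"wxyz")
```

Here the edited data is longer than the section it replaces, which the offset and replaced-length pair handles without special cases.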
After the edited portions have been determined (and are determined to be small relative to the original object in some embodiments), then at state 403 a reference is stored to the original data object that may include metadata indicating which portions and how much of the original object is utilized by the edited object. At state 404, a reference is stored to the edited data. Metadata may also be stored that indicates the positioning of the edited data within the original object.
In some embodiments, differenced objects may additionally be ‘flattened’ when the original object they reference is no longer necessary. The referenced data from the original object may be copied and stored in the differenced object with all of the changes, creating a new object. The original object may then be deleted.
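Flattening can be sketched in Python as shown here. The dictionary layout and the function name are hypothetical. The check that no other differenced object still references the original is an added assumption for safety and is not stated in the text.

```python
def flatten_and_remove_original(diff_id: str, store: dict) -> None:
    """Replace a differenced object with a self-contained object, then delete its original.

    Hypothetical layout: a differenced entry has base_id, offset,
    replaced_length, and edited_data; a plain entry has only payload.
    """
    diff = store[diff_id]
    base_id = diff["base_id"]

    # Assumption: refuse to delete an original that other differenced objects still use.
    for other_id, other in store.items():
        if other_id != diff_id and other.get("base_id") == base_id:
            raise ValueError(f"{base_id} is still referenced by {other_id}")

    # Copy the referenced data from the original and apply the stored change.
    base = store[base_id]["payload"]
    end = diff["offset"] + diff["replaced_length"]
    flattened = base[:diff["offset"]] + diff["edited_data"] + base[end:]

    store[diff_id] = {"payload": flattened}  # new object with no reference to the original
    del store[base_id]                       # the original may now be removed


store = {
    "obj60": {"payload": b"AAAA-old-BBBB"},
    "obj70": {"base_id": "obj60", "offset": 5, "replaced_length": 3,
              "edited_data": b"newer"},
}
flatten_and_remove_original("obj70", store)
```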
In order to realize certain advanced applications it may be desirable that several objects be grouped within a single container as a composite object. The objects may therefore be managed according to a single set of rules. For example, a medical study may contain a number of instances representing, for example, images captured as part of an examination. A user accessing the stored images may want to retrieve only one image of more than 500. If the user were forced to retrieve each image, a great deal of time and resources may be wasted. Retrieving only the desired image may be accomplished using composite objects. For medical systems, though, this is usually done using proprietary container files that are application-specific, or accomplished by using file-system directories as containers.
FIGS. 5A-C show an example of a method for storing composite objects in an object-oriented fixed-content storage system. FIG. 5A includes data objects 80, 85, and 90. In some embodiments, the contents of the data objects 80, 85, and 90 are related, but the objects represent different file types. In some embodiments, each data object used to form a composite object is of the same file type.
As shown in the embodiment of FIG. 5B, a manifest data object 100 is created in order to simplify the management of data objects 80, 85, and 90. Manifest data object 100 includes reference data 101, which references each sub-object 80, 85, and 90 in the composite object 100. In some embodiments, manifest data object 100 is compliant with certain standards such as XAM so that updated API commands access the manifest object. If data is changed, only the manifest and changed data need to be updated. Thus, composite objects described here provide a large degree of flexibility and increase data management capabilities.
In some embodiments, composite objects may be managed by a single set of rules, for example stored in the metadata 102 of manifest data object 100. In some embodiments, sub-objects referenced by the manifest data object 100 include a “managed as” field within the sub-object metadata that instructs the fixed-content storage system how to manage the given sub-object when it is desired that the object not be managed according to the manifest data object 100.
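One way to picture the rule resolution described above is the following Python sketch. The field name `managed_as`, the named rule sets, and the rule keys are all hypothetical. The text only states that a "managed as" field tells the system how to manage a sub-object that should not follow the manifest, so the lookup by rule-set name is an assumption.

```python
def effective_rules(sub_object: dict, manifest: dict, rule_sets: dict) -> dict:
    """Pick the rules that govern a sub-object of a composite object.

    Assumption: "managed_as" names an alternative rule set; when absent,
    the single set of rules stored in the manifest metadata applies.
    """
    managed_as = sub_object["metadata"].get("managed_as")
    if managed_as is not None:
        return rule_sets[managed_as]
    return manifest["metadata"]["rules"]


rule_sets = {"long_term": {"retention_years": 25, "replicas": 3}}
manifest_100 = {
    "metadata": {"rules": {"retention_years": 7, "replicas": 2}},
    "references": ["obj80", "obj85", "obj90"],
}
obj_80 = {"metadata": {}}                           # follows the manifest rules
obj_85 = {"metadata": {"managed_as": "long_term"}}  # opts out of the manifest rules

rules_80 = effective_rules(obj_80, manifest_100, rule_sets)
rules_85 = effective_rules(obj_85, manifest_100, rule_sets)
```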
FIG. 5C shows an embodiment of a process for generating a composite object. At state 501, multiple objects that are to be related by the composite object are received or accessed. In some embodiments, multiple objects are received from an external computing system to be stored substantially simultaneously as a composite object. In some embodiments, multiple objects already stored in the fixed-content storage system are accessed in order to generate a composite object.
At state 502, a manifest object is generated. At state 503, reference data indicating the multiple objects received or accessed at state 501 is stored in the manifest object. In a preferred embodiment, the reference data is stored as content data, rather than a metadata reference content block, in order to prevent the alteration of the manifest object in the storage system. In some embodiments, one or more reference content blocks are utilized.
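States 501 through 503 can be sketched in Python as follows. The function names, the dictionary layout, and the use of JSON to serialize the reference data are assumptions made for illustration; the text does not specify an encoding. The sketch stores the references in the manifest's content rather than in its metadata, following the preferred embodiment.

```python
import json


def build_manifest(manifest_id: str, sub_object_ids: list, rules: dict) -> dict:
    """Generate a manifest object whose content lists the related sub-objects."""
    # State 503: reference data is serialized into the content payload (JSON is an assumption).
    reference_data = json.dumps({"references": list(sub_object_ids)}).encode("utf-8")
    return {
        "object_id": manifest_id,
        "metadata": {"rules": rules},
        "content": reference_data,
    }


def fetch_sub_object(manifest: dict, index: int, store: dict) -> bytes:
    """Retrieve a single sub-object through the manifest without reading the others."""
    references = json.loads(manifest["content"])["references"]
    return store[references[index]]


# State 501: objects already stored in the system are accessed.
store = {"obj80": b"image", "obj85": b"report", "obj90": b"waveform"}
# State 502: the manifest object is generated.
manifest = build_manifest("obj100", ["obj80", "obj85", "obj90"], {"replicas": 2})
one_item = fetch_sub_object(manifest, 1, store)
```

Retrieving one item by index reflects the earlier example of a user who wants a single image out of a large study.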
FIG. 6 demonstrates a composite object referencing several data objects using many of the data management techniques discussed herein. In the embodiment shown, manifest data object 110 references consolidated object 120, differenced object 140, and intelligently decomposed object 130. A skilled artisan will realize that these storage management systems and methods may be combined in a variety of ways without departing from the scope of the invention.
The high-level overview illustrated in the figures partitions the functionality of the overall system into modules for ease of explanation. It is to be understood, however, that one or more modules may operate as a single unit. Conversely, a single module may comprise one or more subcomponents that are distributed throughout one or more locations. Further, the communication between the modules may occur in a variety of ways, such as hardware implementations, software implementation, or a combination of hardware and software. Further, the modules may be realized using state machines, microcode, microprocessors, digital signal processors, or any other appropriate digital or analog technology.
It should be understood that the methods and systems described herein may be implemented in a variety of ways. Methods described herein may utilize other steps or omit certain steps. Other embodiments that are apparent to those of ordinary skill in the art, including embodiments which do not provide all of the benefits and features set forth herein, are also within the scope of the invention. For example, intelligent decomposition may be used to store objects even where multiple copies of objects are required according to lifecycle management policies or regulations. While some of the embodiments described herein provide specific details for implementation, the scope of the disclosure is intended to be broad and not limited to the specific embodiments described. Accordingly, details described in the specification should not be construed as limitations of the claimed invention. Rather, the scope of the claims should be ascertained from the language of the claims, which use terms consistent with their plain and ordinary meaning.
Claims (20)
1. A method of reducing duplicative storage of data on a fixed-content storage system, the method comprising:
maintaining, on a fixed-content storage system comprising at least one computer-readable storage device, a first data object;
receiving a second data object to be stored on the fixed-content storage system;
identifying, using at least one computer processor, a first portion of the second data object and a second portion of the second data object, wherein the first portion of the second data object comprises data identical to a portion of the first data object;
constructing a differenced object using the at least one computer processor, wherein the differenced object comprises the second portion of the second data object, wherein the differenced object further comprises a reference to the portion of the first data object that is identical to the first portion of the second data object; and
storing the differenced object on the fixed-content storage system.
2. The method of claim 1, wherein the differenced object further comprises metadata configured to enable reconstruction of the contents of the second data object, and whereby the differenced object does not include the first portion of the second object.
3. The method of claim 1, further comprising calculating a storage size associated with differences between the first object and the second object, wherein the differenced object is stored on the fixed-content storage system based at least on a determination that the calculated storage size satisfies a specified requirement.
4. The method of claim 1, wherein the fixed-content storage system comprises a plurality of distributed nodes, each distributed node comprising at least one processor and at least one storage device, and wherein the one or more computer processors are configured to store the consolidated data object on more than one distributed node.
5. The method of claim 4, wherein the plurality of distributed nodes spans multiple geographically separated sites, and wherein at least a portion of the plurality of distributed nodes are configured to communicate on a network.
6. The method of claim 1, wherein the differenced object is associated with one or more rules relating to data retention and replication, the fixed-content storage system being configured to retain and replicate the differenced object in accordance with the one or more rules.
7. The method of claim 6, wherein the determination that the calculated storage size satisfies a specified requirement comprises a determination that the calculated storage size meets a threshold level in relation to a size associated with the first object.
8. The method of claim 1, further comprising:
determining that the first object may be removed from the fixed-content storage system;
constructing a flattened object based on the differenced object by combining the second portion of the second object and the portion of the first object that is identical to the first portion of the second object, wherein the flattened object does not include a reference to the first object;
storing the flattened object on the fixed-content storage system; and
removing the first object from the fixed-content storage system.
9. A computing system configured to store data objects, the computing system comprising:
a fixed-content storage system comprising one or more computer-readable storage devices; and
one or more computer processors in communication with the fixed-content storage system;
the fixed-content storage system configured to maintain a first data object;
the one or more computer processors configured to receive a second data object to be stored on the fixed-content storage system;
the one or more processors configured to identify a first portion of the second data object and a second portion of the second data object, wherein the first portion of the second data object comprises data identical to a portion of the first data object;
the one or more processors configured to construct a differenced object using the at least one computer processor, wherein the differenced object comprises the second portion of the second data object, wherein the differenced object further comprises a reference to the portion of the first data object that is identical to the first portion of the second data object; and
the one or more processors configured to store the differenced object on the fixed-content storage system.
10. The computing system of claim 9, wherein the differenced object further comprises metadata configured to enable reconstruction of the contents of the second data object, whereby the differenced object does not include the first portion of the second object.
11. The computing system of claim 9, wherein the one or more processors configured to calculate a storage size associated with differences between the first object and the second object, wherein the one or more processors are configured to store the differenced object on the fixed-content storage system based at least on a determination that the calculated storage size satisfies a specified requirement.
12. The computing system of claim 11, wherein the determination that the calculated storage size satisfies a specified requirement comprises a determination that the calculated storage size meets a threshold level in relation to a size associated with the first object.
13. The computing system of claim 9, wherein the one or more processors are further configured to:
determine that the first object may be removed from the fixed-content storage system;
construct a flattened object based on the differenced object by combining the second portion of the second object and the portion of the first object that is identical to the first portion of the second object, wherein the flattened object does not include a reference to the first object;
store the flattened object on the fixed-content storage system; and
remove the first object from the fixed-content storage system.
14. The computing system of claim 9, wherein the fixed-content storage system comprises a plurality of distributed nodes, each distributed node comprising at least one processor and at least one storage device, and wherein the one or more computer processors are configured to store the consolidated data object on more than one distributed node.
15. The computing system of claim 14, wherein the plurality of distributed nodes spans multiple geographically separated sites, and wherein at least a portion of the plurality of distributed nodes are configured to communicate on a network.
16. The computing system of claim 9, wherein the differenced object is associated with one or more rules relating to data retention and replication, the fixed-content storage system being configured to retain and replicate the differenced object in accordance with the one or more rules.
17. A non-transitory computer-readable medium having stored thereon a plurality of executable instructions configured to be executed on a fixed-content storage system having stored thereon a first data object, the executable instructions configured to cause the fixed-content storage system to perform operations comprising:
receiving a second data object;
identifying a first portion of the second data object and a second portion of the second data object, wherein the first portion of the second data object corresponds to a portion of the first data object;
constructing a third object using the at least one computer processor, wherein the third object comprises the second portion of the second data object, wherein the third object further comprises a reference to the portion of the first data object that corresponds to the first portion of the second data object; and
storing the third object on the fixed-content storage system.
18. The non-transitory computer-readable medium of claim 17, wherein the third object is further configured to enable reconstruction of the contents of the second data object.
19. The non-transitory computer-readable medium of claim 17, wherein the executable instructions are further configured to be executed on a plurality of distributed nodes, each distributed node comprising at least one processor and at least one storage device, and wherein the one or more computer processors are configured to store the consolidated data object on more than one distributed node, wherein the plurality of distributed nodes spans multiple geographically separated sites, and wherein at least a portion of the plurality of distributed nodes are configured to communicate on a network.
20. The non-transitory computer-readable medium of claim 17, wherein the third object is associated with one or more rules relating to data retention and replication, the fixed-content storage system being configured to retain and replicate the third data object in accordance with the one or more rules.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/421,042 US20120173596A1 (en) | 2008-02-22 | 2012-03-15 | Relational objects for the optimized management of fixed-content storage systems |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/036,162 US7899850B2 (en) | 2008-02-22 | 2008-02-22 | Relational objects for the optimized management of fixed-content storage systems |
US13/014,659 US8171065B2 (en) | 2008-02-22 | 2011-01-26 | Relational objects for the optimized management of fixed-content storage systems |
US13/421,042 US20120173596A1 (en) | 2008-02-22 | 2012-03-15 | Relational objects for the optimized management of fixed-content storage systems |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/014,659 Continuation US8171065B2 (en) | 2008-02-22 | 2011-01-26 | Relational objects for the optimized management of fixed-content storage systems |
Publications (1)
Publication Number | Publication Date |
---|---|
US20120173596A1 (en) | 2012-07-05 |
Family
ID=40999341
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/036,162 Expired - Fee Related US7899850B2 (en) | 2008-02-22 | 2008-02-22 | Relational objects for the optimized management of fixed-content storage systems |
US13/014,659 Active US8171065B2 (en) | 2008-02-22 | 2011-01-26 | Relational objects for the optimized management of fixed-content storage systems |
US13/421,042 Abandoned US20120173596A1 (en) | 2008-02-22 | 2012-03-15 | Relational objects for the optimized management of fixed-content storage systems |
Family Applications Before (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/036,162 Expired - Fee Related US7899850B2 (en) | 2008-02-22 | 2008-02-22 | Relational objects for the optimized management of fixed-content storage systems |
US13/014,659 Active US8171065B2 (en) | 2008-02-22 | 2011-01-26 | Relational objects for the optimized management of fixed-content storage systems |
Country Status (1)
Country | Link |
---|---|
US (3) | US7899850B2 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014133497A1 (en) * | 2013-02-27 | 2014-09-04 | Hitachi Data Systems Corporation | Decoupled content and metadata in a distributed object storage ecosystem |
WO2015157776A1 (en) * | 2014-04-11 | 2015-10-15 | Graham Bromley | Network-attached storage enhancement appliance |
US10528262B1 (en) * | 2012-07-26 | 2020-01-07 | EMC IP Holding Company LLC | Replication-based federation of scalable data across multiple sites |
Families Citing this family (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7899850B2 (en) | 2008-02-22 | 2011-03-01 | Bycast, Inc. | Relational objects for the optimized management of fixed-content storage systems |
US7979649B1 (en) * | 2008-06-09 | 2011-07-12 | Symantec Corporation | Method and apparatus for implementing a storage lifecycle policy of a snapshot image |
US7987325B1 (en) * | 2008-06-09 | 2011-07-26 | Symantec Operation | Method and apparatus for implementing a storage lifecycle based on a hierarchy of storage destinations |
US20090319567A1 (en) * | 2008-06-24 | 2009-12-24 | Apple Inc. | System and method of data management using a structure to propagate changes to referenced objects |
SE533007C2 (en) | 2008-10-24 | 2010-06-08 | Ilt Productions Ab | Distributed data storage |
US8898267B2 (en) | 2009-01-19 | 2014-11-25 | Netapp, Inc. | Modifying information lifecycle management rules in a distributed system |
US8768971B2 (en) * | 2009-03-12 | 2014-07-01 | Microsoft Corporation | Distributed data storage |
US8261033B1 (en) | 2009-06-04 | 2012-09-04 | Bycast Inc. | Time optimized secure traceable migration of massive quantities of data in a distributed storage system |
US8407517B2 (en) | 2010-04-08 | 2013-03-26 | Hitachi, Ltd. | Methods and apparatus for managing error codes for storage systems coupled with external storage systems |
EP2387200B1 (en) | 2010-04-23 | 2014-02-12 | Compuverde AB | Distributed data storage |
US9449007B1 (en) * | 2010-06-29 | 2016-09-20 | Emc Corporation | Controlling access to XAM metadata |
US20120078931A1 (en) | 2010-09-29 | 2012-03-29 | International Business Machines Corporation | Methods for managing ownership of redundant data and systems thereof |
US8645636B2 (en) | 2010-09-29 | 2014-02-04 | International Business Machines Corporation | Methods for managing ownership of redundant data and systems thereof |
US8539165B2 (en) | 2010-09-29 | 2013-09-17 | International Business Machines Corporation | Methods for managing ownership of redundant data and systems thereof |
US8612682B2 (en) | 2010-09-29 | 2013-12-17 | International Business Machines Corporation | Methods for managing ownership of redundant data and systems thereof |
US8539154B2 (en) | 2010-09-29 | 2013-09-17 | International Business Machines Corporation | Methods for managing ownership of redundant data and systems thereof |
US8706697B2 (en) * | 2010-12-17 | 2014-04-22 | Microsoft Corporation | Data retention component and framework |
US8745095B2 (en) * | 2011-08-12 | 2014-06-03 | Nexenta Systems, Inc. | Systems and methods for scalable object storage |
US8997124B2 (en) | 2011-09-02 | 2015-03-31 | Compuverde Ab | Method for updating data in a distributed data storage system |
US8769138B2 (en) | 2011-09-02 | 2014-07-01 | Compuverde Ab | Method for data retrieval from a distributed data storage system |
US8645978B2 (en) | 2011-09-02 | 2014-02-04 | Compuverde Ab | Method for data maintenance |
US9021053B2 (en) | 2011-09-02 | 2015-04-28 | Compuverde Ab | Method and device for writing data to a data storage system comprising a plurality of data storage nodes |
US9626378B2 (en) | 2011-09-02 | 2017-04-18 | Compuverde Ab | Method for handling requests in a storage system and a storage node for a storage system |
US8468138B1 (en) | 2011-12-02 | 2013-06-18 | International Business Machines Corporation | Managing redundant immutable files using deduplication in storage clouds |
US9411931B2 (en) * | 2012-01-20 | 2016-08-09 | Mckesson Financial Holdings | Method, apparatus and computer program product for receiving digital data files |
US9355120B1 (en) | 2012-03-02 | 2016-05-31 | Netapp, Inc. | Systems and methods for managing files in a content storage system |
US9658983B1 (en) * | 2012-12-14 | 2017-05-23 | Amazon Technologies, Inc. | Lifecycle support for storage objects having multiple durability levels specifying different numbers of versions |
US9304815B1 (en) | 2013-06-13 | 2016-04-05 | Amazon Technologies, Inc. | Dynamic replica failure detection and healing |
US9747166B2 (en) * | 2013-10-10 | 2017-08-29 | Adobe Systems Incorporated | Self healing cluster of a content management system |
US9558208B1 (en) * | 2013-12-19 | 2017-01-31 | EMC IP Holding Company LLC | Cluster file system comprising virtual file system having corresponding metadata server |
CA2995777A1 (en) * | 2015-08-19 | 2017-02-23 | Oleg GARIPOV | Integrated software development environments, systems, methods, and memory models |
WO2018075042A1 (en) * | 2016-10-20 | 2018-04-26 | Hitachi, Ltd. | Data storage system, process, and computer program for de-duplication of distributed data in a scalable cluster system |
US10956393B2 (en) * | 2016-10-20 | 2021-03-23 | Hitachi, Ltd. | Data storage system and process for providing distributed storage in a scalable cluster system and computer program for such data storage system |
US10719481B1 (en) * | 2016-12-30 | 2020-07-21 | EMC IP Holding Company LLC | Modification of historical data of file system in data storage environment |
US10970302B2 (en) | 2017-06-22 | 2021-04-06 | Adobe Inc. | Component-based synchronization of digital assets |
US11635908B2 (en) * | 2017-06-22 | 2023-04-25 | Adobe Inc. | Managing digital assets stored as components and packaged files |
US11500836B2 (en) * | 2017-06-27 | 2022-11-15 | Salesforce, Inc. | Systems and methods of creation and deletion of tenants within a database |
US10417215B2 (en) | 2017-09-29 | 2019-09-17 | Hewlett Packard Enterprise Development Lp | Data storage over immutable and mutable data stages |
US11811877B2 (en) | 2021-05-13 | 2023-11-07 | Agora Lab, Inc. | Universal transport framework for heterogeneous data streams |
US12206737B2 (en) * | 2021-05-13 | 2025-01-21 | Agora Lab, Inc. | Universal transport framework for heterogeneous data streams |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6317816B1 (en) * | 1999-01-29 | 2001-11-13 | International Business Machines Corporation | Multiprocessor scaleable system and method for allocating memory from a heap |
US20020049749A1 (en) * | 2000-01-14 | 2002-04-25 | Chris Helgeson | Method and apparatus for a business applications server management system platform |
US20020073080A1 (en) * | 2000-01-14 | 2002-06-13 | Lipkin Daniel S. | Method and apparatus for an information server |
US20020133491A1 (en) * | 2000-10-26 | 2002-09-19 | Prismedia Networks, Inc. | Method and system for managing distributed content and related metadata |
US20030187860A1 (en) * | 2002-03-29 | 2003-10-02 | Panasas, Inc. | Using whole-file and dual-mode locks to reduce locking traffic in data storage systems |
US6742137B1 (en) * | 1999-08-17 | 2004-05-25 | Adaptec, Inc. | Object oriented fault tolerance |
US20080016131A1 (en) * | 2003-08-05 | 2008-01-17 | Miklos Sandorfi | Emulated storage system |
US20080235247A1 (en) * | 2007-03-20 | 2008-09-25 | At&T Knowledge Ventures, Lp | System and method of adding data objects to a multimedia timeline |
US20090106256A1 (en) * | 2007-10-19 | 2009-04-23 | Kubisys Inc. | Virtual computing environments |
US7712127B1 (en) * | 2006-11-17 | 2010-05-04 | Network Appliance, Inc. | Method and system of access control based on a constraint controlling role assumption |
Family Cites Families (70)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3721757A (en) | 1971-02-08 | 1973-03-20 | Columbia Broadcasting Syst Inc | Method and apparatus for automatically editing television information |
NL8103895A (en) | 1981-08-21 | 1983-03-16 | Philips Nv | DEVICE FOR MANAGING AN INTERMEMORY MEMORY WITH A MASS TRANSPORT BETWEEN A SOURCE DEVICE AND A DESTINATION DEVICE. |
US5475706A (en) | 1992-03-16 | 1995-12-12 | Nec Corporation | Bulk data transmission system |
US5428769A (en) | 1992-03-31 | 1995-06-27 | The Dow Chemical Company | Process control interface system having triply redundant remote field units |
US5504883A (en) | 1993-02-01 | 1996-04-02 | Lsc, Inc. | Method and apparatus for insuring recovery of file control information for secondary storage systems |
DE4497149B4 (en) | 1993-09-24 | 2005-02-10 | Oracle Corp., Redwood City | Computer-based method for data replication in peer-to-peer environment |
US5522077A (en) | 1994-05-19 | 1996-05-28 | Ontos, Inc. | Object oriented network system for allocating ranges of globally unique object identifiers from a server process to client processes which release unused identifiers |
JP2927325B2 (en) | 1994-06-29 | 1999-07-28 | 富士ゼロックス株式会社 | Data management system |
US5634052A (en) | 1994-10-24 | 1997-05-27 | International Business Machines Corporation | System for reducing storage requirements and transmission loads in a backup subsystem in client-server environment by transmitting only delta files from client to server |
JPH08242286A (en) | 1995-03-03 | 1996-09-17 | Fujitsu Ltd | Communication network management control method |
US5778395A (en) | 1995-10-23 | 1998-07-07 | Stac, Inc. | System for backing up files from disk volumes on multiple nodes of a computer network |
US5890156A (en) | 1996-05-02 | 1999-03-30 | Alcatel Usa, Inc. | Distributed redundant database |
US6356563B1 (en) | 1998-12-29 | 2002-03-12 | At&T Corp. | Global addressing and identifier assignment in inter-worked networks |
US20030040854A1 (en) | 1998-12-31 | 2003-02-27 | Rendahl Craig S. | Data processing and validation |
US6567818B1 (en) | 1999-06-14 | 2003-05-20 | International Business Machines Corporation | Employing management policies to manage instances of objects |
US6976165B1 (en) | 1999-09-07 | 2005-12-13 | Emc Corporation | System and method for secure storage, transfer and retrieval of content addressable information |
US7028071B1 (en) * | 2000-01-28 | 2006-04-11 | Bycast Inc. | Content distribution system for generating content streams to suit different users and facilitating e-commerce transactions using broadcast content metadata |
US20040158676A1 (en) | 2001-01-03 | 2004-08-12 | Yehoshaphat Kasmirsky | Content-based storage management |
US7403901B1 (en) | 2000-04-13 | 2008-07-22 | Accenture Llp | Error and load summary reporting in a health care solution environment |
CA2416783A1 (en) | 2000-07-25 | 2002-01-31 | Acuo Technologies, Llc | Routing medical images within a computer network |
US6735220B1 (en) | 2000-08-01 | 2004-05-11 | Sun Microsystems, Inc. | Using a centralized server to coordinate assignment of identifiers in a distributed system |
US6775668B1 (en) | 2000-09-11 | 2004-08-10 | Novell, Inc. | Method and system for enhancing quorum based access control to a database |
US6782389B1 (en) | 2000-09-12 | 2004-08-24 | Ibrix, Inc. | Distributing files across multiple, permissibly heterogeneous, storage devices |
US6779082B2 (en) | 2001-02-05 | 2004-08-17 | Ulysses Esd, Inc. | Network-based disk redundancy storage system and method |
JP2002244898A (en) | 2001-02-19 | 2002-08-30 | Hitachi Ltd | Database management program and database system |
US6898589B2 (en) | 2001-03-01 | 2005-05-24 | International Business Machines Corporation | Performance optimizer for the transfer of bulk data between computer systems |
US7216289B2 (en) * | 2001-03-16 | 2007-05-08 | Microsoft Corporation | Method and apparatus for synchronizing multiple versions of digital data |
US7146524B2 (en) | 2001-08-03 | 2006-12-05 | Isilon Systems, Inc. | Systems and methods for providing a distributed file system incorporating a virtual hot spare |
US7171434B2 (en) | 2001-09-07 | 2007-01-30 | Network Appliance, Inc. | Detecting unavailability of primary central processing element, each backup central processing element associated with a group of virtual logic units and quiescing I/O operations of the primary central processing element in a storage virtualization system |
US7000141B1 (en) | 2001-11-14 | 2006-02-14 | Hewlett-Packard Development Company, L.P. | Data placement for fault tolerance |
DE10162991A1 (en) | 2001-12-20 | 2003-07-17 | Siemens Ag | Process for computer-aided encryption and decryption of data |
US20030204420A1 (en) | 2002-04-30 | 2003-10-30 | Wilkes Gordon J. | Healthcare database management offline backup and synchronization system and method |
GB0202600D0 (en) | 2002-02-05 | 2002-03-20 | Ibm | Consolidation of replicated data |
US7020665B2 (en) | 2002-03-07 | 2006-03-28 | Microsoft Corporation | File availability in distributed file storage systems |
US7127475B2 (en) | 2002-08-15 | 2006-10-24 | Sap Aktiengesellschaft | Managing data integrity |
US7567993B2 (en) | 2002-12-09 | 2009-07-28 | Netapp, Inc. | Method and system for creating and using removable disk based copies of backup data |
US7376764B1 (en) | 2002-12-10 | 2008-05-20 | Emc Corporation | Method and apparatus for migrating data in a computer system |
US7624158B2 (en) | 2003-01-14 | 2009-11-24 | Eycast Inc. | Method and apparatus for transmission and storage of digital medical data |
US8671132B2 (en) | 2003-03-14 | 2014-03-11 | International Business Machines Corporation | System, method, and apparatus for policy-based data management |
US7761421B2 (en) | 2003-05-16 | 2010-07-20 | Hewlett-Packard Development Company, L.P. | Read, write, and recovery operations for replicated data |
US20040243997A1 (en) | 2003-05-29 | 2004-12-02 | Sun Microsystems, Inc. | Method, system, and program for installing program components on a computer |
US20050021566A1 (en) | 2003-05-30 | 2005-01-27 | Arkivio, Inc. | Techniques for facilitating backup and restore of migrated files |
US7143251B1 (en) | 2003-06-30 | 2006-11-28 | Data Domain, Inc. | Data storage using identifiers |
US20050010529A1 (en) | 2003-07-08 | 2005-01-13 | Zalewski Stephen H. | Method and apparatus for building a complete data protection scheme |
US7027463B2 (en) | 2003-07-11 | 2006-04-11 | Sonolink Communications Systems, Llc | System and method for multi-tiered rule filtering |
US7155466B2 (en) | 2003-10-27 | 2006-12-26 | Archivas, Inc. | Policy-based management of a redundant array of independent nodes |
WO2005078606A2 (en) | 2004-02-11 | 2005-08-25 | Storage Technology Corporation | Clustered hierarchical file services |
US20050216428A1 (en) | 2004-03-24 | 2005-09-29 | Hitachi, Ltd. | Distributed data management system |
US7213022B2 (en) | 2004-04-29 | 2007-05-01 | Filenet Corporation | Enterprise content management network-attached system |
US7343459B2 (en) | 2004-04-30 | 2008-03-11 | Commvault Systems, Inc. | Systems and methods for detecting & mitigating storage risks |
US7392261B2 (en) | 2004-05-20 | 2008-06-24 | International Business Machines Corporation | Method, system, and program for maintaining a namespace of filesets accessible to clients over a network |
US7627726B2 (en) | 2004-06-30 | 2009-12-01 | Emc Corporation | Systems and methods for managing content having a retention period on a content addressable storage system |
US8229904B2 (en) | 2004-07-01 | 2012-07-24 | Emc Corporation | Storage pools for information management |
US7441096B2 (en) | 2004-07-07 | 2008-10-21 | Hitachi, Ltd. | Hierarchical storage management system |
US7657581B2 (en) | 2004-07-29 | 2010-02-02 | Archivas, Inc. | Metadata management for fixed content distributed data storage |
JP4498867B2 (en) | 2004-09-16 | 2010-07-07 | 株式会社日立製作所 | Data storage management method and data life cycle management system |
US20060080362A1 (en) | 2004-10-12 | 2006-04-13 | Lefthand Networks, Inc. | Data Synchronization Over a Computer Network |
US7343467B2 (en) | 2004-12-20 | 2008-03-11 | Emc Corporation | Method to perform parallel data migration in a clustered storage environment |
US7904570B1 (en) | 2004-12-21 | 2011-03-08 | Adobe Systems Incorporated | Configurable file placement |
JP4704161B2 (en) | 2005-09-13 | 2011-06-15 | 株式会社日立製作所 | How to build a file system |
US7577724B1 (en) | 2006-03-28 | 2009-08-18 | Emc Corporation | Methods and apparatus associated with advisory generation |
US20070294310A1 (en) | 2006-06-06 | 2007-12-20 | Hitachi, Ltd. | Method and apparatus for storing and recovering fixed content |
US7546486B2 (en) | 2006-08-28 | 2009-06-09 | Bycast Inc. | Scalable distributed object management in a distributed fixed content storage system |
US7590672B2 (en) | 2006-12-11 | 2009-09-15 | Bycast Inc. | Identification of fixed content objects in a distributed fixed content storage system |
US7885936B2 (en) | 2006-12-29 | 2011-02-08 | Echostar Technologies L.L.C. | Digital file management system |
JP2009026255A (en) | 2007-07-24 | 2009-02-05 | Hitachi Ltd | Data migration method, data migration system, and data migration program |
US8438136B2 (en) | 2007-09-27 | 2013-05-07 | Symantec Corporation | Backup catalog recovery from replicated data |
US7899850B2 (en) | 2008-02-22 | 2011-03-01 | Bycast, Inc. | Relational objects for the optimized management of fixed-content storage systems |
US20090240713A1 (en) | 2008-03-24 | 2009-09-24 | Fenghua Jia | System and Method for Validating Enterprise Information Handling System Network Solutions |
US8898267B2 (en) | 2009-01-19 | 2014-11-25 | Netapp, Inc. | Modifying information lifecycle management rules in a distributed system |
- 2008-02-22: US application US12/036,162, published as US7899850B2 (Expired - Fee Related)
- 2011-01-26: US application US13/014,659, published as US8171065B2 (Active)
- 2012-03-15: US application US13/421,042, published as US20120173596A1 (Abandoned)
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6317816B1 (en) * | 1999-01-29 | 2001-11-13 | International Business Machines Corporation | Multiprocessor scaleable system and method for allocating memory from a heap |
US6742137B1 (en) * | 1999-08-17 | 2004-05-25 | Adaptec, Inc. | Object oriented fault tolerance |
US20020049749A1 (en) * | 2000-01-14 | 2002-04-25 | Chris Helgeson | Method and apparatus for a business applications server management system platform |
US20020073080A1 (en) * | 2000-01-14 | 2002-06-13 | Lipkin Daniel S. | Method and apparatus for an information server |
US20020133491A1 (en) * | 2000-10-26 | 2002-09-19 | Prismedia Networks, Inc. | Method and system for managing distributed content and related metadata |
US20030187860A1 (en) * | 2002-03-29 | 2003-10-02 | Panasas, Inc. | Using whole-file and dual-mode locks to reduce locking traffic in data storage systems |
US20080016131A1 (en) * | 2003-08-05 | 2008-01-17 | Miklos Sandorfi | Emulated storage system |
US7712127B1 (en) * | 2006-11-17 | 2010-05-04 | Network Appliance, Inc. | Method and system of access control based on a constraint controlling role assumption |
US20080235247A1 (en) * | 2007-03-20 | 2008-09-25 | At&T Knowledge Ventures, Lp | System and method of adding data objects to a multimedia timeline |
US20090106256A1 (en) * | 2007-10-19 | 2009-04-23 | Kubisys Inc. | Virtual computing environments |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10528262B1 (en) * | 2012-07-26 | 2020-01-07 | EMC IP Holding Company LLC | Replication-based federation of scalable data across multiple sites |
WO2014133497A1 (en) * | 2013-02-27 | 2014-09-04 | Hitachi Data Systems Corporation | Decoupled content and metadata in a distributed object storage ecosystem |
CN104813321A (en) * | 2013-02-27 | 2015-07-29 | 日立数据系统有限公司 | Decoupled content and metadata in a distributed object storage ecosystem |
US10671635B2 (en) | 2013-02-27 | 2020-06-02 | Hitachi Vantara Llc | Decoupled content and metadata in a distributed object storage ecosystem |
WO2015157776A1 (en) * | 2014-04-11 | 2015-10-15 | Graham Bromley | Network-attached storage enhancement appliance |
US9875029B2 (en) | 2014-04-11 | 2018-01-23 | Parsec Labs, Llc | Network-attached storage enhancement appliance |
Also Published As
Publication number | Publication date |
---|---|
US8171065B2 (en) | 2012-05-01 |
US20110125814A1 (en) | 2011-05-26 |
US20090216796A1 (en) | 2009-08-27 |
US7899850B2 (en) | 2011-03-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8171065B2 (en) | | Relational objects for the optimized management of fixed-content storage systems |
US7546486B2 (en) | | Scalable distributed object management in a distributed fixed content storage system |
US7590672B2 (en) | | Identification of fixed content objects in a distributed fixed content storage system |
US10764045B2 (en) | | Encrypting object index in a distributed storage environment |
JP6479020B2 (en) | | Hierarchical chunking of objects in a distributed storage system |
US10387673B2 (en) | | Fully managed account level blob data encryption in a distributed storage environment |
US10659225B2 (en) | | Encrypting existing live unencrypted data using age-based garbage collection |
US8261033B1 (en) | | Time optimized secure traceable migration of massive quantities of data in a distributed storage system |
US11755415B2 (en) | | Variable data replication for storage implementing data backup |
US8548957B2 (en) | | Method and system for recovering missing information at a computing device using a distributed virtual file system |
JP5918244B2 (en) | | System and method for integrating query results in a fault tolerant database management system |
US8990257B2 (en) | | Method for handling large object files in an object storage system |
CN103067461B (en) | | A kind of metadata management system of file and metadata management method |
US10572466B2 (en) | | Multiple collections of user-defined metadata for self-describing objects |
US20190370170A1 (en) | | Garbage collection implementing erasure coding |
KR20100070895A (en) | | Metadata server and metadata management method |
WO2011116087A2 (en) | | Highly scalable and distributed data de-duplication |
US20090276598A1 (en) | | Method and system for capacity-balancing cells of a storage system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |