
GB2373075A - Multi-layer logical volume creation and management - Google Patents


Info

Publication number
GB2373075A
GB2373075A (application GB0110341A)
Authority
GB
United Kingdom
Prior art keywords
logical volume
computer
partitions
media
partition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB0110341A
Other versions
GB0110341D0 (en)
Inventor
Benedict Michael Rafanello
Mark A Peloquin
Cuong Huu Tran
Cristi Nesbit Ullmann
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Publication of GB0110341D0
Publication of GB2373075A

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/0605 Improving or facilitating administration, e.g. storage management by facilitating the interaction with a user or administrator
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)

Abstract

A system and method for a multi-layer logical volume management ("LVM") system which extends the single-layer logical volume model of current technology to handle multiple levels of aggregation. Multiple levels of aggregation allow multiple types of aggregators, such as drive linking, mirroring, and software RAID, to be used together to bypass the limitations inherent in each individual aggregation technology. An LVM data area which stores information about the organization of the multi-layer logical volume is stored within the last partition of the logical volume. A broadcast method is used to locate the LVM data area, and when coupled with a fake extended boot record, allows each level of aggregation to appear to be a single partition to the next higher level aggregator. Thus, an infinite number of aggregation layers may be implemented in one logical volume.

Description

SYSTEM AND METHOD FOR MULTI-LAYER LOGICAL VOLUME CREATION AND MANAGEMENT
This invention relates to the arts of computer disk media, formatting of computer disks, organization of computer readable media by operating systems and device drivers, and the management of logical volumes of computer disks.
Persistent and mass data storage devices for computer systems, especially those employed in personal computers, are well known within the art. Many are disk-based, such as floppy disks, removable hard disk drives ("HDD"), and compact-disk read only memories ("CD-ROM"). FIGURE 1 shows a typical personal computer system (1) architecture, wherein a CPU (2) interfaces to a variety of I/O devices such as a keyboard (3), monitor or display (5) and a mouse (4). The CPU (2) also may interface to a number of storage peripherals including CD-ROM drives (7), hard disk drives (6), and floppy drives (5). Typically, floppy disk drives interface to the CPU via Integrated Drive Electronics ("IDE") (8), but this interface may alternately be one of several other standard interfaces or a proprietary interface. The hard disk drives (6) and CD-ROM drives (7) may interface to the CPU (2) via an IDE or Small Computer System Interface ("SCSI"), as shown (9).
FIGURE 2 shows a generalization of the hardware, firmware and software organization of a personal computer system (20). The hardware group (21) includes the persistent storage devices discussed supra, as well as other system hardware components such as a real-time clock, keyboard controller, display adapter, etc. A basic input/output system ("BIOS") (22) typically provides the direct firmware control of these system components. An operating system (24) such as the IBM OS/2 operating system provides high level management of the system resources, including the multi-tasking or multi-threaded scheduling and prioritization of the system application programs (25). Drivers (23) provide specific high-level interface and control functions for specific hardware, such as a manufacturer- and model-specific LAN interface card driver or CD-Rewritable ("CD-RW") driver. This generalized view of the system also applies to systems on alternate, non-IBM-compatible platforms, such as workstations, which employ a variety of operating systems such as Microsoft Windows, UNIX or LINUX. This general organization of computer system resources and software functionality is well understood in the art.
Turning to FIGURE 3, disk-based mass storage devices such as hard disk drives, floppy disks and CD-ROMs are based physically on a rotating storage platter (30). This platter may be made of flexible mylar, as in floppy disks, or of more rigid aluminum, glass or plastic, as in hard disk drives and CD-ROMs. For magnetic media, one or both sides of the platter are coated with a magnetic layer capable of recording magnetic pulses from a read/write head. For optical media, data recording is made using changes in reflectivity of a band of light, which is then read by a laser-based head. Writable and re-writable CD-ROM drives combine the technologies of magnetic disks and optical disks. In general, though, the organization of data on the disk is similar. The disk surfaces are divided into multiple concentric rings, or tracks (31). Some disk drives, such as hard disk drives, consist of multiple platters, in which case corresponding tracks on each platter are grouped into cylinders. Each track is divided into multiple sectors (32) in which data can be stored.
Turning to FIGURE 4, we see a computer disk drive (41) represented as an ordered collection of sectors numbered 0 through "n". The very first sector on the hard drive, sector zero, contains the Master Boot Record ("MBR"). The MBR contains partition definitions for the rest of the disk.
TABLE 1 shows a sample partial MBR.
TABLE 1. Partition Table for 6 GB Drive

Partition   Start (cyl, side, sector)   End (cyl, side, sector)   Length (sectors)
First       0,1,1                       391,254,63                6297417
Second      392,0,1                     783,254,63                6297480

For the disk partitioning shown in TABLE 1, the MBR is located in the first sector, on side 0 at cylinder 0, sector 1. The MBR requires only one sector, but the entire track of 63 sectors is "blocked" for the use of the MBR, so 62 sectors of side 0, cylinder 0 are left unused.
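The lengths in TABLE 1 follow from the drive geometry. As an illustrative check (assuming the conventional addressing implied by the table, with 255 sides, 63 sectors per track, and sectors numbered from 1 within each track), each CHS address can be converted to a linear sector number:

```python
# Check the partition lengths in TABLE 1 against the drive geometry.
# The 255-side, 63-sectors-per-track geometry is inferred from the
# table's end addresses, not stated explicitly in the text.
HEADS = 255   # sides per cylinder, numbered 0..254
SPT = 63      # sectors per track, numbered 1..63

def lba(cyl, side, sector):
    # Linear sector number: cylinders and sides count from 0,
    # sectors count from 1 within a track.
    return (cyl * HEADS + side) * SPT + (sector - 1)

# Length in sectors = LBA(end) - LBA(start) + 1.
first_len = lba(391, 254, 63) - lba(0, 1, 1) + 1
second_len = lba(783, 254, 63) - lba(392, 0, 1) + 1

assert first_len == 6297417    # matches TABLE 1
assert second_len == 6297480   # matches TABLE 1
assert lba(0, 1, 1) == 63      # first partition starts after the 63-sector MBR track
```

The last assertion also confirms the text's observation that the 62 sectors following the MBR on side 0, cylinder 0 go unused: the first partition begins at linear sector 63.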
The partition table has entries in it defining two types of partitions: primary and extended. Conventional disk formatting schemes allow only one extended partition (411) to be defined. P1 (43) and P2 (44) are primary partitions. The order and locations of the primary and extended partitions may vary, but invariably there are entries in the partition table of the MBR which define them.
The extended partition (411) is defined in the partition table in the MBR as a single partition using a single entry in the MBR partition table. This entry simply indicates to the computer operating system that other partitions and partition definitions can be found inside the extended partition. The operating system typically assigns logical drive letters and/or logical volumes to these partitions, or groups of partitions.
In order to determine the size and location of the partitions within the extended partition, the operating system accesses the first sector of the extended partition, which typically contains another boot record, known as an Extended Boot Record ("EBR"). The format of the EBR is similar to that of the MBR, and is also well known in the art.
FIGURE 4 shows a first EBR (45), a second EBR (47), and a third EBR (49) within the extended partition (411). In practice, there may be fewer or more EBRs within an extended partition.
Each EBR contains a partition table similar to an MBR partition table.
Conventionally, for computer drives commonly used in personal computers and workstations, only two entries may be in use in each EBR. One entry will define a logical partition, and the second entry acts as a link, or pointer, to the next EBR. FIGURE 4 shows a pointer (412) from the second entry of the first EBR (45) to the beginning of the second EBR (47), and a similar pointer (413) from the second entry of the second EBR (47) to the beginning of the third EBR (49). The last EBR in the extended partition does not contain a pointer to a subsequent EBR, which indicates to the operating system that it is the last EBR in the extended partition. In this manner, the operating system can locate the definitions for an unlimited number of partitions or logical drives within the extended partition on a deterministic basis.
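The EBR chain described above amounts to a singly linked list, walked until an EBR with no link entry is reached. A minimal sketch follows; the dictionary-based EBR records are illustrative stand-ins for the on-disk boot records, not their actual byte layout:

```python
# Walk the chain of Extended Boot Records (EBRs) to enumerate logical
# partitions.  Each simplified EBR holds one logical-partition entry
# and an optional link to the next EBR, as described in the text.

def walk_ebr_chain(first_ebr):
    """Yield each logical partition definition in chain order."""
    ebr = first_ebr
    while ebr is not None:
        yield ebr["partition"]
        # The second table entry either points at the next EBR, or is
        # absent, which marks the last EBR in the extended partition.
        ebr = ebr.get("next")

ebr3 = {"partition": "L3"}                  # last EBR: no link entry
ebr2 = {"partition": "L2", "next": ebr3}
ebr1 = {"partition": "L1", "next": ebr2}

assert list(walk_ebr_chain(ebr1)) == ["L1", "L2", "L3"]
```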
In each partition table entry, whether it be in an EBR or an MBR, there are certain fields which indicate to the operating system the format, or file system, employed on the disk. For example, for DOS ("Disk Operating System") systems, the field may indicate that the file system is File Allocation Table ("FAT") formatted. Or, for systems which are running IBM's OS/2 operating system, the entry may indicate that the file system is High Performance File System ("HPFS") formatted. There are a number of well-known file system formats in the industry, usually associated with the common operating systems for computers such as Microsoft's Windows, IBM's OS/2 and AIX, variants of UNIX, and LINUX. Using this field, the operating system may determine how to find and access data files stored within the partitions of the primary and extended partitions on the computer disk.
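For illustration, a few widely documented values of this one-byte format field are sketched below. The values are drawn from the common MBR partition-type convention, not from the patent itself:

```python
# A small, illustrative selection of well-known values of the one-byte
# file system format field in an MBR/EBR partition table entry.
SYSTEM_INDICATORS = {
    0x01: "FAT12",
    0x05: "Extended partition",
    0x06: "FAT16",
    0x07: "HPFS (or NTFS)",
    0x0B: "FAT32",
    0x83: "Linux native",
}

def describe_partition_type(type_byte):
    # Unknown values are reported rather than rejected, since new
    # type codes have been assigned over the years.
    return SYSTEM_INDICATORS.get(type_byte, "Unknown (0x%02X)" % type_byte)

assert describe_partition_type(0x07) == "HPFS (or NTFS)"
```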
The file system format indicator is sometimes called the "system indicator".

IBM's OS/2 operating system includes a function referred to as the Logical Volume Manager, or "LVM". For systems without an LVM, each of the partitions that is usable by the operating system is assigned a drive letter, such as "C:" or "F:", producing a correlating drive letter for each partition on a disk in the computer system. The process which assigns these letters is commonly known. For systems with an LVM, a drive letter may be mapped instead to a logical volume which may contain one or more partitions. The process by which partitions are combined into a single entity is known generically as "aggregation." Given the highly modular design of the OS/2 LVM, the functionality which performs aggregation is contained completely within a single module of the LVM program. The LVM calls any module which performs aggregation an "aggregator".
There are various forms of aggregation, such as drive linking, mirroring, and software Redundant Array of Independent Disks ("RAID"). The OS/2 LVM allows a single level of aggregation through the use of drive linking. Internally, the OS/2 LVM uses a layered model. Each feature offered by the LVM for use on a volume is a layer in the LVM. The input to a layer has the same form and structure as the output from a layer. The layers being used on a volume form a stack, and I/O requests are processed from the topmost layer down the stack to the bottommost layer. Currently, the bottommost layer is a special layer called the pass through layer. The topmost layer is always the aggregator, which, in the current implementation, is always the drive linking layer. All of the layers in the middle of the stack represent non-aggregation features, such as Bad Block Relocation.
FIGURE 9 illustrates the relationship of the layered model of the LVM and the aggregation of physical partitions into a logical volume (90). On the left, the "feature stack" is shown, having a "pass through" layer (97) at the bottom which interfaces directly to the disk devices or device drivers. Above the "pass through" layer (97) may be a feature (96), such as Bad Block Relocation ("BBR"). Above the feature may be a layer of aggregation, such as drive linking (95). From the view of the feature stack model, an I/O request (98) is received at the top of the stack and propagated downwards to the pass through layer. Comparing that to a tree model of a logical volume (90), the aggregator A1 (91) corresponds to the aggregation layer (95), the feature layer (96) corresponds to the three interfaces between the aggregator A1 (91) and its partition definitions P1, P2, and P3 (92, 93, and 94 respectively), and the pass through layer (97) corresponds to the interfaces between the partition definitions and the actual devices or device drivers. These types of LVM structures, feature stack models, and tree models are well understood in the art, and the models can be equally well applied to logical volume management systems in other operating systems such as Hewlett Packard's HP-UX and IBM's AIX.
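The downward propagation of an I/O request through the feature stack can be sketched as follows. The layer names and the string-based request format are illustrative simplifications, not the OS/2 LVM's actual interfaces:

```python
# A minimal sketch of the feature-stack model: each layer forwards an
# I/O request to the layer below it, and the pass-through layer at the
# bottom hands the request to the device.

class Layer:
    def __init__(self, name, below=None):
        self.name = name
        self.below = below

    def handle(self, request, trace):
        trace.append(self.name)
        if self.below is not None:
            # Non-bottom layers pass the request down the stack; the
            # output of one layer is the input to the next.
            return self.below.handle(request, trace)
        return "sent to device: " + request    # pass-through layer

# Build the stack from the bottom up, as the text describes.
pass_through = Layer("pass through")
bbr = Layer("bad block relocation", below=pass_through)
drive_linking = Layer("drive linking", below=bbr)   # aggregator on top

trace = []
result = drive_linking.handle("read sector 42", trace)
assert trace == ["drive linking", "bad block relocation", "pass through"]
assert result == "sent to device: read sector 42"
```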
Partitions which are part of a logical volume have a special filesystem format indicator. This indicator does not correspond to any existing filesystem, and it serves to identify the partitions as belonging to a logical volume. The actual filesystem format indicator for a logical volume is stored elsewhere. Furthermore, partitions belonging to a volume have an LVM Data Area at the end of each partition in the volume. The data stored in the LVM Data Area allows the LVM to re-create the volume every time the system is booted. Thus, the LVM allows groupings of partitions to appear to the operating system as a single entity with a single drive letter assignment.
In previous versions of the OS/2 operating system, a file system utility such as the FORMAT disk utility would access the partition table for the partition that was being formatted through low level Input/Output Control ("IOCTL") functions. The system provides IOCTLs to allow a software application to directly read and write to the computer disk, bypassing the file system, rather than using file-based operations.
Using the IOCTL functions, an application program can actually access everything from the EBR that defines the partition being processed to the end of the partition itself. This allows disk utilities to find the partition table entry that corresponds to the partition they are processing, and alter it. For example, FORMAT will update the filesystem format indicator in the partition table entry for each partition that it formats successfully. While this method works fine for processing individual partitions, it creates problems when dealing with logical volumes. Logical volumes appear to the system as a single entity, which means that they will look just like a partition to older disk utilities, which will naturally try to treat them as such. However, since a logical volume may contain more than one partition, there is no EBR or partition table entry which describes it. If the older disk utilities are allowed to access the EBR or partition table entry for one of the partitions contained within the logical volume, the partition described in the partition table entry will not agree with what the disk utility sees as the partition. Furthermore, if the disk utility alters the partition table entry, such as when FORMAT updates the filesystem format indicator, the resulting partition table entry will not be correct. Thus, older disk utilities must not be allowed to access the EBR or partition table entry for a partition contained within a logical volume, yet they need an EBR and partition table entry in order to function correctly.
In the first version of the OS/2 LVM, this problem was solved by creating a "fake" EBR which contained a "fake" partition table entry that described the entire logical volume as if it were a single partition. This "fake" EBR was stored inside of the logical volume on the first partition in the logical volume. The IOCTL functions were intercepted and any requests for an EBR were redirected to the "fake" EBR. This allowed logical volumes to be treated as partitions by older disk utilities, thereby allowing them to function.
The currently available OS/2 LVM design supports only a single layer of aggregation. This places some limitations on what can be done. For example, if software RAID is used as the aggregator, then there is a limit on the number of partitions that can be aggregated into a single volume. However, if multiple levels of aggregation are allowed, then drive linking could be used to aggregate several software RAID aggregates into a volume, thereby providing a volume with all the benefits of software RAID without the limitations of software RAID.
Thus, there exists a need in the art for a multi-layer logical volume management system and method which allows for multiple levels of aggregation. Further, there exists a need in the art for a multi-layer logical volume management system and method which is compatible with existing disk utility functions, such as IOCTL functions.
SUMMARY OF THE INVENTION
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of a preferred embodiment of the invention, as illustrated in the accompanying drawings wherein like reference numbers represent like parts of the invention.
The system and method for a multi-layer logical volume management system provides a means whereby the existing Logical Volume Management model employed by the OS/2 LVM and similar systems can be extended to handle multiple levels of aggregation. Multiple levels of aggregation allow multiple aggregators, such as drive linking, mirroring, and software RAID, to be used together to bypass the limitations inherent in each individual aggregation technology. As an example, many software RAID implementations have a limit on the number of partitions that can be combined into a single entity. However, by using drive linking to combine several software RAID entities into a single volume, the volume can have the benefits of software RAID while employing more partitions than software RAID by itself would allow.
BRIEF DESCRIPTION OF THE DRAWINGS
The following detailed description, when taken in conjunction with the figures presented herein, presents a complete description of the present invention.

FIGURE 1 discloses the fundamental hardware architecture of a computer such as a personal computer.
FIGURE 2 discloses the fundamental software architecture of such a computer.

FIGURE 3 illustrates the nature of formatting computer disk surfaces into tracks.
FIGURE 4 shows the organization of the disk sectors into boot records and partitions.
FIGURE 5 shows a single-level aggregation scheme used by a logical volume manager, including the location of a fake extended boot record for the logical volume at the end of the first partition in the group.
FIGURE 6 illustrates the expansion of a logical volume.
FIGURE 7 sets forth the actual physical storage location of LVM data areas for higher-level aggregations of partitions into logical volumes.
FIGURE 8 shows the relocation method of the LVM data area for the topmost aggregator following expansion of the logical volume.
FIGURE 9 depicts the well-known relationship between a feature stack model of a LVM and a tree model of an LVM.
FIGURE 10 shows a feature stack model and tree model of the multi-layer LVM, and particularly illustrates the corresponding points of the two models.
FIGURE 11 graphically discloses the broadcast method employed by the multi-layer LVM to locate the LVM data area.
DETAILED DESCRIPTION OF THE INVENTION
The invention is preferably realized using a well-known computing platform, such as an IBM personal computer, running the IBM OS/2 operating system. However, it may be realized in other popular computer system platforms, such as a Sun Microsystems workstation or IBM RS/6000 workstation, running alternate operating systems such as Microsoft Windows, HP-UX, UNIX or LINUX, without departing from the spirit and scope of the invention.
By using OS/2 as the operating system of the preferred embodiment, the existing OS/2 Logical Volume Manager ("LVM") which was previously described in the Background of the Invention can be modified to realize the invention. The existing LVM provides a single layer of aggregation, called the drive linking layer, and a system for creating and managing logical volumes. It employs a layered model, where each feature or function available for use on a volume is a separate module whose inputs and outputs are the same in form and structure. The features applied to a volume form a stack, with the aggregator (drive linking) being the topmost layer in the stack, and the special pass through layer being the bottom layer of the stack. When a volume is being created (or re-created after a system boot), the feature stack is built from the bottom up, beginning with the pass through layer. Once all of the volumes have been created and are ready for use, the LVM must begin to process I/O requests against the volumes in the system. When an I/O request is processed, it is processed from the top of the stack, down through the layers (the output of one layer is the input to the next), to the bottom of the feature stack where it is then sent on to the device(s).
Creation and Structure of Multi-Layer Logical Volumes

When a volume is being created (or re-created after a reboot), an LVM data area at the end of each partition in the volume is consulted. In the LVM data area is a fixed size table listing the features that are in use on the volume, the order in which they appear in the feature stack for the volume, and the location (within the LVM data area) of the data required to initialize each feature. The LVM uses this table to build the feature stack for the volume. As each feature is added to the feature stack, it is initialized using data from the LVM data area of the partition that it will be operating on. The last feature to be added is the aggregator, drive linking. Drive linking is initialized using data from the LVM data area, and it will produce a single aggregate which consists of the partitions that comprise the volume. The aggregate produced by drive linking appears to be a partition whose size is approximately the sum of the sizes of the partitions in the aggregate, with the exception that it does not have an LVM data area associated with it.
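The bottom-up construction just described can be sketched as follows. The table layout and feature names are hypothetical simplifications of the feature table stored in the LVM data area:

```python
# A sketch of volume re-creation: the LVM data area holds an ordered
# table of features in use on the volume, and the feature stack is
# built from the bottom up using that table.
lvm_feature_table = [
    # (stack position, feature name): bottom of the stack first.
    (0, "pass through"),
    (1, "bad block relocation"),
    (2, "drive linking"),    # the aggregator is always added last
]

def build_feature_stack(table):
    stack = []
    for _, feature in sorted(table):
        # In a real implementation, each feature would be initialized
        # here using its init data from the LVM data area.
        stack.append(feature)
    return stack

stack = build_feature_stack(lvm_feature_table)
assert stack[0] == "pass through"       # bottom layer
assert stack[-1] == "drive linking"     # topmost layer, the aggregator
```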
To extend this model to handle multiple levels of aggregation, the invention calls for making aggregates look just like partitions, so that each aggregator thinks it is aggregating partitions. This requires adding an LVM data area to each aggregate, and then treating this LVM data area as if it resided on a partition. Currently, the data used to initialize a feature on a partition is stored in the LVM data area of the partition. With only one level of aggregation, and the topmost feature in the feature stack being limited to the aggregator, all of the data necessary to initialize all of the features on a volume resides in the LVM data area of each of the partitions in the volume. Now that aggregates will look just like partitions, it becomes possible to have features which operate on aggregates. The data needed to initialize these features on the aggregate would lie in the LVM data area of the aggregate. Since the aggregate now appears just like a partition, the initialization process for a feature is the same whether it is being used on a partition or an aggregate. Since features can be applied to aggregates, and since aggregators are features, aggregators can now be applied to aggregates created by other aggregators. Thus, the invention allows an unlimited number of levels of aggregation.
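The idea that an aggregate presents the same interface as a partition can be sketched as follows. The sizes and the one-sector LVM data area are illustrative assumptions, not values from the patent:

```python
# A sketch of "aggregates look just like partitions": both expose the
# same interface (a usable size, with an LVM data area at the end), so
# an aggregator never needs to know whether its children are real
# partitions or aggregates built by another aggregator.
LVM_DATA_AREA_SECTORS = 1   # illustrative size for the data area

class Partition:
    def __init__(self, sectors):
        self.sectors = sectors
    def usable(self):
        return self.sectors - LVM_DATA_AREA_SECTORS

class Aggregate:
    def __init__(self, children):
        self.children = children
    def usable(self):
        # The aggregate carries its own LVM data area, so it appears
        # to the layer above exactly like one big partition.
        return sum(c.usable() for c in self.children) - LVM_DATA_AREA_SECTORS

a1 = Aggregate([Partition(100), Partition(100)])   # first-level aggregate
lv = Aggregate([a1, Partition(50)])                # aggregate of an aggregate

assert a1.usable() == 99 + 99 - 1
assert lv.usable() == (99 + 99 - 1) + 49 - 1
```

Because `Aggregate` and `Partition` answer the same call, `Aggregate([a1, ...])` works without any special case, which is the recursion that permits an unlimited number of aggregation levels.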
For example, the logical volume of FIGURE 5 contains two aggregation layers, thus making it a multi-layer logical volume. The first aggregation layer results in the creation of A1 (51) from partitions P1, L2, and L3, and the creation of A2 (52) from partitions P2, L4, and L5. The second aggregation layer produces the logical volume LV1 (50) by combining A1 and A2 into a single entity. These aggregation layers could represent software RAID or drive linking, although other choices are readily available. The existing OS/2 LVM would store the "fake" EBR in the LVM data area at the end of the first partition in the volume. However, with multiple levels of aggregation, only the bottommost aggregator would know which partition is the first partition in the volume. Since an I/O request to the EBR must be intercepted and redirected to the "fake" EBR, and since this detection and redirection must start at the topmost aggregator, the question arises as to how to find the "fake" EBR and then how to redirect an I/O request to the "fake" EBR. The invention solves this problem by using a broadcast method, described infra.
FIGURE 7 shows the apparent location of the LVM data area and the real location of the LVM data area for the logical volume example shown in FIGURE 5. The topmost aggregator, LV1 (50), appears to the system software to have an LVM data area located at the end of it (71). However, since LV1 does not physically exist, but rather is a collection of disk partitions including the sub-aggregation A2 (52), its LVM data area is first mapped (73) into the apparent A2 (52) partition. As described before, the A2 partition also has its own LVM data area, which apparently is located at the end of the A2 partition (74). Again, because A2 does not physically exist, both the A2 LVM data area and the mapped LV1 data area are mapped to the next lower level, into the last apparent or real partition. In this case, the next lower apparent partition is a real partition, L5 (59), so the LV1 LVM data area is mapped and stored (76) in the L5 partition, followed by the mapped and stored (77) A2 LVM data area, and finally followed by the actual L5 LVM data area located at the end of L5 (78). As can be seen from this example, this method allows for an infinite number of layers of aggregation, consecutively mapping all higher-level LVM data areas to lower level apparent partitions until eventually they are mapped and stored in a real partition.
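The recursive mapping of FIGURE 7 can be sketched as follows, using the LV1/A2/L5 example above. The dictionary-based tree encoding is illustrative, not the LVM's actual data structures:

```python
# A sketch of the recursive mapping from FIGURE 7: every aggregate's
# LVM data area appears at the aggregate's end, and is pushed down
# into the aggregate's last child until it lands in a real partition.

def data_areas_in_last_partition(node, inherited):
    """Return the data areas stored at the end of the last real partition."""
    if "children" not in node:                       # a real partition
        return inherited + [node["name"] + " data area"]
    # An aggregate: its own data area joins those mapped down to it,
    # and all of them descend into the last child.
    return data_areas_in_last_partition(
        node["children"][-1], inherited + [node["name"] + " data area"])

# The FIGURE 5 volume: LV1 aggregates A1 and A2.
lv1 = {"name": "LV1", "children": [
    {"name": "A1", "children": [{"name": "P1"}, {"name": "L2"}, {"name": "L3"}]},
    {"name": "A2", "children": [{"name": "P2"}, {"name": "L4"}, {"name": "L5"}]},
]}

# LV1's data area lands in L5 first, then A2's, then L5's own.
assert data_areas_in_last_partition(lv1, []) == \
    ["LV1 data area", "A2 data area", "L5 data area"]
```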
FIGURE 10 shows by way of another example the correlation between the feature stack model and the tree model of a multi-layer logical volume. In this example, the top layer of the feature stack is an encryption function (159), followed by a drive linking aggregation layer (160), followed by a second layer of aggregation (161) such as RAID, followed by another feature layer (162) such as Bad Block Relocation, and eventually the pass through layer (163). The dotted lines of FIGURE 10 show the correspondence of the feature stack layers to the nodes and interfaces of the tree model, which in this example includes a top-level logical volume (150), a top level aggregator (151), a set of second level aggregators (152 and 156), and multiple partitions (153, 154, 157 and 158).
Logical Volume Size Changing -- Expansion and Contraction

Particular problems arise when expanding or contracting the size of an existing multi-layer logical volume. For example, the logical volume shown in FIGURE 5 may be expanded to include another sub-aggregation, A3, and an individual partition P4. The example expansion (608) of the logical volume is shown in FIGURE 6. In its expanded state, the logical volume LV1 is still comprised of multiple aggregates produced by the aggregation layer below, but it now directly incorporates a partition as well. In the example shown, A1 (51) still comprises three physical disk partitions, P1, L2 and L3. Similarly, A2 (52) still comprises three disk partitions, P2, L4, and L5; the added A3 (600) comprises two partitions, P3 (604) and L6 (608); and the individual partition P4 (606) has been added.
Since the size and configuration of the volume has changed, the LVM must update the data in the LVM data area of the aggregate that was expanded, as well as adding the LVM data area to each of the new partitions and aggregates created as part of the expansion process. Furthermore, the aggregator performing the expansion by adding the new partitions and/or aggregates to its existing aggregate has a problem. Once the aggregate is expanded, the existing LVM data area on the aggregate will no longer be at the end of the aggregate. Thus, the aggregate will not look like a partition anymore. Since the invention requires that the aggregate look like a partition to all of the features above it in the feature stack for the volume, the aggregator must move the LVM data area to the end of the newly expanded aggregate.
When the logical volume is expanded as shown in FIGURE 6, the LV1 LVM data area is no longer located in the last partition of the logical volume, so it must be relocated in order to allow it to be deterministically found by the LVM. Turning to FIGURE 8, the LV1 LVM data area (71) was originally mapped into A2 and ultimately stored in partition L5 prior to expansion of the logical volume (50). When sub-aggregation A3 (600) and partition P4
(606) are added to the logical volume LV1 (50) to expand the logical volume, the LV1 LVM data area is re-mapped and stored (80) to the new last partition of the logical volume, which is now P4, as shown. The actual process of moving or remapping the LVM data area is preferably done by copying portions of the original LVM data area and rebuilding it at the end of the expanded logical volume. Once this is done, the "fake" EBR can be updated to reflect the new size of the volume as seen by the filesystems and disk utilities which will be using it. The method can be reversed for shrinking the logical volume.
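The copy-and-rebuild procedure above can be sketched as follows. The dictionary layout, field names, and helper function are hypothetical; a real LVM would operate on on-disk structures rather than in-memory dictionaries:

```python
# A minimal sketch of relocating the LVM data area when the volume is
# expanded: copy the data area out of the current last partition, append
# the new partitions, rebuild the data area at the end of the new last
# partition, then update the "fake" EBR's size field. All structures
# here are assumptions for this illustration.

def expand_volume(partitions, new_partitions, fake_ebr):
    """partitions: list of dicts; the last one holds the 'lvm_data' payload."""
    # 1. Copy the original LVM data area out of the current last partition.
    lvm_data = partitions[-1].pop("lvm_data")

    # 2. Add the new partitions and/or aggregates to the volume.
    partitions.extend(new_partitions)

    # 3. Rebuild the data area at the end of the new last partition so it
    #    can still be found deterministically.
    lvm_data["volume_sectors"] = sum(p["sectors"] for p in partitions)
    partitions[-1]["lvm_data"] = lvm_data

    # 4. Update the "fake" EBR to advertise the new volume size to the
    #    filesystems and disk utilities that use it.
    fake_ebr["sectors"] = lvm_data["volume_sectors"]
    return partitions
```

Reversing the steps (rebuilding the data area in the new last partition after removing partitions, then shrinking the size recorded in the "fake" EBR) gives the contraction case.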
Broadcast Method of I/O Request Handling

The process of the broadcast method of handling I/O requests is shown in FIGURE 11. When an I/O request (169) to the EBR is detected by the multi-level LVM, each aggregator which does not find the "fake" EBR among its children will duplicate the I/O request, flag it as an EBR I/O request, and issue the I/O request to each of its children in parallel (127, 172, 173, 174, 175, and 176). This parallel duplication and issuance of I/O requests may descend multiple levels of aggregation. Of all the parallel requests, only one will succeed and the others will be discarded. When an aggregator finds the "fake" EBR among its children, it will redirect the I/O request to the "fake" EBR and turn off the EBR I/O request flag. When an I/O request reaches the pass through layer, if the EBR I/O request flag is set, the pass through layer will discard that I/O request. Thus, only one I/O request will succeed in reaching the "fake" EBR, and all of the duplicate I/O requests generated along the way will be discarded. This method is simple to implement and, since I/O requests to the EBR are rare, it is reasonably efficient. An alternative to issuing the duplicate EBR I/O requests in parallel is to issue them serially, stopping with the first one to succeed. In this case the pass through layer will fail any I/O request which has the EBR I/O flag set, instead of discarding such requests.

Summary
Methods and systems to realize a multi-layer logical volume manager for a computer system have been described and set forth in both general terms applicable to concepts and methodologies useful for LVMs of many operating systems, and in particular terms applicable to IBM's OS/2 operating system.
It will be understood by those skilled in the relevant arts and from the foregoing description that various modifications and changes may be made in the preferred embodiment of the present invention without departing from its true spirit and scope, such as the use of alternate computer platforms, operating systems and disk storage means. It is intended that this description is for purposes of illustration only and should not be construed in a limiting sense. The scope of this invention should be limited only by the language of the following claims.

Claims (12)

1. A method for creating and managing logical volumes of computer-readable media for a computer system, comprising the steps of:
providing a plurality of computer-readable media partitions;
performing a first aggregation of said plurality of computer-readable media partitions into two or more partition aggregations; and
aggregating said partition aggregations into a multi-layer logical volume such that said computer-readable partitions are grouped into a single computer-readable entity for enhanced availability and accessibility to a computer system.
2. A method for creating and managing logical volumes of computer-readable media for a computer system as set forth in Claim 1, wherein said step of performing a first aggregation of computer-readable media partitions comprises the step of providing drive linking between said media partitions.
3. A method for creating and managing logical volumes of computer-readable media for a computer system as set forth in Claim 1 or 2, wherein said step of performing a first aggregation of computer-readable media partitions comprises the step of providing at least one RAID disk array.
4. A method for creating and managing logical volumes of computer-readable media for a computer system as set forth in Claim 1, 2 or 3, wherein said step of performing a first aggregation of computer-readable media partitions comprises the step of providing disk mirroring between said media partitions.
5. A method for creating and managing logical volumes of computer-readable media for a computer system as set forth in any one of claims 1 to 4, further comprising the steps of:
providing a logical volume data area containing information indicating the organization of the multi-layer logical volume;
providing a link in an extended boot record within one of said computer-readable media partitions, said link indicating the location of said logical volume data area within said logical volume;
retrieving said link from said extended boot record; and
retrieving said information from said logical volume data area such that the organization of the multi-layer logical volume can be determined.
6. A method for creating and managing logical volumes of computer-readable media for a computer system as set forth in Claim 5, wherein said step of retrieving said information from said logical volume data area further comprises broadcasting a request to at least one aggregator.
7. A method for creating and managing logical volumes of computer-readable media for a computer system as set forth in Claim 5, further comprising the step of storing said logical volume data area at the end of the last partition of the logical volume.
8. A system for creating and managing logical volumes of computer-readable media, said system comprising:
at least one computer-readable media device having at least one computer-readable media partition;
a computer processor capable of executing computer software, and interfaced to at least one computer-readable media device or devices; and
a multi-layer logical volume manager including two or more layers of computer-readable media partition aggregators such that multiple layers of aggregations of said computer-readable media partitions are combined to be logically accessible as a single entity by said computer processor and software being executed by said computer processor.
9. A system for creating and managing logical volumes of computer-readable media as set forth in Claim 8, wherein said multi-layer logical volume manager aggregators further comprise a Redundant Array of Independent Disks ("RAID") array of computer-readable media.
10. A system for creating and managing logical volumes of computer-readable media as set forth in Claim 8 or 9, wherein said multi-layer logical volume manager aggregators further comprise a disk mirroring subsystem.
11. A system for creating and managing logical volumes of computer-readable media as set forth in Claim 8, 9 or 10, wherein said multi-layer logical volume manager aggregators further comprise a drive linking subsystem.
12. A system for creating and managing logical volumes of computer-readable media as set forth in any one of claims 8 to 11, wherein said multi-layer logical volume manager comprises an enhanced IBM OS/2 LVM.
GB0110341A 2000-04-27 2001-04-26 Multi-layer logical volume creation and management Withdrawn GB2373075A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US56118400A 2000-04-27 2000-04-27

Publications (2)

Publication Number Publication Date
GB0110341D0 GB0110341D0 (en) 2001-06-20
GB2373075A true GB2373075A (en) 2002-09-11

Family

ID=24240973

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0110341A Withdrawn GB2373075A (en) 2000-04-27 2001-04-26 Multi-layer logical volume creation and management

Country Status (3)

Country Link
JP (1) JP2002073393A (en)
KR (1) KR20010098429A (en)
GB (1) GB2373075A (en)


Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4861273B2 (en) * 2000-12-07 2012-01-25 株式会社日立製作所 Computer system
KR100392382B1 (en) * 2001-07-27 2003-07-23 한국전자통신연구원 Method of The Logical Volume Manager supporting Dynamic Online resizing and Software RAID
JP4168277B2 (en) 2004-01-22 2008-10-22 日本電気株式会社 Logical unit expansion device
US7296116B2 (en) 2004-02-12 2007-11-13 International Business Machines Corporation Method and apparatus for providing high density storage
US7296117B2 (en) 2004-02-12 2007-11-13 International Business Machines Corporation Method and apparatus for aggregating storage devices
JP5685210B2 (en) * 2012-01-30 2015-03-18 富士通フロンテック株式会社 Storage system, backup method, and data restoration method
JP2013178832A (en) * 2013-06-04 2013-09-09 I-O Data Device Inc Information processing program and information processing apparatus
CN103761059B (en) * 2014-01-24 2017-02-08 中国科学院信息工程研究所 Multi-disk storage method and system for mass data management
EP4357902A4 (en) 2021-11-30 2024-12-18 Samsung Electronics Co., Ltd. Electronic device for managing storage space, and method for operating electronic device
CN119484293B (en) * 2024-10-25 2025-09-09 西安交通大学 Hierarchical aggregation method for MPI (Multi-processor interface) aggregation I/O (input/output)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0319147A2 (en) * 1987-11-30 1989-06-07 International Business Machines Corporation Method for storing pre-organised groups of related information files in a data processing system
US5758050A (en) * 1996-03-12 1998-05-26 International Business Machines Corporation Reconfigurable data storage system
US5829053A (en) * 1996-05-10 1998-10-27 Apple Computer, Inc. Block storage memory management system and method utilizing independent partition managers and device drivers
US6081879A (en) * 1997-11-04 2000-06-27 Adaptec, Inc. Data processing system and virtual partitioning method for creating logical multi-level units of online storage


Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8560880B2 (en) 2003-08-14 2013-10-15 Compellent Technologies Virtual disk drive system and method
US9021295B2 (en) 2003-08-14 2015-04-28 Compellent Technologies Virtual disk drive system and method
US7962778B2 (en) 2003-08-14 2011-06-14 Compellent Technologies Virtual disk drive system and method
US8020036B2 (en) 2003-08-14 2011-09-13 Compellent Technologies Virtual disk drive system and method
US8321721B2 (en) 2003-08-14 2012-11-27 Compellent Technologies Virtual disk drive system and method
US10067712B2 (en) 2003-08-14 2018-09-04 Dell International L.L.C. Virtual disk drive system and method
US7945810B2 (en) 2003-08-14 2011-05-17 Compellent Technologies Virtual disk drive system and method
US8473776B2 (en) 2003-08-14 2013-06-25 Compellent Technologies Virtual disk drive system and method
US7941695B2 (en) 2003-08-14 2011-05-10 Compellent Technologies Virtual disk drive system and method
US8555108B2 (en) 2003-08-14 2013-10-08 Compellent Technologies Virtual disk drive system and method
US9489150B2 (en) 2003-08-14 2016-11-08 Dell International L.L.C. System and method for transferring data between different raid data storage types for current data and replay data
US9047216B2 (en) 2003-08-14 2015-06-02 Compellent Technologies Virtual disk drive system and method
US8468292B2 (en) 2009-07-13 2013-06-18 Compellent Technologies Solid state drive data storage system and method
US9069468B2 (en) 2011-09-11 2015-06-30 Microsoft Technology Licensing, Llc Pooled partition layout and representation
EP2754052A4 (en) * 2011-09-11 2015-05-20 Microsoft Technology Licensing Llc Pooled partition layout and representation
US9146851B2 (en) 2012-03-26 2015-09-29 Compellent Technologies Single-level cell and multi-level cell hybrid solid state drive

Also Published As

Publication number Publication date
GB0110341D0 (en) 2001-06-20
KR20010098429A (en) 2001-11-08
JP2002073393A (en) 2002-03-12

Similar Documents

Publication Publication Date Title
US5129088A (en) Data processing method to create virtual disks from non-contiguous groups of logically contiguous addressable blocks of direct access storage device
US6119208A (en) MVS device backup system for a data processor using a data storage subsystem snapshot copy capability
US5897661A (en) Logical volume manager and method having enhanced update capability with dynamic allocation of storage and minimal storage of metadata information
EP0976044B1 (en) System for providing write notification during data set copy
US9785370B2 (en) Method and system for automatically preserving persistent storage
KR100392382B1 (en) Method of The Logical Volume Manager supporting Dynamic Online resizing and Software RAID
US7197598B2 (en) Apparatus and method for file level striping
US7870356B1 (en) Creation of snapshot copies using a sparse file for keeping a record of changed blocks
US6973556B2 (en) Data element including metadata that includes data management information for managing the data element
Teigland et al. Volume Managers in Linux.
US6453383B1 (en) Manipulation of computer volume segments
US5416915A (en) Method and system for minimizing seek affinity and enhancing write sensitivity in a DASD array
JP3866038B2 (en) Method and apparatus for identifying changes to logical objects based on changes to logical objects at the physical level
US6108759A (en) Manipulation of partitions holding advanced file systems
US6070254A (en) Advanced method for checking the integrity of node-based file systems
US20060271734A1 (en) Location-independent RAID group virtual block management
GB2373075A (en) Multi-layer logical volume creation and management
US6523047B1 (en) System and method for volume expansion in the presence of multiple plug-in features
US6108749A (en) DASD file copy system for a data processor using a data storage subsystem snapshot copy capability
JP4480479B2 (en) Storage system
EP0319147B1 (en) Method for storing pre-organised groups of related information files in a data processing system
US6636871B1 (en) Control of multiple layer aggregation logical volume management data and boot record
US6711591B1 (en) Top-down control of multiple layer aggregation logical volume management data and boot record
JPH0863394A (en) Storage device system and storage device control method
US7721062B1 (en) Method for detecting leaked buffer writes across file system consistency points

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)