
WO2018154697A1 - Storage system and recovery control method - Google Patents

Storage system and recovery control method

Info

Publication number
WO2018154697A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
determination
strip
copyback
read
Prior art date
Application number
PCT/JP2017/007015
Other languages
French (fr)
Japanese (ja)
Inventor
聡 上條
Original Assignee
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立製作所 filed Critical 株式会社日立製作所
Priority to PCT/JP2017/007015 priority Critical patent/WO2018154697A1/en
Publication of WO2018154697A1 publication Critical patent/WO2018154697A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F 11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Definitions

  • the present invention generally relates to storage system recovery control.
  • Regarding a storage system having a RAID (Redundant Array of Independent (or Inexpensive) Disks) group composed of a plurality of disks, a storage system provided with a spare disk is known.
  • the data in the failed disk in the RAID group is restored to the spare disk (rebuild process), and the data is copied from the spare disk to the disk after the failed disk is replaced (copy back process).
  • a storage device having a large storage capacity is adopted as a storage device (for example, a disk) constituting a RAID group.
  • If the storage capacity of the spare storage device is large, both the rebuild process to the spare storage device and the copyback process from the spare storage device take a long time. For this reason, the I/O processing performance (the performance of I/O processing in accordance with I/O requests) may be lowered.
  • the storage system includes a plurality of storage devices that provide one or more logical RAID groups, and a processor unit that is one or more processors connected to the plurality of storage devices.
  • Each RAID group is composed of two or more stripes.
  • Each stripe is composed of two or more strips.
  • Each of the plurality of storage devices has a plurality of strips and one or more spare areas.
  • For each RAID group, at least one of the two or more storage devices that provide the two or more strips constituting one stripe differs from at least one of the two or more storage devices that provide the two or more strips constituting any other stripe.
  • the processor unit executes I / O processing according to the I / O request.
  • the processor unit executes a rebuild process for restoring data corresponding to a plurality of strips of a problem storage device among a plurality of storage devices to a plurality of spare areas of two or more storage devices, respectively.
  • After the rebuild process is completed and the problem storage device has been replaced, the processor unit executes a copyback process that includes a data copy process for copying data from spare areas to those strips of the post-replacement storage device to which data has not yet been restored.
  • The processor unit also executes at least one of the following (W) and (R): (W) when a write request whose write destination is a strip in the post-replacement storage device is received during the copyback process, the data according to the write request is written to the write destination strip in the write process according to the write request; (R) when a read request whose read source is a strip in the post-replacement storage device is received during the copyback process, data is read from the spare area corresponding to the read source and written to the read source strip in the read process according to the read request. The processor unit skips the data copy process for any strip to which data has already been written in such a write process or read process.
  • the storage device can be restored while reducing the decrease in I / O processing performance.
  • FIG. 1 shows an example of the configuration of a computer system according to Example 1.
  • FIG. 2 shows an example of the logical configuration and physical configuration of a pool.
  • FIG. 3 shows an example of programs and tables stored in a local memory.
  • FIG. 4 shows an example of a pool state table.
  • FIG. 5 shows an example of a copyback setting management table.
  • FIG. 6 shows an example of a progress management table.
  • FIG. 7 shows an example of a head address management table.
  • FIG. 8 shows the flow of recovery processing.
  • FIG. 9 shows an example of the details of rebuild processing.
  • FIG. 10 shows an example of the details of copyback processing.
  • FIG. 11 shows the detailed flow of copyback processing.
  • FIG. 12 shows the detailed flow of data copy processing.
  • FIG. 13 shows an example of write processing during copyback processing.
  • the “interface unit” includes one or more interfaces.
  • The one or more interfaces may be one or more interface devices of the same type (for example, one or more NICs (Network Interface Cards)) or two or more interface devices of different types (for example, an NIC and an HBA (Host Bus Adapter)).
  • the “storage unit” includes one or more memories.
  • the at least one memory for the storage unit may be a volatile memory.
  • the storage unit is mainly used during processing by the processor unit.
  • the “processor unit” includes one or more processors.
  • the at least one processor is typically a microprocessor such as a CPU (Central Processing Unit).
  • Each of the one or more processors may be a single core or a multi-core.
  • the processor may include a hardware circuit that performs part or all of the processing.
  • the process may be described using “program” as a subject.
  • A program is executed by the processor unit (for example, a CPU (Central Processing Unit)) and performs the defined processing while appropriately using a storage unit (for example, a memory) and/or an interface device (for example, a communication port); therefore, the subject of the processing may be described as the processor unit (or an apparatus or system having the processor unit).
  • the processor unit may include a hardware circuit that performs part or all of the processing.
  • the program may be installed in a computer-like device from a program source.
  • The program source may be, for example, a program distribution server or a computer-readable (for example, non-transitory) recording medium.
  • two or more programs may be realized as one program, or one program may be realized as two or more programs.
  • The “host system” may be one or more physical host computers (for example, a cluster of host computers), or may include at least one virtual host computer (for example, a VM (Virtual Machine)).
  • the host system is simply referred to as “host”.
  • the “storage system” may be one or more storage devices.
  • the “storage device” may be any device having a function of storing data in the storage device.
  • the storage device may be a computer (for example, a general-purpose computer) such as a file server.
  • at least one physical storage device may execute a virtual computer (for example, VM (Virtual Machine)), or may execute SDx (Software-Defined anything).
  • As SDx, for example, SDS (Software Defined Storage) (an example of a virtual storage device) or SDDC (Software-defined Datacenter) can be adopted.
  • at least one storage device (computer) may have a hypervisor.
  • the hypervisor may generate a server VM (Virtual Machine) that operates as a server and a storage VM that operates as a storage.
  • the server VM may operate as a host that issues an I / O request
  • the storage VM may operate as a storage controller that performs I / O to a drive in response to an I / O request from the server VM.
  • When elements of the same kind are described without being distinguished, a reference sign (or a common part of the reference sign) is used; when elements of the same kind are described separately, the element ID (or the full reference sign) may be used.
  • FIG. 1 shows an example of the configuration of a computer system according to the embodiment.
  • the computer system has a host 101 and a storage system 102.
  • the host 101 and the storage system 102 are connected to each other via a communication network 152.
  • the host 101 transmits an I / O (Input / Output) request to the storage system 102.
  • the I / O request includes I / O destination information indicating the location of the I / O destination.
  • The I/O destination information includes, for example, the LUN (Logical Unit Number) of the LU (Logical Unit) of the I/O destination and the LBA (Logical Block Address) of the area in the LU.
  • An LU is a logical volume (logical storage device) provided from the storage system 102. Based on the I/O destination information, the logical area of the I/O destination is identified, and the drive 124 that underlies the logical area is identified. A minimal sketch of this resolution is given below.
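  • The following is an illustrative Python sketch, not taken from the patent, of resolving the I/O destination information (LUN and LBA) to a logical area and then to a drive; the strip size, the mapping tables, and all names are assumptions for illustration only.

        # Illustrative sketch (assumption): resolve an I/O destination (LUN, LBA) to a drive.
        STRIP_SIZE = 512 * 1024            # assumed strip size in bytes

        # lu_to_raid_group[lun] = RAID group number backing that LU (assumed mapping)
        lu_to_raid_group = {0: 0, 1: 1}
        # strip_to_drive[(raid_group, strip_index)] = drive number (assumed layout table)
        strip_to_drive = {(0, 0): 3, (0, 1): 7, (1, 0): 2}

        def resolve_drive(lun, lba_bytes):
            raid_group = lu_to_raid_group[lun]
            strip_index = lba_bytes // STRIP_SIZE      # logical area of the I/O destination
            return strip_to_drive[(raid_group, strip_index)]

        if __name__ == "__main__":
            print(resolve_drive(0, 600 * 1024))        # -> 7
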
  • the storage system 102 includes a storage controller 103 and a drive box 121.
  • the drive box 121 includes a plurality (or one) of pools 183.
  • Each pool 183 includes a plurality of drives 124.
  • the drive 124 is an example of a storage device (typically a non-volatile storage device), and is, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive).
  • the storage controller 103 includes a host I / F 111, a cache memory (CM) 112, a CPU (Central Processing Unit) 113, a drive I / F 114, and a local memory (LM) 115.
  • the host I / F 111 and the drive I / F 114 are examples of the interface unit.
  • the cache memory 112 and the local memory 115 are examples of a storage unit.
  • the CPU 113 is an example of a processor unit.
  • the host I / F 111 is an example of an interface device of the storage controller 103, and communicates with the host 101.
  • the cache memory 112 temporarily stores data (write data) written from the host 101 to the drive 124 and data (read data) read from the drive 124.
  • the CPU 113 executes various processes by executing a program stored in the local memory 115.
  • the CPU 113 is connected to the host I / F 111, the cache memory 112, the drive I / F 114, and the local memory 115.
  • The CPU 113 transmits various commands to the drives 124 of the drive box 121 via the drive I/F 114.
  • the drive I / F 114 is an example of an interface device of the storage controller 103, and communicates with each drive 124.
  • the local memory 115 stores various programs and various information.
  • the storage controller 103 processes the I / O request received from the host 101. Specifically, for example, the storage controller 103 identifies the drive 124 that is the data I / O destination based on the I / O destination of the I / O request, and executes I / O for the identified drive 124. At this time, the storage controller 103 caches the I / O target data in the cache memory 112.
  • FIG. 2 shows an example of the logical configuration and physical configuration of the pool 183.
  • the pool 183 has a plurality of RAID groups (logical RAID groups) 223.
  • a logical volume is provided based on the RAID group 223.
  • Each RAID group 223 has a plurality of drives 224 (logical drives).
  • Each RAID group 223 is a RAID group according to a 2D + 2P RAID configuration and a RAID 6 RAID level. That is, in each RAID group 223, each stripe is composed of four strips (unit storage areas) that the four drives 224 respectively have. In each stripe, two data elements (D) are stored in two strips, respectively, and two parities (P) based on the two data elements are stored in two strips.
  • the plurality of RAID types (RAID level and RAID configuration) respectively corresponding to the plurality of RAID groups 223 may be the same.
  • each pool 183 has a plurality of drive groups 123.
  • the number of drive groups 123 is the same as the number of RAID groups 223, but may be different.
  • the number of drives 124 constituting each drive group 123 is the same as the number of drives 224 constituting the RAID group 223, but may be different.
  • each drive group 123 has four drives 124 (physical drives). Each drive group 123 does not constitute a RAID group by itself.
  • the pool 183 logically configures the plurality of RAID groups 223 described above.
  • each RAID group 223 is composed of two or more stripes.
  • Each stripe is composed of two or more strips.
  • Each drive 124 has a plurality of strips.
  • For each RAID group 223, at least one of the two or more drives 124 that provide the two or more strips constituting the first stripe differs from at least one of the two or more drives 124 that provide the two or more strips constituting the second stripe.
  • For each RAID group 223, the first stripe is any stripe, and the second stripe is any stripe other than the first stripe.
  • A plurality of strips constituting the same stripe are distributed across different drive groups, and the drive positions (positions according to physical addresses within the drives) corresponding to those strips differ from one another.
  • the same number is given to each of the four strips constituting the same stripe.
  • four strips 0-0 constituting the same stripe are distributed in a plurality of drive groups 123.
  • each drive 124 has one or more spare areas (S) in addition to a plurality of strips.
  • each drive 124 has one spare area.
  • the “spare area” is a spare storage area.
  • any strip is a component of the RAID group 223, but the spare area is not a component of the RAID group 223.
  • data (data element or parity) in the strip in the failed drive is restored to the spare area, as will be described later.
  • the size of each spare area is not less than the strip size.
  • a distributed RAID configuration is adopted, and a spare area is provided in each drive 124 instead of providing a spare drive.
  • In this embodiment, the RAID level of each pool 183 is RAID 6 and the RAID configuration is 2D+2P, but a different RAID level and RAID configuration may be used. A minimal sketch of such a layout follows.
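  • The following is an illustrative Python sketch, not taken from the patent, of one possible way to lay out 2D+2P stripes so that the strips of each stripe land on different drives and every drive reserves a spare area instead of using a dedicated spare drive; the rotation scheme and all names (make_layout, etc.) are assumptions for illustration only.

        # Illustrative sketch (assumption): distribute 2D+2P stripes across drives,
        # reserving the last slot of every drive as a spare area (no spare drive).

        def make_layout(num_drives=8, slots_per_drive=5):
            # layout[drive][slot] = ("D"/"P", stripe_id) or "SPARE"
            layout = [[None] * slots_per_drive for _ in range(num_drives)]
            for d in range(num_drives):
                layout[d][slots_per_drive - 1] = "SPARE"   # one spare area per drive
            stripe_id = 0
            for slot in range(slots_per_drive - 1):        # fill the non-spare slots
                for start in range(0, num_drives, 4):      # 4 strips (2D+2P) per stripe
                    drives = [(start + slot + k) % num_drives for k in range(4)]  # rotate per slot
                    for k, d in enumerate(drives):
                        kind = "D" if k < 2 else "P"
                        layout[d][slot] = (kind, stripe_id)
                    stripe_id += 1
            return layout

        if __name__ == "__main__":
            for d, slots in enumerate(make_layout()):
                print(f"drive {d}: {slots}")
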
  • FIG. 3 shows an example of programs and tables stored in the local memory 115.
  • the local memory 115 stores various programs. Examples of the program include a host I / O processing program 301, a rebuild processing program 302, a copyback processing program 303, and a parity generation program 304.
  • the host I / O processing program 301 processes an I / O request from the host 101.
  • the rebuild process program 302 executes a rebuild process.
  • the copy back processing program 303 executes copy back processing.
  • the parity generation program 304 generates the parity stored in the stripe.
  • the local memory 115 stores various tables. Examples of the table include a pool status table 305, a copyback setting management table 306, a progress management table 307, and a head address management table 308.
  • FIG. 4 shows an example of the pool status table 305 and the status of the pool corresponding to the table 305.
  • the pool state table 305 is information indicating the redundancy and state of the pool 183.
  • the pool state table 305 manages entries that hold information such as the pool number 401, the redundancy 402, and the state 403 for each pool 183.
  • Pool number 401 is a pool number.
  • the redundancy 402 indicates the redundancy of the pool 183.
  • the redundancy as the redundancy 402 is the lowest redundancy in the pool 183.
  • For example, when some stripes in the pool 183 have a redundancy of “1” and other stripes have a redundancy of “0”, the minimum value “0” is registered as the redundancy 402.
  • the state 403 indicates a state related to processing to the pool 183.
  • An example of processing for the pool 183 is copy back processing.
  • Values of the state 403 include “copying back”, which means that copyback processing is in progress and the data copy processing described later is being performed; “stopped”, which means that copyback processing is in progress but the data copy processing is stopped; and “none”, which means that copyback processing is not in progress.
  • According to FIG. 4, the redundancy and status of each of the pools 0 to 2 are as follows. In pool 0, two drives have failed and the redundancy has dropped from 2 to 0; for pool 0, the data copy processing within the copyback processing is being performed. In pool 1, a failure has occurred in one drive and the redundancy has dropped from 2 to 1; copyback processing is in progress for pool 1, but the data copy processing within it is stopped. In pool 2, no failure has occurred in any drive. A minimal sketch of such a pool state table follows.
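  • The following is an illustrative Python sketch, not taken from the patent, of how the pool state table 305 might be held in memory; the field names and example values mirror FIG. 4, while the class and function names are assumptions.

        # Illustrative sketch (assumption): in-memory form of the pool state table 305.
        from dataclasses import dataclass

        @dataclass
        class PoolState:
            pool_number: int
            redundancy: int      # lowest redundancy among the stripes in the pool
            state: str           # "copying back", "stopped", or "none"

        # Example values corresponding to FIG. 4
        pool_state_table = [
            PoolState(pool_number=0, redundancy=0, state="copying back"),
            PoolState(pool_number=1, redundancy=1, state="stopped"),
            PoolState(pool_number=2, redundancy=2, state="none"),
        ]

        def pool_redundancy(stripe_redundancies):
            # The redundancy 402 is the minimum redundancy over all stripes in the pool.
            return min(stripe_redundancies)

        if __name__ == "__main__":
            print(pool_redundancy([1, 0, 2]))  # -> 0
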
  • FIG. 5 shows an example of the copyback setting management table 306.
  • the copyback setting management table 306 is a table for managing thresholds for determining whether or not to execute data copy processing.
  • The copyback setting management table 306 manages entries that hold information such as a pool number 501, an I/O waiting time 502, a write ratio 503, a CPU usage rate 504, and a determination waiting time 505 for each pool 183.
  • Pool number 501 is the number of pool 183.
  • The I/O waiting time 502 is a threshold for the length of time during which no I/O request (hereinafter, host I/O) has arrived from the host 101 for the pool 183.
  • The write ratio 503 is a threshold for the ratio of writes in the host I/O for the pool 183.
  • The CPU usage rate 504 is a threshold for the usage rate of the CPU 113 with respect to processing for the pool 183.
  • the determination waiting time 505 indicates a waiting time from the start or stop of the data copy process to the determination.
  • the information 502 to 505 is prepared for each pool 183, but may be common to all the pools 183.
  • FIG. 6 shows an example of the progress management table 307.
  • the progress management table 307 is a table for managing the progress of the copy back process for the replaced drive (the drive replaced with the failed drive 124).
  • the progress management table 307 manages entries that hold information such as an address 601, a completion flag 602, and a data position 603 for each strip included in the drive after replacement.
  • An address 601 is a strip address (number).
  • the completion flag 602 is a flag indicating whether or not the data restoration for the strip is completed. As the value of the completion flag 602, “1” is set when it is completed, and “0” is set when it is not completed.
  • a data position 603 indicates a position (a combination of the number of the drive 124 and the address of the spare area in the drive 124) where the data (data to be copied back) to be restored exists in the strip.
  • FIG. 7 shows an example of the head address management table 308.
  • the head address management table 308 is a table for managing a position where the copy back process is not completed.
  • The head address management table 308 stores the first address among the addresses 601 in the progress management table 307 whose completion flag 602 is “0”. A minimal sketch of these two tables follows.
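  • The following is an illustrative Python sketch, not taken from the patent, of the progress management table 307 and the head address management table 308; the class and function names are assumptions.

        # Illustrative sketch (assumption): progress management for one post-replacement drive.
        from dataclasses import dataclass, field
        from typing import Optional, Tuple

        @dataclass
        class StripProgress:
            address: int                                      # address 601 (strip number)
            completed: bool = False                           # completion flag 602 (True = "1")
            data_position: Optional[Tuple[int, int]] = None   # data position 603: (drive number, spare-area address)

        @dataclass
        class ProgressTable:
            entries: list = field(default_factory=list)

            def head_incomplete_address(self) -> Optional[int]:
                # Corresponds to the head address management table 308: the first
                # address 601 whose completion flag 602 is "0".
                for e in self.entries:
                    if not e.completed:
                        return e.address
                return None

        if __name__ == "__main__":
            table = ProgressTable([StripProgress(a) for a in range(4)])
            table.entries[0].completed = True
            print(table.head_incomplete_address())  # -> 1
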
  • the processing performed in this embodiment will be described by taking one pool 183 as an example.
  • the pool 183 is referred to as a “target pool 183”.
  • the meaning of each term is as follows.
  • “Problem drive” is a drive in which a problem has occurred.
  • “Problem” means a failure or a high possibility of occurrence of a failure.
  • the problem drive is a failed drive.
  • a “failed drive” is a drive in which a failure has occurred.
  • a “failure strip” is a strip in a failed drive.
  • As the problem drive, a failure candidate drive (a drive with a high possibility of failure) can also be adopted.
  • a strip in a failure candidate drive may be referred to as a “failure candidate strip”.
  • “Restoration” is a term that may be used to mean writing to a spare area in a drive or writing to a strip in a drive after replacement.
  • The “drive after replacement” (post-replacement drive) is the drive installed in place of the problem drive.
  • “Recovery processing” is a term that means processing including rebuild processing and copy back processing.
  • “Recovery” is a term that includes rebuild and copyback.
  • “Rebuild process” is a process including a rebuild. A “rebuild” is to restore data corresponding to all failure strips to a plurality of spare areas, that is, correction copy.
  • The “data corresponding to the failure strip” is typically the data in the failure strip, but it may also be updated data for the data in the failure strip.
  • When the problem drive is a failure candidate drive, a rebuild is to restore the data in all failure candidate strips to a plurality of spare areas, that is, dynamic sparing.
  • “Copy back processing” is processing including copy back.
  • “Copy back” is a process corresponding to a data copy process to be described later, and is to copy data from a spare area to a strip in a drive after replacement. In this embodiment, data may be restored to the strip in the drive after replacement by copy back, or data may be restored according to host I / O instead of copy back.
  • Data XX is data (data element or parity) in the strip XX (XX is a number).
  • FIG. 8 shows the flow of recovery processing.
  • The recovery process is started when, for example, the CPU 113 (for example, the rebuild processing program 302) detects that a failure has occurred in a drive 124.
  • The rebuild processing program 302 starts the rebuild process for restoring the same data as the data (data element or parity) in each strip of the failed drive 124 to spare areas of normal drives 124 (S801).
  • FIG. 9 shows an example of the details of the rebuild process.
  • the faulty drive 00 includes fault strips 0-1, 2-2, 1-1 and 2-3.
  • Based on the data 0-1 in the normal drives 05, 06, and 0B, the rebuild processing program 302 restores the data 0-1 of the failed drive 00 to a spare area in any normal drive, for example, the spare area of the normal drive 01.
  • the rebuild processing program 302 adds the position of the spare area (a combination of the drive 01 number and the address of the spare area) to the progress management table 307 corresponding to the target pool 183 as the data position 603.
  • The rebuild processing program 302 restores the other data 2-2, 1-1, and 2-3 of the failed drive 00 to, for example, the spare areas of the normal drives 02 to 04, and adds the positions of those spare areas to the progress management table 307 as data positions 603.
  • a plurality of data in the failed drive are restored to the spare areas of a plurality of normal drives, respectively.
  • In other words, data write destinations are not concentrated in a single drive such as a spare drive, but are distributed across a plurality of drives. For this reason, it is expected that the time required for the rebuild process can be shortened. A minimal sketch of such a rebuild follows.
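  • The following is an illustrative Python sketch, not taken from the patent, of restoring the strips of a failed drive into spare areas spread over several surviving drives; XOR-based reconstruction is assumed for simplicity (the embodiment uses RAID 6), and all names are assumptions.

        # Illustrative sketch (assumption): rebuild of a failed drive's strips into
        # spare areas distributed over the surviving drives. XOR parity is used
        # here for simplicity; the embodiment's RAID 6 coding is not reproduced.
        from functools import reduce

        def xor_blocks(blocks):
            return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*blocks))

        def rebuild(failed_strips, surviving_data, spare_areas, progress_table):
            """failed_strips: strip ids of the failed drive.
            surviving_data[strip_id]: surviving blocks of that stripe.
            spare_areas: list of (drive_number, spare_address, buffer) on normal drives.
            progress_table[strip_id] = (drive_number, spare_address) once restored."""
            for i, strip_id in enumerate(failed_strips):
                restored = xor_blocks(surviving_data[strip_id])
                # Spread restore destinations over many drives instead of one spare drive.
                drive_no, spare_addr, buf = spare_areas[i % len(spare_areas)]
                buf[:] = restored
                progress_table[strip_id] = (drive_no, spare_addr)  # data position 603

        if __name__ == "__main__":
            surviving = {0: [b"\x01\x02", b"\x04\x08"]}           # two surviving blocks of stripe 0
            spares = [(1, 0, bytearray(2)), (2, 0, bytearray(2))]
            progress = {}
            rebuild([0], surviving, spares, progress)
            print(spares[0][2], progress)                         # restored bytes and data position
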
  • the failed drive 124 is replaced, for example, by maintenance personnel (S802).
  • The rebuild processing program 302 determines whether the rebuild process has been completed (S803). If the rebuild has not been completed (S803: NO), the rebuild processing program 302 continues the rebuild process. On the other hand, if the rebuild process has been completed (S803: YES), the rebuild processing program 302 outputs a drive replacement completion sign (for example, by lighting a predetermined LED (Light Emitting Diode) provided in the storage system 102) (S804).
  • the copyback processing program 303 executes the copyback processing in response to, for example, an instruction for copyback processing (or in response to detection of the completion of drive replacement) (S805).
  • FIG. 10 shows an example of details of the copyback process.
  • FIG. 10 corresponds to the continuation of the rebuild process shown in FIG.
  • The copyback processing program 303 restores (copies) the data 0-1, 2-2, 1-1, and 2-3 from the spare areas of the drives 01 to 04 to the strips 0-1, 2-2, 1-1, and 2-3 of the post-replacement drive 00, respectively.
  • FIG. 11 shows the detailed flow of the copyback process.
  • the copyback processing program 303 refers to the redundancy 402 corresponding to the target pool 183 and determines whether the redundancy 402 is “0” (S1101). If the determination result in S1101 is true (S1101: YES), the copyback processing program 303 executes data copy processing (S1105). This is to increase the reliability of data protection of the target pool 183.
  • the redundancy “0” is an example of a redundancy threshold. The threshold may be greater than zero.
  • If the determination result in S1101 is false (S1101: NO), the copyback processing program 303 determines whether the time during which no host I/O has arrived for the target pool 183 is equal to or longer than the I/O waiting time 502 corresponding to the target pool 183 (S1102). If the determination result in S1102 is true (S1102: YES), the copyback processing program 303 executes data copy processing (S1105). This is because no host I/O is being performed, so the load on the CPU 113 is relatively low and it is considered efficient to use it for the data copy processing.
  • If the determination result in S1102 is false (S1102: NO), the copyback processing program 303 determines whether the write ratio in the host I/O for the target pool 183 is less than the write ratio 503 corresponding to the target pool 183 (S1103). If the determination result in S1103 is true (S1103: YES), the copyback processing program 303 executes data copy processing (S1105). This is because, if the write ratio is low, there is a low possibility that data will be restored to the strips of the post-replacement drive without data copying between drives through the write processing during copyback processing described later.
  • If the determination result in S1103 is false (S1103: NO), the copyback processing program 303 determines whether the CPU usage rate related to the target pool 183 is less than the CPU usage rate 504 corresponding to the target pool 183 (S1104). If the determination result in S1104 is true (S1104: YES), the copyback processing program 303 executes data copy processing (S1105). This is because the load on the CPU 113 is relatively low, and it is considered more efficient to use it for the data copy processing.
  • If the determination results of S1101 to S1104 are all false, the copyback processing program 303 stops the data copy process (S1106). At this time, if the state 403 corresponding to the target pool 183 is not “stopped”, the copyback processing program 303 updates the state 403 to “stopped”. If the data copy process is already stopped (the state 403 is already “stopped”), S1106 may be skipped (that is, the data copy process remains stopped).
  • the copy back processing program 303 determines whether the elapsed time from the data copy processing stop time is equal to or longer than the determination waiting time 505 corresponding to the target pool 183 (S1107).
  • If the elapsed time is equal to or longer than the determination waiting time 505 (S1107: YES), the copyback processing program 303 determines whether the copyback process has been completed (S1108).
  • The determination in S1108 is a determination as to whether the completion flags 602 corresponding to all the strips in the post-replacement drive are “1” in the progress management table 307 corresponding to the target pool 183. If the determination result in S1108 is false (S1108: NO), the process returns to S1101. On the other hand, if the determination result in S1108 is true (S1108: YES), the copyback process is terminated.
  • The determinations of S1101, S1102, S1103, and S1104 are listed in descending order of priority, but they may be made in a different order. Further, the data copy process (S1105) may be executed when at least one of the determination results of S1101 to S1104 is true, or the data copy process (S1105) may be stopped when at least one of the determination results of S1101 to S1104 is false. For example, if the redundancy 402 of the target pool 183 is larger than a threshold (for example, “0”) (S1101: NO), the data copy process may be stopped regardless of the results of the other determinations. Likewise, if the write ratio is equal to or higher than the write ratio 503 (S1103: NO), the data copy process may be stopped regardless of the results of the other determinations. A minimal sketch of this decision flow follows.
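  • The following is an illustrative Python sketch, not taken from the patent, of the S1101 to S1106 decision (execute or stop the data copy process) against the thresholds in the copyback setting management table 306; the function and field names are assumptions.

        # Illustrative sketch (assumption): decide whether to run or stop the data
        # copy process, following the order S1101 -> S1102 -> S1103 -> S1104 of FIG. 11.
        from dataclasses import dataclass

        @dataclass
        class CopybackSettings:          # thresholds from the copyback setting management table 306
            io_wait_threshold_s: float   # I/O waiting time 502
            write_ratio_threshold: float # write ratio 503
            cpu_usage_threshold: float   # CPU usage rate 504

        @dataclass
        class PoolStatus:                # observed status of the target pool 183
            redundancy: int
            seconds_since_last_host_io: float
            write_ratio: float
            cpu_usage: float

        def should_run_data_copy(status: PoolStatus, cfg: CopybackSettings) -> bool:
            if status.redundancy == 0:                                        # S1101
                return True   # raise data-protection reliability first
            if status.seconds_since_last_host_io >= cfg.io_wait_threshold_s:  # S1102
                return True   # no host I/O, CPU load is relatively low
            if status.write_ratio < cfg.write_ratio_threshold:                # S1103
                return True   # little chance of restoring strips via write processing
            if status.cpu_usage < cfg.cpu_usage_threshold:                    # S1104
                return True   # CPU load is relatively low
            return False      # S1106: stop (or keep stopped) the data copy process

        if __name__ == "__main__":
            cfg = CopybackSettings(io_wait_threshold_s=60, write_ratio_threshold=0.3, cpu_usage_threshold=0.5)
            print(should_run_data_copy(PoolStatus(1, 5.0, 0.6, 0.8), cfg))  # -> False
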
  • FIG. 12 shows the detailed flow of the data copy process.
  • the copyback processing program 303 identifies the head address for which data copying has not been completed from the head address management table 308 (S1201).
  • the copyback processing program 303 identifies the data position 603 corresponding to the address 601 that matches the address identified in S1201 (S1202).
  • The copyback processing program 303 copies the copyback target data from the spare area indicated by the data position 603 identified in S1202 to the strip (the strip in the post-replacement drive) identified in S1201 (S1203).
  • The copyback processing program 303 updates the completion flag 602 corresponding to the copy destination strip to “1” (completed), and updates the head address in the head address management table 308 to the first address 601 whose completion flag 602 is “0” (S1204). In S1204, when the minimum value of the redundancies corresponding to the stripes in the target pool 183 has increased, the copyback processing program 303 also updates the redundancy 402 corresponding to the target pool 183.
  • the copy back processing program 303 determines whether or not the copy back processing is completed (S1205). This determination is the same as the determination in S1108 of FIG. If the determination result in S1205 is true (S1205: YES), the copyback process ends.
  • If the determination result in S1205 is false (S1205: NO), the copyback processing program 303 determines whether the elapsed time from the start time of the data copy process is equal to or longer than the determination waiting time 505 corresponding to the target pool 183 (S1206). If the determination result in S1206 is false (S1206: NO), the process returns to S1201 (that is, the data copy process continues). On the other hand, if the determination result in S1206 is true (S1206: YES), the process returns to S1101 (at this time, the data copy process may be temporarily ended (stopped)). A minimal sketch of this loop follows.
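  • The following is an illustrative Python sketch, not taken from the patent, of the data copy loop of FIG. 12 (S1201 to S1206); the dictionary layout, the copy_block callback, and the time budget handling are assumptions.

        # Illustrative sketch (assumption): the data copy loop of FIG. 12.
        # entries: strip address -> {"completed": bool, "data_position": (drive, spare_addr)}
        import time

        def head_incomplete_address(entries):
            for addr in sorted(entries):                      # head address management table 308
                if not entries[addr]["completed"]:
                    return addr
            return None

        def data_copy_process(entries, copy_block, wait_budget_s):
            start = time.monotonic()
            while True:
                addr = head_incomplete_address(entries)       # S1201
                if addr is None:
                    return "copyback complete"                # S1205: YES
                src = entries[addr]["data_position"]          # S1202
                copy_block(src, addr)                         # S1203: spare area -> strip in new drive
                entries[addr]["completed"] = True             # S1204 (strips already restored by
                                                              # host I/O are simply never selected)
                if time.monotonic() - start >= wait_budget_s: # S1206
                    return "re-evaluate execution condition"  # back to S1101

        if __name__ == "__main__":
            table = {a: {"completed": a == 1, "data_position": (a % 4 + 1, 0)} for a in range(4)}
            print(data_copy_process(table, lambda src, dst: None, wait_budget_s=1.0))
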
  • FIG. 13 shows an example of write processing during copy back processing.
  • When the host I/O processing program 301 receives a write request from the host 101 during the copyback process, it writes the data according to the write request to the strip in the post-replacement drive. As a result, the data is restored to that strip. The details are, for example, as follows.
  • When the host I/O processing program 301 receives a write request whose write destination is the strip 2-2 in the post-replacement drive 00 during the copyback process (S14-1), the host I/O processing program 301 reads the data 2-2 from the strips 2-2 of the normal drives 01, 06, and 0B (S14-2). Next, the host I/O processing program 301 calculates a parity from the plurality of read data 2-2 and the updated data 2-2 from the host 101 (S14-3). Next, the host I/O processing program 301 writes the updated data 2-2 to the strip 2-2 in the post-replacement drive 00 (S14-4). The host I/O processing program 301 then returns a write completion response to the host 101 (S14-5).
  • The host I/O processing program 301 updates the completion flag 602 corresponding to the strip 2-2 to “1” (S14-6). As described above, when a write request is received during the copyback process, the data can be restored to the strip in the post-replacement drive by piggybacking on the processing of the write request, as sketched below.
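  • The following is an illustrative Python sketch, not taken from the patent, of write processing during copyback: the old data of the stripe is read from the surviving strips, the parity is recomputed (XOR is used here for simplicity; the embodiment uses RAID 6), the updated data is written to the post-replacement drive, and the strip is marked as restored; all names are assumptions.

        # Illustrative sketch (assumption): write processing during copyback (FIG. 13).
        from functools import reduce

        def xor_blocks(blocks):
            return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*blocks))

        def write_during_copyback(new_data, strip_addr, read_peer_strips, write_strip,
                                  write_parity, completion_flags):
            peers = read_peer_strips(strip_addr)          # S14-2: read the other data of the stripe
            parity = xor_blocks(peers + [new_data])       # S14-3: recompute parity (XOR for simplicity)
            write_strip(strip_addr, new_data)             # S14-4: write to the post-replacement drive
            write_parity(strip_addr, parity)
            completion_flags[strip_addr] = True           # S14-6: restored, so data copy will skip it
            return "write completed"                      # S14-5: respond to the host

        if __name__ == "__main__":
            flags = {2: False}
            store = {}
            print(write_during_copyback(
                b"\x0f", 2,
                read_peer_strips=lambda a: [b"\x01"],
                write_strip=lambda a, d: store.update({("strip", a): d}),
                write_parity=lambda a, p: store.update({("parity", a): p}),
                completion_flags=flags))
            print(store, flags)
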
  • FIG. 14 shows an example of the transition of the progress management table 307 according to the progress of the copyback process and the write process.
  • In FIG. 14, the progress management table 307 is schematically expressed as a bitmap, and the addresses are assumed to be arranged in the order indicated by the broken arrows.
  • In the illustrated state, the head address for which copying is not completed points to the ninth strip, while the completion flags 602 corresponding to the 16th and 20th strips are already “1”.
  • When the head address for which copying is not completed reaches the 16th strip, that strip is skipped because its data has already been restored (its completion flag 602 is “1”). That is, in the copyback process, the data copy process is skipped for strips whose data was restored by piggybacking on write processing. This reduces the load of the copyback process.
  • In this embodiment, a distributed RAID configuration is adopted. Instead of providing a spare drive, each drive 124 is provided with a spare area, and the plurality of data in a failed drive are restored to the spare areas of a plurality of normal drives. In other words, data write destinations are not concentrated in a single drive such as a spare drive, but are distributed across a plurality of drives. For this reason, the time required for the rebuild process can be shortened. As a result, the time during which the redundancy is lowered can be shortened, and the decrease in I/O processing performance can be reduced.
  • In a configuration in which a spare drive is adopted, the restoration destination in the rebuild process is aggregated into the spare drive, that is, all the data in the failed drive is restored to the spare drive. Since the spare drive then becomes a member of the group in place of the failed drive, a “copyback-less” operation, in which the copyback processing appears to be completed without copying data between drives, can be expected.
  • In this embodiment, in the write processing during copyback processing (processing according to a write request from the host 101), the host I/O processing program 301 restores the updated data according to the write request to the strip in the post-replacement drive. In other words, the data is restored to the strip in the post-replacement drive by piggybacking on the write process.
  • In the copyback process, the data copy process is skipped for strips that have already been restored. This reduces the load of the copyback process. As a result, the decrease in I/O processing performance can be reduced.
  • In this embodiment, thresholds are provided for the redundancy, the write ratio, and the CPU usage rate, and the execution and stopping of the data copy process are controlled according to comparisons between the status during the copyback process and those thresholds. For example, when the load on the CPU 113 is low (specifically, when no host I/O has been received for a certain period of time or the CPU usage rate is low), the data copy process is executed; when the write ratio is high, the data copy process is stopped. By finely controlling the execution and stopping of the data copy process within the copyback process, the drive can be restored while suppressing a decrease in host I/O processing performance.
  • Example 2 will be described. At that time, differences from the first embodiment will be mainly described, and description of common points with the first embodiment will be omitted or simplified.
  • FIG. 15 illustrates an example of read processing during copy back processing according to the second embodiment.
  • When the host I/O processing program 301 receives a read request from the host 101 during the copyback process, it reads the data according to the read request from the corresponding spare area, returns it to the host 101, and also writes the data to the strip in the post-replacement drive.
  • The data written to the strip is the data that would otherwise be copied back, so as a result the data is restored to the strip in the post-replacement drive. The details are, for example, as follows.
  • the host I / O processing program 301 receives a read request with the strip 2-2 in the drive 00 after replacement as the read source (S13-1).
  • Since the completion flag 602 corresponding to the address 601 of the strip 2-2 is “0”, the host I/O processing program 301 reads the data 2-2 from the spare area indicated by the data position 603 corresponding to that address 601 (in the example of FIG. 15, the spare area of the normal drive 02) (S13-2).
  • The host I/O processing program 301 returns the read data 2-2 to the host 101 (S13-3), and writes the data 2-2 to the strip 2-2 in the post-replacement drive (S13-4).
  • the host I / O processing program 301 updates the completion flag 602 corresponding to the strip 2-2 to “1” (S13-5).
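  • The following is an illustrative Python sketch, not taken from the patent, of read processing during copyback in Example 2: if the target strip has not been restored yet, the data is read from the spare area recorded in the progress management table, returned to the host, written to the post-replacement drive, and the strip is marked as restored; all names are assumptions.

        # Illustrative sketch (assumption): read processing during copyback (FIG. 15).
        def read_during_copyback(strip_addr, progress, read_spare, read_strip, write_strip):
            """progress[strip_addr] = {"completed": bool, "data_position": (drive, spare_addr)}"""
            entry = progress[strip_addr]
            if entry["completed"]:
                return read_strip(strip_addr)              # already restored: read the strip itself
            data = read_spare(entry["data_position"])      # S13-2: read from the spare area
            write_strip(strip_addr, data)                  # S13-4: restore to the post-replacement drive
            entry["completed"] = True                      # S13-5: data copy will skip this strip
            return data                                    # S13-3: returned to the host

        if __name__ == "__main__":
            progress = {2: {"completed": False, "data_position": (2, 0)}}
            spare = {(2, 0): b"data 2-2"}
            new_drive = {}
            out = read_during_copyback(2, progress,
                                       read_spare=lambda pos: spare[pos],
                                       read_strip=lambda a: new_drive[a],
                                       write_strip=lambda a, d: new_drive.update({a: d}))
            print(out, new_drive, progress)
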
  • a host I / O frequency threshold value may be employed instead of or in addition to the write ratio 503 threshold value.
  • The copyback processing program 303 may execute a determination (hereinafter, determination A) as to whether the host I/O frequency related to the target pool 183 is less than the threshold, instead of or in addition to the determination in S1103. If the result of determination A is true, the copyback processing program 303 may execute data copy processing. This is because, when the host I/O frequency is low, there is a low possibility that data will be restored to the strips of the post-replacement drive by piggybacking on host I/O processing (write processing or read processing) during the copyback process.
  • a read ratio threshold value may be used instead of or in addition to at least one of the write ratio 503 and the host I / O frequency threshold value.
  • Instead of or in addition to at least one of the determination in S1103 and the above determination A, the copyback processing program 303 may determine whether the read ratio of the target pool 183 (the ratio of read requests in the host I/O related to the target pool 183) is less than the read ratio threshold (hereinafter, determination B). If the result of determination B is true, the copyback processing program 303 may execute data copy processing. This is because, when the read ratio is low, there is a low possibility that data will be restored to the strips of the post-replacement drive by piggybacking on host I/O processing (write processing or read processing) during the copyback process.
  • At least one of the determinations of S1101 to S1104, determination A, and determination B described above (for example, the determination of S1101, the determination of S1102 or S1104, and at least one of the determination of S1103, determination A, and determination B) is included in the determination as to whether the execution condition of the data copy process is satisfied.
  • The copyback processing program 303 may determine, regularly or irregularly, whether the execution condition of the data copy process is satisfied. If the determination result is true, the copyback processing program 303 can execute (or continue) the data copy process; if the determination result is false, it can stop the data copy process. One way of combining these determinations is sketched below.
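  • The following is an illustrative Python sketch, not taken from the patent, of one way the execution condition of the data copy process could combine the determinations of S1101 to S1104 with determinations A and B; which determinations are included, and their order, are assumptions.

        # Illustrative sketch (assumption): a combined execution condition for the data copy process.
        def data_copy_execution_condition(redundancy, idle_seconds, write_ratio, host_io_freq,
                                          read_ratio, cpu_usage, thresholds):
            checks = [
                redundancy == 0,                                      # S1101
                idle_seconds >= thresholds["io_wait"],                # S1102
                write_ratio < thresholds["write_ratio"],              # S1103
                host_io_freq < thresholds["host_io_freq"],            # determination A
                read_ratio < thresholds["read_ratio"],                # determination B
                cpu_usage < thresholds["cpu_usage"],                  # S1104
            ]
            return any(checks)   # run the data copy if any included determination is true

        if __name__ == "__main__":
            th = {"io_wait": 60, "write_ratio": 0.3, "host_io_freq": 100, "read_ratio": 0.2, "cpu_usage": 0.5}
            print(data_copy_execution_condition(1, 10, 0.5, 200, 0.4, 0.7, th))  # -> False
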

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In this storage system, each of a plurality of storage devices has a plurality of strips and one or more spare regions. In a RAID group formed by these storage devices, at least one of two or more storage devices relating to a first stripe differs from at least one of two or more storage devices relating to a second stripe. The storage system performs: a rebuild process in which data for the plurality of strips of a failed storage device are restored respectively to a plurality of spare regions of two or more storage devices; and a copy-back process including a data copy process in which, after the failed storage device has been replaced by another storage device, data in at least one spare region of said plurality of spare regions are copied to at least one strip of the plurality of strips of the other storage device, wherein said at least one spare region is associated with said at least one strip, and no data have yet been restored to said at least one strip. During the copy-back process, the storage system skips the data copy process for a strip if data have already been written to the strip in a write process or a read process.

Description

Storage system and recovery control method
 The present invention generally relates to storage system recovery control.
 A storage system having a RAID (Redundant Array of Independent (or Inexpensive) Disks) group composed of a plurality of disks and provided with a spare disk is known. In such a storage system, the data in the failed disk in the RAID group is restored to the spare disk (rebuild process), and the data is copied from the spare disk to the disk installed after the failed disk is replaced (copyback process) (for example, Patent Document 1).
JP 2006-260236 A
 Generally, a storage device having a large storage capacity is adopted as a storage device (for example, a disk) constituting a RAID group.
 If the storage capacity of the spare storage device is large, both the rebuild process to the spare storage device and the copyback process from the spare storage device take a long time. For this reason, the I/O processing performance (the performance of I/O processing in accordance with I/O requests) may be lowered.
 The storage system includes a plurality of storage devices that provide one or more logical RAID groups, and a processor unit that is one or more processors connected to the plurality of storage devices. Each RAID group is composed of two or more stripes. Each stripe is composed of two or more strips. Each of the plurality of storage devices has a plurality of strips and one or more spare areas. For each RAID group, at least one of the two or more storage devices that provide the two or more strips constituting one stripe differs from at least one of the two or more storage devices that provide the two or more strips constituting any other stripe. When receiving an I/O request, the processor unit executes I/O processing according to the I/O request. The processor unit executes a rebuild process for restoring data corresponding to the plurality of strips of a problem storage device among the plurality of storage devices to a plurality of spare areas of two or more storage devices. After the rebuild process is completed and the problem storage device has been replaced, the processor unit executes a copyback process that includes a data copy process for copying data from spare areas to those strips of the post-replacement storage device to which data has not yet been restored. The processor unit executes at least one of the following (W) and (R):
(W) When a write request whose write destination is a strip in the post-replacement storage device is received during the copyback process, the data according to the write request is written to the write destination strip in the write process according to the write request.
(R) When a read request whose read source is a strip in the post-replacement storage device is received during the copyback process, data is read from the spare area corresponding to the read source, and the read data is written to the read source strip in the read process according to the read request.
The processor unit skips the data copy process for any strip of the post-replacement storage device to which data has already been written in either the write process or the read process during the copyback process.
 The storage device can be restored while reducing the decrease in I/O processing performance.
FIG. 1 shows an example of the configuration of a computer system according to Example 1.
FIG. 2 shows an example of the logical configuration and physical configuration of a pool.
FIG. 3 shows an example of programs and tables stored in a local memory.
FIG. 4 shows an example of a pool state table.
FIG. 5 shows an example of a copyback setting management table.
FIG. 6 shows an example of a progress management table.
FIG. 7 shows an example of a head address management table.
FIG. 8 shows the flow of recovery processing.
FIG. 9 shows an example of the details of rebuild processing.
FIG. 10 shows an example of the details of copyback processing.
FIG. 11 shows the detailed flow of copyback processing.
FIG. 12 shows the detailed flow of data copy processing.
FIG. 13 shows an example of write processing during copyback processing.
FIG. 14 shows an example of the transition of a progress management table according to the progress of copyback processing and write processing.
FIG. 15 shows an example of read processing during copyback processing according to Example 2.
 Several embodiments will be described with reference to the drawings. The embodiments described below do not limit the invention according to the claims, and not all of the elements and combinations described in the embodiments are necessarily essential to the solution of the invention.
 In the following description, information may be described using an expression such as “xxx table”, but the information may be expressed in any data structure. That is, to show that the information does not depend on the data structure, the “xxx table” can be referred to as “xxx information”. In the following description, the configuration of each table is an example; one table may be divided into two or more tables, and all or part of two or more tables may be combined into a single table.
 In the following description, the “interface unit” includes one or more interfaces. The one or more interfaces may be one or more interface devices of the same type (for example, one or more NICs (Network Interface Cards)) or two or more interface devices of different types (for example, an NIC and an HBA (Host Bus Adapter)).
 In the following description, the “storage unit” includes one or more memories. At least one memory of the storage unit may be a volatile memory. The storage unit is mainly used during processing by the processor unit.
 In the following description, the “processor unit” includes one or more processors. At least one processor is typically a microprocessor such as a CPU (Central Processing Unit). Each of the one or more processors may be single-core or multi-core. A processor may include a hardware circuit that performs part or all of the processing.
 In the following description, processing may be described using a “program” as the subject. A program is executed by the processor unit (for example, a CPU (Central Processing Unit)) and performs the defined processing while appropriately using a storage unit (for example, a memory) and/or an interface device (for example, a communication port); therefore, the subject of the processing may be described as the processor unit (or an apparatus or system having the processor unit). The processor unit may include a hardware circuit that performs part or all of the processing. A program may be installed into a device such as a computer from a program source. The program source may be, for example, a program distribution server or a computer-readable (for example, non-transitory) recording medium. In the following description, two or more programs may be realized as one program, and one program may be realized as two or more programs.
 In the following description, the “host system” may be one or more physical host computers (for example, a cluster of host computers), or may include at least one virtual host computer (for example, a VM (Virtual Machine)). Hereinafter, the host system is simply referred to as the “host”.
 In the following description, the “storage system” may be one or more storage devices. The “storage device” may be any device having a function of storing data in a storage medium; for this reason, the storage device may be a computer (for example, a general-purpose computer) such as a file server. For example, at least one physical storage device may execute a virtual computer (for example, a VM (Virtual Machine)) or may execute SDx (Software-Defined anything). As SDx, for example, SDS (Software Defined Storage) (an example of a virtual storage device) or SDDC (Software-defined Datacenter) can be adopted. For example, at least one storage device (computer) may have a hypervisor. The hypervisor may generate a server VM (Virtual Machine) that operates as a server and a storage VM that operates as storage. The server VM may operate as a host that issues I/O requests, and the storage VM may operate as a storage controller that performs I/O to the drives in response to I/O requests from the server VM.
 In the following description, when elements of the same kind are described without being distinguished, a reference sign (or a common part of the reference sign) is used; when elements of the same kind are described separately, the element ID (or the full reference sign) may be used.
 In the following description, numbers are used as element identifiers, but other types of identifiers may be used instead of or in addition to numbers.
 FIG. 1 shows an example of the configuration of a computer system according to the embodiment.
 The computer system has a host 101 and a storage system 102. The host 101 and the storage system 102 are connected to each other via a communication network 152.
 The host 101 transmits an I/O (Input/Output) request to the storage system 102. The I/O request includes I/O destination information indicating the location of the I/O destination. The I/O destination information includes, for example, the LUN (Logical Unit Number) of the LU (Logical Unit) of the I/O destination and the LBA (Logical Block Address) of the area in the LU. An LU is a logical volume (logical storage device) provided from the storage system 102. Based on the I/O destination information, the logical area of the I/O destination is identified, and the drive 124 that underlies the logical area is identified.
 The storage system 102 includes a storage controller 103 and a drive box 121. The drive box 121 includes a plurality of (or one) pools 183. Each pool 183 includes a plurality of drives 124. A drive 124 is an example of a storage device (typically a non-volatile storage device), and is, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive).
 The storage controller 103 includes a host I/F 111, a cache memory (CM) 112, a CPU (Central Processing Unit) 113, a drive I/F 114, and a local memory (LM) 115. The host I/F 111 and the drive I/F 114 are examples of the interface unit. The cache memory 112 and the local memory 115 are examples of the storage unit. The CPU 113 is an example of the processor unit.
 The host I/F 111 is an example of an interface device of the storage controller 103 and communicates with the host 101. The cache memory 112 temporarily stores data written from the host 101 to the drives 124 (write data) and data read from the drives 124 (read data). The CPU 113 executes various processes by executing programs stored in the local memory 115. The CPU 113 is connected to the host I/F 111, the cache memory 112, the drive I/F 114, and the local memory 115. The CPU 113 transmits various commands to the drives 124 of the drive box 121 via the drive I/F 114. The drive I/F 114 is an example of an interface device of the storage controller 103 and communicates with each drive 124. The local memory 115 stores various programs and various information.
 ストレージコントローラ103の動作の一例は次の通りである。すなわち、ストレージコントローラ103は、ホスト101から受信したI/O要求を処理する。具体的には、例えば、ストレージコントローラ103は、そのI/O要求のI/O先に基づきデータのI/O先となるドライブ124を特定し、特定したドライブ124に対するI/Oを実行する。その際、ストレージコントローラ103は、I/O対象のデータをキャッシュメモリ112にキャッシュする。 An example of the operation of the storage controller 103 is as follows. That is, the storage controller 103 processes the I / O request received from the host 101. Specifically, for example, the storage controller 103 identifies the drive 124 that is the data I / O destination based on the I / O destination of the I / O request, and executes I / O for the identified drive 124. At this time, the storage controller 103 caches the I / O target data in the cache memory 112.
 図2は、プール183の論理構成及び物理構成の一例を示す。 FIG. 2 shows an example of the logical configuration and physical configuration of the pool 183.
 論理構成によれば、プール183は、複数のRAIDグループ(論理的なRAIDグループ)223を有する。RAIDグループ223に基づき論理ボリュームが提供される。各RAIDグループ223は、複数のドライブ224(論理的なドライブ)を有する。各RAIDグループ223は、2D+2PのRAID構成とRAID6のRAIDレベルとに従うRAIDグループである。即ち、各RAIDグループ223において、各ストライプが、4個のドライブ224がそれぞれ有する4個のストリップ(単位記憶領域)で構成されている。各ストライプでは、2個のストリップにそれぞれ2個のデータ要素(D)が格納され、2個のストリップに、それぞれ、その2個のデータ要素に基づく2個のパリティ(P)が格納される。プール183において、複数のRAIDグループ223にそれぞれ対応した複数のRAID種別(RAIDレベル及びRAID構成)は同じでもよい。 According to the logical configuration, the pool 183 has a plurality of RAID groups (logical RAID groups) 223. A logical volume is provided based on the RAID group 223. Each RAID group 223 has a plurality of drives 224 (logical drives). Each RAID group 223 is a RAID group according to a 2D + 2P RAID configuration and a RAID 6 RAID level. That is, in each RAID group 223, each stripe is composed of four strips (unit storage areas) that the four drives 224 respectively have. In each stripe, two data elements (D) are stored in two strips, respectively, and two parities (P) based on the two data elements are stored in two strips. In the pool 183, the plurality of RAID types (RAID level and RAID configuration) respectively corresponding to the plurality of RAID groups 223 may be the same.
 一方、物理構成によれば、各プール183は、複数のドライブグループ123を有する。本実施例では、ドライブグループ123の数は、RAIDグループ223の数と同じであるが、異なっていてもよい。また、本実施例では、各ドライブグループ123を構成するドライブ124の数は、RAIDグループ223を構成するドライブ224の数と同じあるが、異なっていてもよい。本実施例では、各ドライブグループ123は、4個のドライブ124(物理的なドライブ)を有する。各ドライブグループ123は、それ自体ではRAIDグループを構成しない。プール183が、論理的に、上述の複数のRAIDグループ223を構成する。 On the other hand, according to the physical configuration, each pool 183 has a plurality of drive groups 123. In this embodiment, the number of drive groups 123 is the same as the number of RAID groups 223, but may be different. In the present embodiment, the number of drives 124 constituting each drive group 123 is the same as the number of drives 224 constituting the RAID group 223, but may be different. In the present embodiment, each drive group 123 has four drives 124 (physical drives). Each drive group 123 does not constitute a RAID group by itself. The pool 183 logically configures the plurality of RAID groups 223 described above.
 In this embodiment, a distributed RAID configuration is adopted. Specifically, each RAID group 223 is composed of two or more stripes, and each stripe is composed of two or more strips. Each drive 124 has a plurality of strips. For each RAID group 223, at least one of the two or more drives 124 that provide the two or more strips constituting a first stripe differs from at least one of the two or more drives 124 that provide the two or more strips constituting a second stripe. For each RAID group 223, the first stripe is any one of its stripes, and the second stripe is any stripe other than the first stripe. More specifically, in this embodiment, the strips constituting the same stripe are distributed across different drive groups, and the drive positions (positions according to physical addresses within the drives) corresponding to those strips differ from one another. In FIG. 2, the four strips constituting the same stripe are given the same number. According to the physical configuration, for example, the four strips 0-0 constituting the same stripe are distributed across a plurality of drive groups 123.
 また、物理構成によれば、各ドライブ124は、複数のストリップの他に、1以上のスペア領域(S)を有する。本実施例では、各ドライブ124は、1個のスペア領域を有する。「スペア領域」とは、予備の記憶領域である。論理構成と物理構成の比較によれば、各ドライブ124について、いずれのストリップも、RAIDグループ223の構成要素となるが、スペア領域は、RAIDグループ223の構成要素にならない。障害ドライブが生じた場合、後述するように、障害ドライブ内のストリップにおけるデータ(データ要素又はパリティ)がスペア領域に復元される。各スペア領域のサイズは、ストリップサイズ以上である。 Further, according to the physical configuration, each drive 124 has one or more spare areas (S) in addition to a plurality of strips. In this embodiment, each drive 124 has one spare area. The “spare area” is a spare storage area. According to the comparison between the logical configuration and the physical configuration, for each drive 124, any strip is a component of the RAID group 223, but the spare area is not a component of the RAID group 223. When a failed drive occurs, data (data element or parity) in the strip in the failed drive is restored to the spare area, as will be described later. The size of each spare area is not less than the strip size.
 以上のように、本実施例では、分散RAID構成が採用されており、且つ、スペアドライブが設けられることに代えて、各ドライブ124にスペア領域が設けられている。なお、プール183によって、RAIDレベル及びRAID構成のうちの少なくとも1つは異なっていてもよい。本実施例では、説明を簡単にするために、いずれのプール183も、RAIDレベルはRAID6であり、RAID構成は2D+2Pであるとする。 As described above, in this embodiment, a distributed RAID configuration is adopted, and a spare area is provided in each drive 124 instead of providing a spare drive. Note that, depending on the pool 183, at least one of the RAID level and the RAID configuration may be different. In this embodiment, to simplify the description, it is assumed that the RAID level of any pool 183 is RAID 6 and the RAID configuration is 2D + 2P.
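 The layout described above can be pictured with a small sketch. The drive count, strip count, and placement rule below are arbitrary assumptions chosen only to show that the strips of one stripe land on different drives and that the spare slot never joins a stripe; they are not the embodiment's actual mapping.

```python
from collections import defaultdict

DRIVES = 16          # physical drives in the pool (assumed for illustration)
STRIPS_PER_DRIVE = 8 # strips per drive, excluding the spare area
STRIPE_WIDTH = 4     # 2D + 2P

def build_layout():
    """Assign each stripe's strips to distinct drives at rotating in-drive positions."""
    layout = defaultdict(list)                              # stripe_id -> [(drive, strip_index), ...]
    spare = {d: STRIPS_PER_DRIVE for d in range(DRIVES)}    # last slot of each drive = spare area
    stripes = DRIVES * STRIPS_PER_DRIVE // STRIPE_WIDTH
    for s in range(stripes):
        for k in range(STRIPE_WIDTH):
            i = s * STRIPE_WIDTH + k
            layout[s].append((i % DRIVES, (i // DRIVES) % STRIPS_PER_DRIVE))
    return layout, spare

layout, spare = build_layout()
# Strips of the same stripe sit on different drives, and no stripe member occupies the spare slot.
assert all(len({d for d, _ in members}) == STRIPE_WIDTH for members in layout.values())
assert all(pos < STRIPS_PER_DRIVE for members in layout.values() for _, pos in members)
```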
 図3は、ローカルメモリ115に格納されるプログラム及びテーブルの一例を示す。 FIG. 3 shows an example of programs and tables stored in the local memory 115.
 ローカルメモリ115は、各種プログラムを格納する。プログラムとして、例えば、ホストI/O処理プログラム301と、リビルド処理プログラム302と、コピーバック処理プログラム303と、パリティ生成プログラム304とがある。ホストI/O処理プログラム301は、ホスト101からのI/O要求を処理する。リビルド処理プログラム302は、リビルド処理を実行する。コピーバック処理プログラム303は、コピーバック処理を実行する。パリティ生成プログラム304は、ストライプに格納されるパリティを生成する。 The local memory 115 stores various programs. Examples of the program include a host I / O processing program 301, a rebuild processing program 302, a copyback processing program 303, and a parity generation program 304. The host I / O processing program 301 processes an I / O request from the host 101. The rebuild process program 302 executes a rebuild process. The copy back processing program 303 executes copy back processing. The parity generation program 304 generates the parity stored in the stripe.
 また、ローカルメモリ115は、各種テーブルを格納する。テーブルとして、例えば、プール状態テーブル305と、コピーバック設定管理テーブル306と、進捗管理テーブル307と、先頭アドレス管理テーブル308とがある。 The local memory 115 stores various tables. Examples of the table include a pool status table 305, a copyback setting management table 306, a progress management table 307, and a head address management table 308.
 図4は、プール状態テーブル305の一例と、そのテーブル305に対応したプールの状態とを示す。 FIG. 4 shows an example of the pool status table 305 and the status of the pool corresponding to the table 305.
 プール状態テーブル305は、プール183の冗長度及び状態を示す情報である。プール状態テーブル305は、プール183毎に、プール番号401と、冗長度402と、状態403といった情報を保持するエントリを管理する。 The pool state table 305 is information indicating the redundancy and state of the pool 183. The pool state table 305 manages entries that hold information such as the pool number 401, the redundancy 402, and the state 403 for each pool 183.
 The pool number 401 is the number of the pool. The redundancy 402 indicates the redundancy of the pool 183. The value registered as the redundancy 402 is the lowest redundancy in the pool 183. That is, because the pool 183 has a distributed RAID configuration, the fact that there are N failed drives (N being an integer of 1 or more) does not mean that the stripes involving those N failed drives all have the same redundancy. Some stripes may have a redundancy of "1" while others have a redundancy of "0". In that case, the minimum value "0" is registered as the redundancy 402.
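 A minimal sketch of how the value registered as the redundancy 402 could be computed, assuming a map from each stripe to the drives holding its strips; the helper name and arguments are illustrative, not part of the embodiment.

```python
def pool_redundancy(stripe_members, failed_drives, parity_per_stripe=2):
    """Return the lowest per-stripe redundancy in the pool (registered as redundancy 402)."""
    worst = parity_per_stripe
    for members in stripe_members.values():           # members: drives holding the stripe's strips
        lost = sum(1 for drive in members if drive in failed_drives)
        worst = min(worst, parity_per_stripe - lost)   # RAID6 (2D+2P): redundancy starts at 2
    return max(worst, 0)

# Example: one stripe loses two members, another loses one -> pool redundancy is 0.
stripes = {0: [0, 5, 6, 11], 1: [0, 2, 7, 9]}
print(pool_redundancy(stripes, failed_drives={0, 5}))   # -> 0
```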
 The state 403 indicates a state related to processing on the pool 183. An example of processing on the pool 183 is copyback processing. Values of the state 403 include "copying back", which means that copyback processing is in progress and the data copy processing described later is being executed; "stopped", which means that copyback processing is in progress but the data copy processing has been stopped; and "none", which means that copyback processing is not in progress.
 According to the table 305 in FIG. 4, the redundancy and state of each of the pools 0 to 2 are as follows. In pool 0, two drives have failed, and the redundancy has dropped from 2 to 0. For pool 0, the data copy processing within the copyback processing is being performed. In pool 1, one drive has failed, and the redundancy has dropped from 2 to 1. For pool 1, copyback processing is in progress, but the data copy processing within it is stopped. In pool 2, no drive has failed.
 図5は、コピーバック設定管理テーブル306の一例を示す。 FIG. 5 shows an example of the copyback setting management table 306.
 The copyback setting management table 306 is a table for managing thresholds used to determine whether or not to execute the data copy processing. The copyback setting management table 306 manages, for each pool 183, an entry holding information such as a pool number 501, an I/O waiting time 502, a write ratio 503, a CPU usage rate 504, and a determination waiting time 505.
 プール番号501は、プール183の番号である。I/O待ち時間502は、プール183についてホスト101からI/O要求(以下、ホストI/O)が届いていない時間を示す。ライト割合503は、プール183についてのホストI/O中のライトの割合を示す。CPU使用率504は、プール183に対する処理に関してCPU113の使用率を示す。判定待ち時間505は、データコピー処理が開始又は停止してから判定までの待ち時間を示す。 Pool number 501 is the number of pool 183. The I / O waiting time 502 indicates a time during which an I / O request (hereinafter referred to as host I / O) has not arrived from the host 101 for the pool 183. The write ratio 503 indicates the ratio of writes in the host I / O for the pool 183. The CPU usage rate 504 indicates the usage rate of the CPU 113 with respect to processing for the pool 183. The determination waiting time 505 indicates a waiting time from the start or stop of the data copy process to the determination.
 なお、情報502~505は、プール183毎に用意されるが、全てのプール183に共通であってもよい。 The information 502 to 505 is prepared for each pool 183, but may be common to all the pools 183.
 図6は、進捗管理テーブル307の一例を示す。 FIG. 6 shows an example of the progress management table 307.
 進捗管理テーブル307は、交換後ドライブ(障害ドライブ124と交換されたドライブ)に対してのコピーバック処理の進捗を管理するテーブルである。進捗管理テーブル307は、交換後ドライブが有するストリップ毎に、アドレス601と、完了フラグ602と、データ位置603といった情報を保持するエントリを管理する。アドレス601は、ストリップのアドレス(番号)である。完了フラグ602は、ストリップに対してデータの復元が完了したか否かを示すフラグである。完了フラグ602の値として、完了のときは「1」、未完了のときは「0」が設定される。データ位置603は、ストリップに復元されるべきデータ(コピーバック処理対象のデータ)が存在する位置(ドライブ124の番号とそのドライブ124におけるスペア領域のアドレスとの組合せ)を示す。 The progress management table 307 is a table for managing the progress of the copy back process for the replaced drive (the drive replaced with the failed drive 124). The progress management table 307 manages entries that hold information such as an address 601, a completion flag 602, and a data position 603 for each strip included in the drive after replacement. An address 601 is a strip address (number). The completion flag 602 is a flag indicating whether or not the data restoration for the strip is completed. As the value of the completion flag 602, “1” is set when it is completed, and “0” is set when it is not completed. A data position 603 indicates a position (a combination of the number of the drive 124 and the address of the spare area in the drive 124) where the data (data to be copied back) to be restored exists in the strip.
 図7は、先頭アドレス管理テーブル308の一例を示す。 FIG. 7 shows an example of the head address management table 308.
 先頭アドレス管理テーブル308は、コピーバック処理が未完了の位置を管理するテーブルである。先頭アドレス管理テーブル308には、進捗管理テーブル307において完了フラグ602「0」に対応したアドレス601のうちの先頭のアドレスが格納される。 The head address management table 308 is a table for managing a position where the copy back process is not completed. The head address management table 308 stores the head address of the addresses 601 corresponding to the completion flag 602 “0” in the progress management table 307.
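 The progress management table 307 and the head address management table 308 can be modeled with the small structures below; the field and function names are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ProgressEntry:
    address: int                                 # strip address (601)
    done: bool = False                           # completion flag (602): True = restored
    data_pos: Optional[Tuple[int, int]] = None   # (drive number, spare-area address) (603)

def head_address(entries):
    """First address whose completion flag is 0; None when copyback is complete."""
    for e in entries:
        if not e.done:
            return e.address
    return None

entries = [ProgressEntry(a) for a in range(4)]
entries[0].done = True
print(head_address(entries))   # -> 1
```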
 以下、1つのプール183を例に取り、本実施例で行われる処理を説明する。なお、以下の説明では、そのプール183を、「対象プール183」と呼ぶ。また、本実施例の説明では、各用語の意味は、以下の通りとする。
・「問題ドライブ」は、問題の発生したドライブである。「問題」とは、障害、又は、障害の発生可能性が高いこと、である。本実施例では、問題ドライブは、障害ドライブである。「障害ドライブ」は、障害の発生したドライブである。「障害ストリップ」は、障害ドライブ内のストリップである。なお、問題ドライブの別の例として、障害候補ドライブ(障害の発生する可能性の高いドライブ)を採用することができる。障害候補ドライブ内のストリップを、「障害候補ストリップ」と呼ぶことができる。
・「復元」は、ドライブにおけるスペア領域に対する書込み、又は、交換後ドライブにおけるストリップに対する書込みを意味する際に、使用されることがある用語である。
・「交換後ドライブ」とは、問題ドライブと交換されたドライブである。
・「復旧処理」とは、リビルド処理とコピーバック処理とを含む処理を意味する用語である。
・「復旧」とは、リビルドとコピーバックとを含む用語である。
・「リビルド処理」とは、リビルドを含んだ処理である。「リビルド」とは、全ての障害ストリップに対応したデータをそれぞれ複数のスペア領域に復元すること、つまり、コレクションコピーのことである。なお、「障害ストリップに対応したデータ」とは、典型的には、障害ストリップ内のデータであるが、障害ストリップ内のデータの更新後データも該当してもよい。また、問題ドライブが障害候補ドライブの場合、リビルドは、全ての障害候補ストリップ内のデータをそれぞれ複数のスペア領域に復元すること、つまり、ダイナミックスペアリングのことである。
・「コピーバック処理」とは、コピーバックを含んだ処理である。「コピーバック」とは、後述のデータコピー処理に相当する処理であり、スペア領域から交換後ドライブ内のストリップにデータをコピーすることである。本実施例では、交換後ドライブ内のストリップには、コピーバックによりデータが復元されることもあれば、コピーバックに代えてホストI/Oに従ってデータが復元されることもある。
・「データXX」とは、ストリップXX内のデータ(データ要素又はパリティ)のことである(XXは番号)。
Hereinafter, the processing performed in this embodiment will be described by taking one pool 183 as an example. In the following description, the pool 183 is referred to as a “target pool 183”. In the description of this embodiment, the meaning of each term is as follows.
“Problem drive” is a drive in which a problem has occurred. “Problem” means a failure or a high possibility of occurrence of a failure. In this embodiment, the problem drive is a failed drive. A “failed drive” is a drive in which a failure has occurred. A “failure strip” is a strip in a failed drive. As another example of the problem drive, a failure candidate drive (a drive with a high possibility of failure) can be adopted. A strip in a failure candidate drive may be referred to as a “failure candidate strip”.
“Restoration” is a term that may be used to mean writing to a spare area in a drive or writing to a strip in a drive after replacement.
“Drive after replacement” is a drive that has been replaced with a problem drive.
“Recovery processing” is a term that means processing including rebuild processing and copy back processing.
“Recovery” is a term that includes rebuild and copyback.
"Rebuild processing" is processing that includes a rebuild. A "rebuild" is restoring the data corresponding to all failure strips to a plurality of spare areas, that is, a correction copy. The "data corresponding to a failure strip" is typically the data in the failure strip, but may also be post-update data of the data in the failure strip. When the problem drive is a failure candidate drive, the rebuild is restoring the data in all failure candidate strips to a plurality of spare areas, that is, dynamic sparing.
“Copy back processing” is processing including copy back. “Copy back” is a process corresponding to a data copy process to be described later, and is to copy data from a spare area to a strip in a drive after replacement. In this embodiment, data may be restored to the strip in the drive after replacement by copy back, or data may be restored according to host I / O instead of copy back.
“Data XX” is data (data element or parity) in the strip XX (XX is a number).
 図8は、復旧処理の流れを示す。復旧処理は、例えば、ドライブ124に障害(故障)が発生したことが、例えばCPU113(例えばリビルド処理プログラム302)により検出された場合に開始される。 FIG. 8 shows the flow of recovery processing. The recovery process is started when, for example, the CPU 113 (for example, the rebuild process program 302) detects that a failure (failure) has occurred in the drive 124.
 The rebuild processing program 302 starts rebuild processing that restores the same data as the data (data elements or parity) in the strips of the failed drive 124 to spare areas of normal drives 124 (S801).
 図9は、リビルド処理の詳細の一例を示す。 FIG. 9 shows an example of the details of the rebuild process.
 対象プール183内のドライブ00に障害が発生したとする。障害ドライブ00には、障害ストリップ0-1、2-2、1-1及び2-3がある。 Suppose that a failure has occurred in the drive 00 in the target pool 183. The faulty drive 00 includes fault strips 0-1, 2-2, 1-1 and 2-3.
 The rebuild processing program 302 restores data 0-1 of the failed drive 00, based on data 0-1 in the normal drives 05, 06, and 0B, to a spare area in one of the normal drives, for example, the spare area of the normal drive 01. The rebuild processing program 302 then records the position of that spare area (the combination of the number of the drive 01 and the address of the spare area) as the data position 603 in the progress management table 307 corresponding to the target pool 183.
 同様に、リビルド処理プログラム302は、障害ドライブ00内の他のデータ2-2、1-1及び2-3を、それぞれ、例えば、正常ドライブ02~04におけるスペア領域に復元し、且つ、それらのスペア領域の位置をそれぞれデータ位置603として進捗管理テーブル307に追記する。 Similarly, the rebuild processing program 302 restores the other data 2-2, 1-1 and 2-3 in the failed drive 00 to, for example, spare areas in the normal drives 02 to 04, and The positions of the spare areas are added to the progress management table 307 as data positions 603, respectively.
 As described above, in this embodiment, the plurality of pieces of data in the failed drive are restored to the spare areas of a plurality of normal drives, respectively. In other words, the write destinations of the data are not concentrated on a single drive such as a spare drive, but are distributed across a plurality of drives. For this reason, the time required for the rebuild processing can be expected to be shortened.
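 As a rough sketch of this distribution of restore destinations, the code below reconstructs each lost strip and writes it to a spare area chosen round-robin from the normal drives. The reconstruction is simplified to XOR (the single-failure case); decoding two lost strips under RAID6 requires Reed-Solomon-style arithmetic and is omitted, and all helper names are assumptions rather than the embodiment's interfaces.

```python
from functools import reduce

def xor_blocks(blocks):
    """Single-failure reconstruction: XOR of the surviving strips of a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, t) for t in zip(*blocks))

def rebuild(failed_drive, normal_drives, stripes, read_strip, write_spare, progress):
    """Restore each strip of the failed drive to a spare area on a rotating normal drive.

    stripes maps stripe_id -> {drive: strip_address}; read_strip/write_spare are
    I/O callbacks supplied by the caller; progress records the data position (603).
    """
    rr = 0
    for members in stripes.values():
        if failed_drive not in members:
            continue
        survivors = [read_strip(d, a) for d, a in members.items() if d != failed_drive]
        data = xor_blocks(survivors)
        # Spread writes over many drives; a real implementation would also avoid
        # drives that already hold a member strip of this stripe.
        target = normal_drives[rr % len(normal_drives)]
        rr += 1
        spare_addr = write_spare(target, data)
        progress[members[failed_drive]] = (target, spare_addr)
```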
 図8に戻る。リビルド処理の最中又は完了後に、例えば保守員により、障害ドライブ124が交換される(S802)。 Return to FIG. During or after the rebuild process, the failed drive 124 is replaced, for example, by maintenance personnel (S802).
 障害ドライブの交換が、例えばCPU113(例えばリビルド処理プログラム302)により検出された場合、リビルド処理プログラム302が、リビルド処理が完了しているか否かを判定する(S803)。リビルドが完了していない場合(S803:NO)には、リビルド処理プログラム302は、リビルド処理を継続する。一方、リビルド処理が完了している場合(S803:YES)には、リビルド処理プログラム302が、ドライブ交換完了サインを出力する(例えば、ストレージシステム102に設けられている所定のLED(Light Emitting Diode)を点灯する)(S804)。そのサインを見た者(例えば保守員)が、例えば、コピーバック処理の指示を、保守端末のような入出力インターフェース経由で、ストレージシステムに出す。 When replacement of the failed drive is detected by, for example, the CPU 113 (for example, the rebuild process program 302), the rebuild process program 302 determines whether the rebuild process has been completed (S803). If the rebuild is not completed (S803: NO), the rebuild process program 302 continues the rebuild process. On the other hand, if the rebuild process has been completed (S803: YES), the rebuild process program 302 outputs a drive replacement completion sign (for example, a predetermined LED (Light Emitting Diode) provided in the storage system 102). (S804). A person who sees the sign (for example, a maintenance staff) issues a copyback processing instruction to the storage system via an input / output interface such as a maintenance terminal.
 次いで、コピーバック処理プログラム303が、例えばコピーバック処理の指示に応答して(又はドライブ交換完了の検出に応答して)、コピーバック処理を実行する(S805)。 Next, the copyback processing program 303 executes the copyback processing in response to, for example, an instruction for copyback processing (or in response to detection of the completion of drive replacement) (S805).
 図10は、コピーバック処理の詳細の一例を示す。図10は、図8に示したリビルド処理の続きに相当する。 FIG. 10 shows an example of details of the copyback process. FIG. 10 corresponds to the continuation of the rebuild process shown in FIG.
 When the failed drive 00 has been replaced and the rebuild processing has been completed, the copyback processing is started. In the copyback processing, the copyback processing program 303 restores (copies) the data 0-1, 2-2, 1-1, and 2-3 in the spare areas of the drives 01 to 04 to the strips 0-1, 2-2, 1-1, and 2-3 of the replaced drive 00, respectively.
 図11は、コピーバック処理の詳細の流れを示す。 FIG. 11 shows the detailed flow of the copyback process.
 コピーバック処理プログラム303が、対象プール183に対応した冗長度402を参照し、冗長度402が「0」であるか否かを判定する(S1101)。S1101の判定結果が真の場合(S1101:YES)、コピーバック処理プログラム303が、データコピー処理を実行する(S1105)。対象プール183のデータ保護の信頼性を高めるためである。なお、冗長度「0」は、冗長度の閾値の一例である。閾値は、0より大きくてもよい。また、S1105の実行の際、対象プール183に対応した状態403が「コピーバック中」でなければ、コピーバック処理プログラム303は、その状態403を「コピーバック中」に更新する。 The copyback processing program 303 refers to the redundancy 402 corresponding to the target pool 183 and determines whether the redundancy 402 is “0” (S1101). If the determination result in S1101 is true (S1101: YES), the copyback processing program 303 executes data copy processing (S1105). This is to increase the reliability of data protection of the target pool 183. The redundancy “0” is an example of a redundancy threshold. The threshold may be greater than zero. When the state 403 corresponding to the target pool 183 is not “copying back” when executing S1105, the copyback processing program 303 updates the state 403 to “copying back”.
 If the determination result of S1101 is false (S1101: NO), the copyback processing program 303 determines whether the elapsed time since host I/O was last performed on the target pool 183 is equal to or greater than the I/O waiting time 502 corresponding to the target pool 183 (S1102). If the determination result of S1102 is true (S1102: YES), the copyback processing program 303 executes the data copy processing (S1105). This is because, since no host I/O is being performed, the load on the CPU 113 is relatively low, and it is considered efficient to use that capacity for the data copy processing.
 If the determination result of S1102 is false (S1102: NO), the copyback processing program 303 determines whether the ratio of writes in the host I/O to the target pool 183 is less than the write ratio 503 corresponding to the target pool 183 (S1103). If the determination result of S1103 is true (S1103: YES), the copyback processing program 303 executes the data copy processing (S1105). This is because, if the write ratio is low, it is unlikely that data will have been restored to strips in the replaced drive, without inter-drive data copy, through the write processing (described later) during the copyback processing.
 S1103の判定結果が偽の場合(S1103:NO)、コピーバック処理プログラム303が、対象プール183に関するCPU使用率が、対象プール183に対応したCPU使用率504未満であるか否かを判定する(S1104)。S1104の判定結果が真の場合(S1104:YES)、コピーバック処理プログラム303が、データコピー処理を実行する(S1105)。CPU113の負荷が比較的低く、その分、データコピー処理に使用することが効率的であると考えられるためである。 When the determination result in S1103 is false (S1103: NO), the copyback processing program 303 determines whether or not the CPU usage rate related to the target pool 183 is less than the CPU usage rate 504 corresponding to the target pool 183 ( S1104). If the determination result in S1104 is true (S1104: YES), the copyback processing program 303 executes data copy processing (S1105). This is because the load on the CPU 113 is relatively low, and it is considered that it is more efficient to use it for data copy processing.
 S1104の判定結果が偽の場合(S1104:NO)、コピーバック処理プログラム303が、データコピー処理を停止する(S1106)。このとき、対象プール183に対応した状態403が「停止中」でなければ、コピーバック処理プログラム303は、その状態403を「停止中」に更新する。また、このとき、データコピー処理が既に停止の場合は(状態403が既に「停止中」の場合は)、S1106がスキップされてよい(すなわち、データコピー処理は停止したままとなる)。 If the determination result in S1104 is false (S1104: NO), the copyback processing program 303 stops the data copy process (S1106). At this time, if the state 403 corresponding to the target pool 183 is not “stopped”, the copyback processing program 303 updates the state 403 to “stopped”. At this time, if the data copy process is already stopped (if the state 403 is already “stopped”), S1106 may be skipped (that is, the data copy process remains stopped).
 次いで、コピーバック処理プログラム303が、データコピー処理の停止時刻からの経過時間が、対象プール183に対応した判定待ち時間505以上か否かを判定する(S1107)。 Next, the copy back processing program 303 determines whether the elapsed time from the data copy processing stop time is equal to or longer than the determination waiting time 505 corresponding to the target pool 183 (S1107).
 If the determination result of S1107 is true (that is, when at least the determination waiting time 505 has elapsed since the data copy processing was stopped) (S1107: YES), the copyback processing program 303 determines whether the copyback processing has been completed (S1108). The determination in S1108 is a determination as to whether the completion flags 602 corresponding to all the strips in the replaced drive are "1" in the progress management table 307 corresponding to the target pool 183. If the determination result of S1108 is false (S1108: NO), the processing returns to S1101. On the other hand, if the determination result of S1108 is true (S1108: YES), the copyback processing ends.
 S1101、S1102、S1103及びS1104は、判定の優先度が高い順である。異なる順序で判定が行われてよい。また、S1101~S1104のうちの少なくとも1つの判定結果が真の場合にデータコピー処理(S1105)が行われてもよいし、S1101~S1104のうち少なくとも1つの判定結果が偽の場合にデータコピー処理(S1105)が停止されてもよい。例えば、対象プール183の冗長度402が閾値(例えば「0」)より大きければ(S1101:NO)、他の判定の結果に関わらず、データコピー処理が停止されてもよい。また、例えば、対象プール183について、最終ホストI/Oからの経過時間がI/O待ち時間未満であれば(S1102:NO)、他の判定の結果に関わらず、データコピー処理が停止されてもよい。また、例えば、ライト割合がライト割合503以上(S1103:NO)であれば、他の判定の結果に関わらず、データコピー処理が停止されてもよい。 S1101, S1102, S1103, and S1104 are in descending order of priority. The determination may be made in a different order. Further, the data copy process (S1105) may be performed when at least one determination result of S1101 to S1104 is true, or the data copy process when at least one determination result of S1101 to S1104 is false. (S1105) may be stopped. For example, if the redundancy 402 of the target pool 183 is larger than a threshold (eg, “0”) (S1101: NO), the data copy process may be stopped regardless of the result of other determinations. For example, if the elapsed time from the last host I / O is less than the I / O waiting time for the target pool 183 (S1102: NO), the data copy process is stopped regardless of the result of other determinations. Also good. For example, if the write ratio is equal to or higher than the write ratio 503 (S1103: NO), the data copy process may be stopped regardless of the result of other determinations.
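 Taken together, S1101 to S1104 amount to a predicate over the pool's current statistics and the thresholds in the copyback setting management table 306. The sketch below expresses the default priority order with assumed field names; as noted above, the order and combination of the checks may differ.

```python
from dataclasses import dataclass

@dataclass
class CopybackSettings:          # per-pool thresholds (table 306)
    io_wait_sec: float           # I/O waiting time 502
    write_ratio: float           # write ratio 503 (0.0 - 1.0)
    cpu_usage: float             # CPU usage rate 504 (0.0 - 1.0)

def should_copy(redundancy, idle_sec, write_ratio, cpu_usage, s: CopybackSettings):
    """True -> run the data copy processing (S1105); False -> stop it (S1106)."""
    if redundancy == 0:                  # S1101: redundancy exhausted, copy regardless
        return True
    if idle_sec >= s.io_wait_sec:        # S1102: no host I/O for a while
        return True
    if write_ratio < s.write_ratio:      # S1103: few writes, little piggyback restoration
        return True
    if cpu_usage < s.cpu_usage:          # S1104: CPU has headroom
        return True
    return False

print(should_copy(redundancy=1, idle_sec=2, write_ratio=0.7, cpu_usage=0.9,
                  s=CopybackSettings(io_wait_sec=10, write_ratio=0.5, cpu_usage=0.8)))  # False
```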
 図12は、データコピー処理の詳細の流れを示す。 FIG. 12 shows the detailed flow of the data copy process.
 コピーバック処理プログラム303が、先頭アドレス管理テーブル308からデータコピーが未完了の先頭アドレスを特定する(S1201)。 The copyback processing program 303 identifies the head address for which data copying has not been completed from the head address management table 308 (S1201).
 次いで、コピーバック処理プログラム303が、S1201で特定したアドレスと一致するアドレス601に対応したデータ位置603を特定する(S1202)。 Next, the copyback processing program 303 identifies the data position 603 corresponding to the address 601 that matches the address identified in S1201 (S1202).
 次いで、コピーバック処理プログラム303が、S1202で特定したデータ位置603に従うスペア領域から、コピーバック処理対象データを、S1201で特定した先頭アドレスに従うストリップ(交換後ドライブ内のストリップ)にコピーする(S1203)。 Next, the copyback processing program 303 copies the copyback processing target data from the spare area according to the data position 603 specified in S1202 to the strip (strip in the drive after replacement) specified in S1201 (S1203). .
 Next, the copyback processing program 303 updates the completion flag 602 corresponding to the copy destination strip to "1" (completed), and updates the head address in the head address management table 308 to the first of the addresses 601 whose completion flag 602 is "0" (S1204). In S1204, if the minimum of the redundancies corresponding to the stripes in the target pool 183 has increased, the copyback processing program 303 also updates the redundancy 402 corresponding to the target pool 183.
 次いで、コピーバック処理プログラム303が、コピーバック処理が完了しているか否かを判定する(S1205)。この判定は、図11のS1108の判定と同じである。S1205の判定結果が真の場合(S1205:YES)、コピーバック処理が終了する。 Next, the copy back processing program 303 determines whether or not the copy back processing is completed (S1205). This determination is the same as the determination in S1108 of FIG. If the determination result in S1205 is true (S1205: YES), the copyback process ends.
 On the other hand, if the determination result of S1205 is false (S1205: NO), the copyback processing program 303 determines whether the elapsed time from the start time of the data copy processing is equal to or greater than the determination waiting time 505 corresponding to the target pool 183 (S1206). If the determination result of S1206 is false (S1206: NO), the processing returns to S1201 (that is, the data copy processing continues). On the other hand, if the determination result of S1206 is true (S1206: YES), the processing returns to S1101 (at this time, the data copy processing may be temporarily ended (stopped)).
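 A compact sketch of the S1201 to S1206 loop, reusing the progress-entry shape from the earlier sketch; the callback names and the way control is handed back to S1101 are assumptions.

```python
import time

def data_copy(entries, copy_from_spare, wait_sec, recheck):
    """One data-copy session (S1201-S1206): copy the next unrestored strip from its
    spare area, mark it complete, and after wait_sec hand control back to S1101."""
    started = time.monotonic()
    while True:
        target = next((e for e in entries if not e.done), None)  # S1201/S1202 (head address)
        if target is None:                                        # S1205: copyback complete
            return "complete"
        copy_from_spare(target.data_pos, target.address)          # S1203
        target.done = True                                        # S1204
        if time.monotonic() - started >= wait_sec:                # S1206: re-evaluate condition
            return recheck()
```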
 図13は、コピーバック処理中のライト処理の一例を示す。 FIG. 13 shows an example of write processing during copy back processing.
 コピーバック処理中にライト要求をホスト101からホストI/O処理プログラム301が受信した場合、ホストI/O処理プログラム301は、そのライト要求に従うデータを、交換後ドライブ内のストリップに書き込む。結果として、交換後ドライブ内のストリップにデータが復元されたことになる。詳細は、例えば以下の通りである。 When the host I / O processing program 301 receives a write request from the host 101 during the copy back processing, the host I / O processing program 301 writes the data according to the write request to the strip in the drive after replacement. As a result, the data is restored to the strip in the drive after replacement. Details are as follows, for example.
 When the host I/O processing program 301 receives, during the copyback processing, a write request whose write destination is the strip 2-2 in the replaced drive 00 (S14-1), the host I/O processing program 301 reads data 2-2 from each strip 2-2 of the normal drives 01, 06, and 0B (S14-2). Next, the host I/O processing program 301 calculates parity from the read data 2-2 and the post-update data 2-2 from the host 101 (S14-3). Next, the host I/O processing program 301 writes the post-update data 2-2 to the strip 2-2 in the replaced drive 00 (S14-4). The host I/O processing program 301 responds to the host 101 that the write is complete (S14-5). In addition, if the completion flag 602 corresponding to the write destination strip 2-2 in the replaced drive 00 is "0", the host I/O processing program 301 updates it to "1" (S14-6). In this way, when a write request is received during the copyback processing, data can be restored to the strip in the replaced drive by piggybacking on the processing of that write request.
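 The piggyback on a write request (S14-1 to S14-6) might be organized as below; parity generation and the writing of the updated parity strips are abstracted away, and the helper names are assumptions rather than the embodiment's actual interfaces.

```python
def handle_write(strip_addr, new_data, peer_strips, read_strip, make_parity,
                 write_strip, progress, replaced_drive):
    """Write path during copyback: write the new data straight to the replaced
    drive's strip, so the later data copy for that strip can be skipped."""
    peers = [read_strip(d, a) for d, a in peer_strips]   # S14-2: read the other members
    parity = make_parity(peers + [new_data])             # S14-3: recompute parity
    write_strip(replaced_drive, strip_addr, new_data)    # S14-4: restore via the write
    # S14-5 (host response) and writing the updated parity strips are left to the caller.
    progress[strip_addr] = True                          # S14-6: completion flag 602 -> 1
    return parity
```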
 図14は、コピーバック処理の進捗とライト処理とに応じた進捗管理テーブル307の遷移の一例を示す。図14では、進捗管理テーブル307は、模式的に(ビットマップとして)表現されている。アドレスの並びは、破線矢印の通りであるとする。 FIG. 14 shows an example of the transition of the progress management table 307 according to the progress of the copyback process and the write process. In FIG. 14, the progress management table 307 is schematically expressed (as a bitmap). It is assumed that the addresses are arranged as indicated by broken arrows.
 図14の左側のテーブル307によれば、コピーバック処理によって、1番目~8番目のストリップまでデータが復元されている(完了フラグ602は「1」となっている)。 According to the table 307 on the left side of FIG. 14, data is restored from the first to eighth strips by the copy back process (the completion flag 602 is “1”).
 Thereafter, suppose that data is restored to the 16th and 20th strips as a result of write processing during the copyback processing. In this case, as shown by the center table 307 in FIG. 14, the head address of incomplete copying still points to the 9th strip, but the completion flags 602 corresponding to the 16th and 20th strips become "1".
 Thereafter, as the copyback processing proceeds, as shown by the table 307 on the right side of FIG. 14, the address of the 16th strip, which has already been restored (its completion flag 602 is "1"), is skipped when the head address of incomplete copying is selected. That is, in the copyback processing, the data copy processing is skipped for strips whose data has been restored by piggybacking on write processing. This reduces the load of the copyback processing.
 本実施例を、下記のように総括することができる。なお、下記の総括では、上述の説明に無い事項が含まれていてもよいし、逆に上述の説明に存在する事項が含まれていなくてもよい。 This example can be summarized as follows. In the following summary, matters not included in the above description may be included, and conversely, items existing in the above description may not be included.
 分散RAID構成が採用される。スペアドライブが設けられることに代えて、各ドライブ124にスペア領域が設けられている。障害ドライブ内の複数のデータが、複数の正常ドライブのスペア領域にそれぞれ復元される。つまり、データの書込み先が、スペアドライブのように1つのドライブではなく、複数のドライブに分散している。このため、リビルド処理にかかる時間を短縮することができる。結果として、冗長度が低下している時間を短縮でき、且つ、I/O処理性能の低下を軽減することができる。 A distributed RAID configuration is adopted. Instead of providing a spare drive, each drive 124 is provided with a spare area. A plurality of data in the failed drive are restored to spare areas of a plurality of normal drives, respectively. In other words, data write destinations are not distributed to a single drive like a spare drive, but are distributed to a plurality of drives. For this reason, the time required for the rebuild process can be shortened. As a result, the time during which the redundancy is reduced can be shortened, and the decrease in I / O processing performance can be reduced.
 According to a comparative example, a spare drive is employed, and the restoration destination in the rebuild processing is consolidated on the spare drive; that is, all the data in the failed drive is restored to the spare drive. In this case, if the spare drive becomes a member of the group in place of the failed drive, a copy-back-less operation can be expected, in which the copyback processing can be made to appear complete without any copying between drives.

 In this embodiment, however, because the distributed RAID configuration described above is adopted, such a copy-back-less operation cannot be realized.
 Therefore, in this embodiment, in the write processing during the copyback processing (processing according to a write request from the host 101), the host I/O processing program 301 restores the post-update data according to the write request to the strip in the replaced drive. As a result, data is restored to the strip in the replaced drive by piggybacking on the write processing. In the copyback processing, the data copy processing is skipped for such restored strips. This reduces the load of the copyback processing, and as a result, the decrease in I/O processing performance can be reduced.
 In this embodiment, thresholds are also provided for the redundancy, the write ratio, and the CPU usage rate, and execution and stopping of the data copy processing are controlled according to the result of comparing the situation during the copyback processing with those thresholds. For example, when the load on the CPU 113 is low (as concrete examples, when no host I/O has been received for a certain time, or when the CPU usage rate is low), the data copy processing is executed. When the write ratio is high, the data copy processing is stopped. By finely controlling the execution and stopping of the data copy processing in the copyback processing, the drive can be recovered while suppressing the decrease in host I/O processing performance.
 実施例2を説明する。その際、実施例1との相違点を主に説明し、実施例1との共通点については説明を省略又は簡略する。 Example 2 will be described. At that time, differences from the first embodiment will be mainly described, and description of common points with the first embodiment will be omitted or simplified.
 図15は、実施例2に係るコピーバック処理中のリード処理の一例を示す。 FIG. 15 illustrates an example of read processing during copy back processing according to the second embodiment.
 In the second embodiment, when the host I/O processing program 301 receives a read request from the host 101 during the copyback processing, the host I/O processing program 301 reads the data according to the read request from the spare area storing that data, returns it to the host 101, and also writes the data to the strip in the replaced drive. The data written to that strip is the data targeted by the copyback processing; as a result, the data has been restored to the strip in the replaced drive. Details are, for example, as follows.
 During the copyback processing, for example, the host I/O processing program 301 receives a read request whose read source is the strip 2-2 in the replaced drive 00 (S13-1). When the completion flag 602 corresponding to the address 601 of that strip 2-2 is "0", the host I/O processing program 301 reads data 2-2 from the spare area indicated by the data position 603 corresponding to that address 601 (in the example of FIG. 15, the spare area of the normal drive 02) (S13-2). Next, the host I/O processing program 301 returns the read data 2-2 to the host 101 (S13-3) and writes the data 2-2 to the strip 2-2 in the replaced drive (S13-4). The host I/O processing program 301 updates the completion flag 602 corresponding to the strip 2-2 to "1" (S13-5).
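 A sketch of this read path (S13-1 to S13-5), with assumed helper names: the read is served from the spare area and, as a side effect, the strip in the replaced drive is restored and its completion flag set.

```python
def handle_read(strip_addr, progress, data_pos, read_spare, write_strip,
                read_strip, replaced_drive):
    """Read path during copyback: serve the read from the spare area when the strip
    is not yet restored, and restore the strip as a side effect."""
    if progress.get(strip_addr):                        # completion flag 602 already "1"
        return read_strip(replaced_drive, strip_addr)   # strip already holds valid data
    drive, spare_addr = data_pos[strip_addr]            # data position 603
    data = read_spare(drive, spare_addr)                # S13-2
    write_strip(replaced_drive, strip_addr, data)       # S13-4: piggyback restoration
    progress[strip_addr] = True                         # S13-5: completion flag -> 1
    return data                                         # S13-3: caller returns this to the host
```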
 本実施例によれば、コピーバック処理中のリード処理に便乗して、交換後ドライブ内のストリップにコピーバック処理対象データを復元することができる。なお、そのストリップについても、データコピー処理はスキップされる。 According to this embodiment, it is possible to restore the copy-back process target data to the strip in the drive after the replacement by taking advantage of the read process during the copy-back process. Note that the data copy process is also skipped for the strip.
 なお、本実施例では、ライト割合503という閾値に代えて又は加えて、ホストI/O頻度の閾値が採用されてもよい。コピーバック処理プログラム303は、S1103の判定に代えて又は加えて、対象プール183に関するホストI/O頻度がその閾値未満か否かの判定(以下、判定A)を実行してよい。判定Aの結果が真の場合、コピーバック処理プログラム303は、データコピー処理を実行してよい。コピーバック処理中のホストI/O処理(ライト処理又はリード処理)に便乗してデータが交換後ドライブ内のストリップに復元されている可能性が低いためである。 In this embodiment, a host I / O frequency threshold value may be employed instead of or in addition to the write ratio 503 threshold value. The copyback processing program 303 may execute a determination (hereinafter referred to as determination A) as to whether or not the host I / O frequency related to the target pool 183 is less than the threshold instead of or in addition to the determination in S1103. If the result of determination A is true, the copyback processing program 303 may execute data copy processing. This is because there is a low possibility that data is restored to the strip in the drive after the exchange by taking advantage of the host I / O process (write process or read process) during the copy back process.
 Further, in this embodiment, a read ratio threshold may be employed instead of or in addition to at least one of the write ratio 503 and the host I/O frequency threshold. Instead of or in addition to at least one of the determination in S1103 and the above determination A, the copyback processing program 303 may execute a determination (hereinafter, determination B) as to whether the read ratio of the target pool 183 (the ratio of read requests to the host I/O for the target pool 183) is less than a read ratio threshold. If the result of determination B is true, the copyback processing program 303 may execute the data copy processing. This is because it is then unlikely that data has been restored to strips in the replaced drive by piggybacking on host I/O processing (write processing or read processing) during the copyback processing.
 At least one of the determinations of S1101 to S1104, determination A, and determination B described above (for example, at least one of the determination of S1103, determination A, and determination B, together with the determination of S1101 and the determination of S1102 or S1104) is included in the determination as to whether the execution condition of the data copy processing is satisfied. The copyback processing program 303 may execute, regularly or irregularly, the determination as to whether the execution condition of the data copy processing is satisfied. If the determination result is true, the copyback processing program 303 can execute (which may include continuing) the data copy processing. If the determination result is false, the copyback processing program 303 can stop the data copy processing.

 Although several embodiments have been described above, these are examples for explaining the present invention and are not intended to limit the scope of the present invention only to these embodiments. The present invention can be implemented in various other forms.
101 ... Host, 102 ... Storage system

Claims (13)

  1.  論理的な1以上のRAIDグループを提供する複数の記憶デバイスと、
     前記複数の記憶デバイスに接続された1以上のプロセッサであるプロセッサ部と
    を有し、
     各RAIDグループは、2以上のストライプで構成されており、
     各ストライプは、2以上のストリップで構成されており、
     前記複数の記憶デバイスの各々は、複数のストリップと、1以上のスペア領域とを有し、
     前記各RAIDグループについて、いずれかのストライプを構成する2以上のストリップをそれぞれ提供する2以上の記憶デバイスの少なくとも1つと、別のいずれかのストライプを構成する2以上のストリップをそれぞれ提供する2以上の記憶デバイスの少なくとも1つが、異なる記憶デバイスであり、
     前記プロセッサ部は、I/O要求を受けた場合、そのI/O要求に従うI/O処理を実行するようになっており
     前記プロセッサ部は、
      (A)前記複数の記憶デバイスのうちの問題記憶デバイスの複数のストリップに対応したデータを2以上の記憶デバイスの複数のスペア領域にそれぞれ復元するリビルド処理を実行し、
      (B)前記リビルド処理の完了後、前記問題記憶デバイスの交換後記憶デバイスが有する複数のストリップのうちデータ未復元のストリップに対応したスペア領域からデータをそのストリップにコピーするデータコピー処理を含んだコピーバック処理を実行し、
     前記プロセッサ部は、以下の(W)及び(R)のうちの少なくとも1つを実行し、
      (W)前記交換後記憶デバイス内のストリップをライト先としたライト要求を前記コピーバック処理中に受けた場合、そのライト要求に従うライト処理において、そのライト要求に従うデータを、そのライト先のストリップに書き込むこと、
      (R)前記交換後記憶デバイス内のストリップをリード元としたリード要求を前記コピーバック処理中に受けた場合、そのリード要求に従うリード処理において、そのリード元に対応したスペア領域からデータを読み込み、その読み込んだデータを、そのリード元のストリップに書き込むこと、
    前記プロセッサ部は、前記交換後記憶デバイスが有する複数のストリップのうち、前記コピーバック処理中のライト処理及びリード処理のいずれかにおいてデータが既に書き込まれているストリップについては、前記データコピー処理をスキップする、
    ストレージシステム。
    A plurality of storage devices providing one or more logical RAID groups;
    A processor unit that is one or more processors connected to the plurality of storage devices;
    Each RAID group is composed of two or more stripes,
    Each stripe consists of two or more strips,
    Each of the plurality of storage devices has a plurality of strips and one or more spare areas;
    for each of the RAID groups, at least one of the two or more storage devices that respectively provide the two or more strips constituting any one stripe differs from at least one of the two or more storage devices that respectively provide the two or more strips constituting any other stripe;
    When the processor unit receives an I / O request, the processor unit executes an I / O process according to the I / O request.
    (A) executing a rebuild process for restoring data corresponding to a plurality of strips of a problem storage device among the plurality of storage devices to a plurality of spare areas of two or more storage devices;
    (B) after completion of the rebuild processing, executing copyback processing including data copy processing for copying data, from the spare area corresponding to each strip whose data has not yet been restored among the plurality of strips of the storage device after replacement of the problem storage device, to that strip,
    The processor unit executes at least one of the following (W) and (R):
    (W) when a write request whose write destination is a strip in the storage device after replacement is received during the copyback processing, writing the data according to the write request to that write destination strip in the write processing according to the write request,
    (R) when a read request whose read source is a strip in the storage device after replacement is received during the copyback processing, reading data from the spare area corresponding to that read source and writing the read data to the read source strip in the read processing according to the read request,
    and the processor unit skips the data copy processing for any strip, among the plurality of strips of the storage device after replacement, to which data has already been written in either the write processing or the read processing during the copyback processing,
    Storage system.
  2.  前記プロセッサ部は、定期的に又は不定期的に、(B)において、
      (b1)前記データコピー処理の実行条件を満たすか否かを判定し、
      (b2)(b1)の判定結果が真の場合、前記データコピー処理を実行する、
    請求項1記載のストレージシステム。
    The processor unit periodically or irregularly in (B),
    (B1) It is determined whether or not an execution condition of the data copy process is satisfied,
    (B2) If the determination result of (b1) is true, the data copy process is executed.
    The storage system according to claim 1.
  3.  前記プロセッサ部は、(W)を実行するようになっており、
     (b1)の判定は、前記コピーバック処理中のプールに関するI/O要求に対するライト要求の割合が所定割合未満か否かの判定を含む、
    請求項2記載のストレージシステム。
    The processor unit is adapted to execute (W),
    The determination in (b1) includes a determination as to whether or not the ratio of write requests to I / O requests related to the pool during the copyback process is less than a predetermined ratio.
    The storage system according to claim 2.
  4.  前記プロセッサ部は、(R)を実行するようになっており、
     (b1)の判定は、前記コピーバック処理中のプールに関するI/O要求に対するリード要求の割合が所定割合未満か否かの判定を含む、
    請求項2記載のストレージシステム。
    The processor unit is adapted to execute (R),
    The determination of (b1) includes a determination of whether or not the ratio of the read request to the I / O request regarding the pool during the copyback process is less than a predetermined ratio.
    The storage system according to claim 2.
  5.  前記プロセッサ部は、(W)及び(R)のいずれも実行するようになっており、
     (b1)の判定は、前記コピーバック処理中のプールに関するI/O要求の頻度が所定頻度未満か否かの判定を含む、
    請求項2記載のストレージシステム。
    The processor unit is configured to execute both (W) and (R).
    The determination of (b1) includes a determination of whether or not the frequency of I / O requests regarding the pool during the copyback process is less than a predetermined frequency.
    The storage system according to claim 2.
  6.  (b1)の判定は、前記コピーバック処理中のプールにおける複数のストライプにそれぞれ対応した複数の冗長度の最低値が所定値か否かの判定を含む、
    請求項2記載のストレージシステム。
    The determination of (b1) includes a determination as to whether or not the minimum values of the plurality of redundancy levels respectively corresponding to the plurality of stripes in the pool during the copyback process are predetermined values.
    The storage system according to claim 2.
  7.  (b1)の判定は、前記コピーバック処理中のプールに関するI/O要求の最終時刻からの経過時間が所定時間以上か否かの判定を含む、
    請求項2記載のストレージシステム。
    The determination of (b1) includes a determination of whether or not the elapsed time from the last time of the I / O request regarding the pool during the copyback process is a predetermined time or more.
    The storage system according to claim 2.
  8.  (b1)の判定は、前記コピーバック処理中のプールに関するプロセッサ使用率が所定使用率未満か否かの判定を含む、
    請求項2記載のストレージシステム。
    The determination of (b1) includes a determination of whether or not a processor usage rate related to the pool during the copyback process is less than a predetermined usage rate.
    The storage system according to claim 2.
  9.  前記プロセッサ部は、
      (b3)(b1)の判定結果が偽の場合、前記データコピー処理を停止する、
    請求項2記載のストレージシステム。
    The processor unit is
    (B3) If the determination result of (b1) is false, the data copy process is stopped.
    The storage system according to claim 2.
  10.  (b1)の判定は、(b11)乃至(b14)のうちの少なくとも1つの判定を含む、
      (b11)前記コピーバック処理中のプールにおける複数のストライプにそれぞれ対応した複数の冗長度の最低値が所定値か否かの判定、
      (b12)前記コピーバック処理中のプールに関するI/O要求の最終時刻からの経過時間が所定時間以上か否かの判定
      (b13)(b13-1)乃至(b13-3)のうちの少なくとも1つの判定、
        (b13-1)前記コピーバック処理中のプールに関するI/O要求に対するライト要求の割合が所定割合未満か否かの判定、
        (b13-2)前記コピーバック処理中のプールに関するI/O要求に対するリード要求の割合が所定割合未満か否かの判定、
        (b13-3)前記コピーバック処理中のプールに関するI/O要求の頻度が所定頻度未満か否かの判定、
      (b14)前記コピーバック処理中のプールに関するプロセッサ使用率が所定使用率未満か否かの判定、
    請求項9記載のストレージシステム。
    The determination of (b1) includes at least one determination of (b11) to (b14).
    (B11) Determining whether or not the minimum value of the plurality of redundancy levels respectively corresponding to the plurality of stripes in the pool during the copy back process is a predetermined value;
    (B12) A determination as to whether or not the elapsed time from the last I/O request regarding the pool during the copyback process is a predetermined time or more,
    (B13) At least one of the following determinations (b13-1) to (b13-3):
    (B13-1) Determination of whether or not the ratio of write requests to I / O requests related to the pool during the copyback process is less than a predetermined ratio;
    (B13-2) Determining whether the ratio of the read request to the I / O request regarding the pool during the copyback process is less than a predetermined ratio;
    (B13-3) Determining whether the frequency of I / O requests regarding the pool during the copyback process is less than a predetermined frequency,
    (B14) Determining whether or not the processor usage rate relating to the pool during the copyback process is less than a predetermined usage rate;
    The storage system according to claim 9.
  11.  (b1)の判定結果は、
      (b11)の判定結果が真であれば、真であり、
      (b11)の判定結果が偽でも、(b12)、(b13)及び(b14)のいずれかの判定結果が真であれば、真である、
    請求項10記載のストレージシステム。
    The determination result of (b1) is
    If the determination result of (b11) is true, it is true.
    Even if the determination result of (b11) is false, it is true if the determination result of any of (b12), (b13) and (b14) is true.
    The storage system according to claim 10.
  12.  (b1)の判定は、
      (b11)の判定と、
      (b13)の判定と、
      (b12)又は(b14)の判定と
    を含む、
    請求項11記載のストレージシステム。
    The determination of (b1)
    The determination of (b11);
    The determination of (b13);
    Including the determination of (b12) or (b14),
    The storage system according to claim 11.
  13.  I/O要求を受けた場合にそのI/O要求に従うI/O処理を実行するようになっているストレージシステムにおける復旧制御方法であって、
     論理的な1以上のRAIDグループを提供する複数の記憶デバイスのうちの問題記憶デバイスの複数のストリップに対応したデータを2以上の記憶デバイスの複数のスペア領域にそれぞれ復元するリビルド処理を実行し、
        各RAIDグループは、2以上のストライプで構成されており、
        各ストライプは、2以上のストリップで構成されており、
        前記複数の記憶デバイスの各々は、複数のストリップと、1以上のスペア領域とを有し、
        前記各RAIDグループについて、いずれかのストライプを構成する2以上のストリップをそれぞれ提供する2以上の記憶デバイスの少なくとも1つと、別のいずれかのストライプを構成する2以上のストリップをそれぞれ提供する2以上の記憶デバイスの少なくとも1つが、異なる記憶デバイスであり、
     前記リビルド処理の完了後、前記問題記憶デバイスの交換後記憶デバイスが有する複数のストリップのうちデータ未復元のストリップに対応したスペア領域からデータをそのストリップにコピーするデータコピー処理を含んだコピーバック処理を実行し、
     以下の(W)及び(R)のうちの少なくとも1つを実行し、
        (W)前記交換後記憶デバイス内のストリップをライト先としたライト要求を前記コピーバック処理中に受けた場合、そのライト要求に従うライト処理において、そのライト要求に従うデータを、そのライト先のストリップに書き込むこと、
        (R)前記交換後記憶デバイス内のストリップをリード元としたリード要求を前記コピーバック処理中に受けた場合、そのリード要求に従うリード処理において、そのリード元に対応したスペア領域からデータを読み込み、その読み込んだデータを、そのリード元のストリップに書き込むこと、
      前記コピーバック処理において、前記交換後記憶デバイスが有する複数のストリップのうち、前記コピーバック処理中のライト処理及びリード処理のいずれかにおいてデータが既に書き込まれているストリップについては、前記データコピー処理をスキップする、
    復旧制御方法。
    A recovery control method in a storage system configured to execute I / O processing according to an I / O request when an I / O request is received,
    Executing a rebuild process for restoring data corresponding to a plurality of strips of a problem storage device among a plurality of storage devices that provide one or more logical RAID groups, respectively, to a plurality of spare areas of the two or more storage devices;
    Each RAID group is composed of two or more stripes,
    Each stripe consists of two or more strips,
    Each of the plurality of storage devices has a plurality of strips and one or more spare areas;
    for each of the RAID groups, at least one of the two or more storage devices that respectively provide the two or more strips constituting any one stripe differs from at least one of the two or more storage devices that respectively provide the two or more strips constituting any other stripe;
    after completion of the rebuild processing, executing copyback processing including data copy processing for copying data, from the spare area corresponding to each strip whose data has not yet been restored among the plurality of strips of the storage device after replacement of the problem storage device, to that strip,
    Execute at least one of the following (W) and (R):
    (W) When a write request with the strip in the storage device after the exchange as a write destination is received during the copyback process, in the write process according to the write request, the data according to the write request is transferred to the write destination strip. Writing,
    (R) When a read request with the strip in the storage device after replacement as a read source is received during the copyback process, in the read process according to the read request, data is read from a spare area corresponding to the read source, Write the read data to the strip from which it was read,
    In the copy back process, among the plurality of strips of the storage device after replacement, the data copy process is performed for a strip in which data has already been written in either the write process or the read process during the copy back process. skip,
    Recovery control method.
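
    For illustration only (not part of the claims): a minimal sketch, in Python, of the copyback behaviour recited in claim 13. The storage-device API (`write_strip`), the `spare_area_of` mapping from strips to spare areas, and the `written` bookkeeping set are assumptions introduced for this sketch, not the patented implementation.

        # Sketch only: the device API and bookkeeping are assumptions.

        def handle_write_during_copyback(replacement_dev, strip_no, data, written):
            # (W): write the host data directly to the write-destination strip
            # of the post-replacement storage device.
            replacement_dev.write_strip(strip_no, data)
            written.add(strip_no)   # data copy for this strip can later be skipped

        def handle_read_during_copyback(replacement_dev, spare_area_of, strip_no, written):
            # (R): read the data from the spare area corresponding to the read-source
            # strip, write it back to that strip, and return it to the host.
            data = spare_area_of[strip_no].read()
            replacement_dev.write_strip(strip_no, data)
            written.add(strip_no)
            return data

        def copyback(replacement_dev, spare_area_of, strip_count, written):
            # Data copy processing: restore each not-yet-restored strip from its
            # corresponding spare area, skipping strips already written by the
            # write or read processing above during the copyback process.
            for strip_no in range(strip_count):
                if strip_no in written:
                    continue
                data = spare_area_of[strip_no].read()
                replacement_dev.write_strip(strip_no, data)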
PCT/JP2017/007015 2017-02-24 2017-02-24 Storage system and recovery control method WO2018154697A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/007015 WO2018154697A1 (en) 2017-02-24 2017-02-24 Storage system and recovery control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/007015 WO2018154697A1 (en) 2017-02-24 2017-02-24 Storage system and recovery control method

Publications (1)

Publication Number Publication Date
WO2018154697A1 (en) 2018-08-30

Family

ID=63252491

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/007015 WO2018154697A1 (en) 2017-02-24 2017-02-24 Storage system and recovery control method

Country Status (1)

Country Link
WO (1) WO2018154697A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08221217A (en) * 1995-02-17 1996-08-30 Hitachi Ltd Disk array subsystem data reconstruction method
JP2005099995A (en) * 2003-09-24 2005-04-14 Fujitsu Ltd Disk sharing method and system for magnetic disk device
JP2016038767A (en) * 2014-08-08 2016-03-22 富士通株式会社 Storage control device, storage control program, and storage control method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111124263A (en) * 2018-10-31 2020-05-08 伊姆西Ip控股有限责任公司 Method, electronic device, and computer program product for managing a plurality of discs
CN111124263B (en) * 2018-10-31 2023-10-27 伊姆西Ip控股有限责任公司 Method, electronic device and computer program product for managing a plurality of discs
TWI709042B (en) * 2018-11-08 2020-11-01 慧榮科技股份有限公司 Method and apparatus for performing mapping information management regarding redundant array of independent disks, and associated storage system
US11221773B2 (en) 2018-11-08 2022-01-11 Silicon Motion, Inc. Method and apparatus for performing mapping information management regarding redundant array of independent disks

Similar Documents

Publication Publication Date Title
US11163472B2 (en) Method and system for managing storage system
US10459814B2 (en) Drive extent based end of life detection and proactive copying in a mapped RAID (redundant array of independent disks) data storage system
US10656849B2 (en) Storage system and control method thereof
US9152332B2 (en) Storage system and method for reducing energy consumption
US9378093B2 (en) Controlling data storage in an array of storage devices
JP6009095B2 (en) Storage system and storage control method
US9400618B2 (en) Real page migration in a storage system comprising a plurality of flash packages
US20070011579A1 (en) Storage system, management server, and method of managing application thereof
WO2011108027A1 (en) Computer system and control method therefor
US20140075240A1 (en) Storage apparatus, computer product, and storage control method
US8812779B2 (en) Storage system comprising RAID group
CN111104055B (en) Method, apparatus and computer program product for managing a storage system
US10579540B2 (en) Raid data migration through stripe swapping
CN113934367A (en) Storage device, method for operating storage device and storage system
US9760296B2 (en) Storage device and method for controlling storage device
US9400723B2 (en) Storage system and data management method
WO2018154697A1 (en) Storage system and recovery control method
WO2018142622A1 (en) Computer
US8880939B2 (en) Storage subsystem and method for recovering data in storage subsystem
US20170038993A1 (en) Obtaining additional data storage from another data storage system
CN110413197B (en) Method, apparatus and computer program product for managing a storage system
US20230214134A1 (en) Storage device and control method therefor
JP2005055963A (en) Volume control method, program performing it, and storage device
KR20210137922A (en) Systems, methods, and devices for data recovery using parity space as recovery space
JP2020086554A (en) Storage access control device, storage access control method, and storage access control program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17897750

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17897750

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP
