
WO2018154697A1 - Storage system and recovery control method - Google Patents

Storage system and recovery control method

Info

Publication number
WO2018154697A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
determination
strip
copyback
read
Prior art date
Application number
PCT/JP2017/007015
Other languages
French (fr)
Japanese (ja)
Inventor
聡 上條
Original Assignee
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立製作所 filed Critical 株式会社日立製作所
Priority to PCT/JP2017/007015 priority Critical patent/WO2018154697A1/en
Publication of WO2018154697A1 publication Critical patent/WO2018154697A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F 11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Definitions

  • the present invention generally relates to storage system recovery control.
  • Regarding a storage system having a RAID (Redundant Array of Independent (or Inexpensive) Disks) group composed of a plurality of disks, a storage system provided with a spare disk is known.
  • the data in the failed disk in the RAID group is restored to the spare disk (rebuild process), and the data is copied from the spare disk to the disk after the failed disk is replaced (copy back process).
  • a storage device having a large storage capacity is adopted as a storage device (for example, a disk) constituting a RAID group.
  • If the storage capacity of the spare storage device is large, both the rebuild process to the spare storage device and the copyback process from the spare storage device take a long time. For this reason, the I/O processing performance (the performance of I/O processing in accordance with I/O requests) may be lowered.
  • the storage system includes a plurality of storage devices that provide one or more logical RAID groups, and a processor unit that is one or more processors connected to the plurality of storage devices.
  • Each RAID group is composed of two or more stripes.
  • Each stripe is composed of two or more strips.
  • Each of the plurality of storage devices has a plurality of strips and one or more spare areas.
  • For each RAID group, at least one of the two or more storage devices that provide the two or more strips constituting one stripe differs from at least one of the two or more storage devices that provide the two or more strips constituting any other stripe.
  • the processor unit executes I / O processing according to the I / O request.
  • the processor unit executes a rebuild process for restoring data corresponding to a plurality of strips of a problem storage device among a plurality of storage devices to a plurality of spare areas of two or more storage devices, respectively.
  • After the rebuild process is completed and the problem storage device has been replaced, the processor unit executes a copyback process that includes a data copy process for copying data from spare areas to those strips of the post-replacement storage device to which data has not yet been restored.
  • The processor unit also executes at least one of the following (W) and (R): (W) when a write request whose write destination is a strip in the post-replacement storage device is received during the copyback process, the data according to the write request is written to the write destination strip in the write process according to the write request; (R) when a read request whose read source is a strip in the post-replacement storage device is received during the copyback process, data is read from the spare area corresponding to the read source and written to the read source strip in the read process according to the read request. The processor unit skips the data copy process for any strip to which data has already been written in such a write process or read process.
  • the storage device can be restored while reducing the decrease in I / O processing performance.
  • FIG. 1 shows an example of the configuration of a computer system according to Example 1.
  • FIG. 2 shows an example of the logical configuration and physical configuration of a pool.
  • FIG. 3 shows an example of programs and tables stored in a local memory.
  • FIG. 4 shows an example of a pool state table.
  • FIG. 5 shows an example of a copyback setting management table.
  • FIG. 6 shows an example of a progress management table.
  • FIG. 7 shows an example of a head address management table.
  • FIG. 8 shows the flow of recovery processing.
  • FIG. 9 shows an example of the details of rebuild processing.
  • FIG. 10 shows an example of the details of copyback processing.
  • FIG. 11 shows the detailed flow of copyback processing.
  • FIG. 12 shows the detailed flow of data copy processing.
  • FIG. 13 shows an example of write processing during copyback processing.
  • the “interface unit” includes one or more interfaces.
  • The one or more interfaces may be one or more interface devices of the same type (for example, one or more NICs (Network Interface Cards)) or two or more interface devices of different types (for example, an NIC and an HBA (Host Bus Adapter)).
  • the “storage unit” includes one or more memories.
  • the at least one memory for the storage unit may be a volatile memory.
  • the storage unit is mainly used during processing by the processor unit.
  • the “processor unit” includes one or more processors.
  • the at least one processor is typically a microprocessor such as a CPU (Central Processing Unit).
  • Each of the one or more processors may be a single core or a multi-core.
  • the processor may include a hardware circuit that performs part or all of the processing.
  • the process may be described using “program” as a subject.
  • A program is executed by the processor unit (for example, a CPU (Central Processing Unit)) and performs the defined processing while appropriately using a storage unit (for example, a memory) and/or an interface device (for example, a communication port); therefore, the subject of the processing may be described as the processor unit (or an apparatus or system having the processor unit).
  • the processor unit may include a hardware circuit that performs part or all of the processing.
  • the program may be installed in a computer-like device from a program source.
  • The program source may be, for example, a program distribution server or a computer-readable (for example, non-transitory) recording medium.
  • two or more programs may be realized as one program, or one program may be realized as two or more programs.
  • The “host system” may be one or more physical host computers (for example, a cluster of host computers), or may include at least one virtual host computer (for example, a VM (Virtual Machine)).
  • the host system is simply referred to as “host”.
  • the “storage system” may be one or more storage devices.
  • the “storage device” may be any device having a function of storing data in the storage device.
  • the storage device may be a computer (for example, a general-purpose computer) such as a file server.
  • at least one physical storage device may execute a virtual computer (for example, VM (Virtual Machine)), or may execute SDx (Software-Defined anything).
  • As SDx, for example, SDS (Software Defined Storage) (an example of a virtual storage device) or SDDC (Software-defined Datacenter) can be adopted.
  • at least one storage device (computer) may have a hypervisor.
  • the hypervisor may generate a server VM (Virtual Machine) that operates as a server and a storage VM that operates as a storage.
  • the server VM may operate as a host that issues an I / O request
  • the storage VM may operate as a storage controller that performs I / O to a drive in response to an I / O request from the server VM.
  • When elements of the same kind are described without being distinguished, a reference sign (or a common part of the reference sign) is used; when elements of the same kind are described separately, the element ID (or the full reference sign) may be used.
  • FIG. 1 shows an example of the configuration of a computer system according to the embodiment.
  • the computer system has a host 101 and a storage system 102.
  • the host 101 and the storage system 102 are connected to each other via a communication network 152.
  • the host 101 transmits an I / O (Input / Output) request to the storage system 102.
  • the I / O request includes I / O destination information indicating the location of the I / O destination.
  • The I/O destination information includes, for example, the LUN (Logical Unit Number) of the LU (Logical Unit) of the I/O destination and the LBA (Logical Block Address) of the area in the LU.
  • An LU is a logical volume (logical storage device) provided from the storage system 102. Based on the I/O destination information, the logical area of the I/O destination is identified, and the drive 124 that underlies the logical area is identified. A minimal sketch of this resolution is given below.
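  • The following is an illustrative Python sketch, not taken from the patent, of resolving the I/O destination information (LUN and LBA) to a logical area and then to a drive; the strip size, the mapping tables, and all names are assumptions for illustration only.

        # Illustrative sketch (assumption): resolve an I/O destination (LUN, LBA) to a drive.
        STRIP_SIZE = 512 * 1024            # assumed strip size in bytes

        # lu_to_raid_group[lun] = RAID group number backing that LU (assumed mapping)
        lu_to_raid_group = {0: 0, 1: 1}
        # strip_to_drive[(raid_group, strip_index)] = drive number (assumed layout table)
        strip_to_drive = {(0, 0): 3, (0, 1): 7, (1, 0): 2}

        def resolve_drive(lun, lba_bytes):
            raid_group = lu_to_raid_group[lun]
            strip_index = lba_bytes // STRIP_SIZE      # logical area of the I/O destination
            return strip_to_drive[(raid_group, strip_index)]

        if __name__ == "__main__":
            print(resolve_drive(0, 600 * 1024))        # -> 7
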
  • the storage system 102 includes a storage controller 103 and a drive box 121.
  • the drive box 121 includes a plurality (or one) of pools 183.
  • Each pool 183 includes a plurality of drives 124.
  • the drive 124 is an example of a storage device (typically a non-volatile storage device), and is, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive).
  • the storage controller 103 includes a host I / F 111, a cache memory (CM) 112, a CPU (Central Processing Unit) 113, a drive I / F 114, and a local memory (LM) 115.
  • the host I / F 111 and the drive I / F 114 are examples of the interface unit.
  • the cache memory 112 and the local memory 115 are examples of a storage unit.
  • the CPU 113 is an example of a processor unit.
  • the host I / F 111 is an example of an interface device of the storage controller 103, and communicates with the host 101.
  • the cache memory 112 temporarily stores data (write data) written from the host 101 to the drive 124 and data (read data) read from the drive 124.
  • the CPU 113 executes various processes by executing a program stored in the local memory 115.
  • the CPU 113 is connected to the host I / F 111, the cache memory 112, the drive I / F 114, and the local memory 115.
  • The CPU 113 transmits various commands to the drives 124 of the drive box 121 via the drive I/F 114.
  • the drive I / F 114 is an example of an interface device of the storage controller 103, and communicates with each drive 124.
  • the local memory 115 stores various programs and various information.
  • the storage controller 103 processes the I / O request received from the host 101. Specifically, for example, the storage controller 103 identifies the drive 124 that is the data I / O destination based on the I / O destination of the I / O request, and executes I / O for the identified drive 124. At this time, the storage controller 103 caches the I / O target data in the cache memory 112.
  • FIG. 2 shows an example of the logical configuration and physical configuration of the pool 183.
  • the pool 183 has a plurality of RAID groups (logical RAID groups) 223.
  • a logical volume is provided based on the RAID group 223.
  • Each RAID group 223 has a plurality of drives 224 (logical drives).
  • Each RAID group 223 is a RAID group according to a 2D + 2P RAID configuration and a RAID 6 RAID level. That is, in each RAID group 223, each stripe is composed of four strips (unit storage areas) that the four drives 224 respectively have. In each stripe, two data elements (D) are stored in two strips, respectively, and two parities (P) based on the two data elements are stored in two strips.
  • the plurality of RAID types (RAID level and RAID configuration) respectively corresponding to the plurality of RAID groups 223 may be the same.
  • each pool 183 has a plurality of drive groups 123.
  • the number of drive groups 123 is the same as the number of RAID groups 223, but may be different.
  • the number of drives 124 constituting each drive group 123 is the same as the number of drives 224 constituting the RAID group 223, but may be different.
  • each drive group 123 has four drives 124 (physical drives). Each drive group 123 does not constitute a RAID group by itself.
  • the pool 183 logically configures the plurality of RAID groups 223 described above.
  • each RAID group 223 is composed of two or more stripes.
  • Each stripe is composed of two or more strips.
  • Each drive 124 has a plurality of strips.
  • For each RAID group 223, at least one of the two or more drives 124 that provide the two or more strips constituting the first stripe differs from at least one of the two or more drives 124 that provide the two or more strips constituting the second stripe.
  • For each RAID group 223, the first stripe is any stripe, and the second stripe is any stripe other than the first stripe.
  • A plurality of strips constituting the same stripe are distributed across different drive groups, and the drive positions (positions according to physical addresses within the drives) corresponding to those strips differ from one another.
  • the same number is given to each of the four strips constituting the same stripe.
  • four strips 0-0 constituting the same stripe are distributed in a plurality of drive groups 123.
  • each drive 124 has one or more spare areas (S) in addition to a plurality of strips.
  • each drive 124 has one spare area.
  • the “spare area” is a spare storage area.
  • any strip is a component of the RAID group 223, but the spare area is not a component of the RAID group 223.
  • data (data element or parity) in the strip in the failed drive is restored to the spare area, as will be described later.
  • the size of each spare area is not less than the strip size.
  • a distributed RAID configuration is adopted, and a spare area is provided in each drive 124 instead of providing a spare drive.
  • In this embodiment, the RAID level of each pool 183 is RAID 6 and the RAID configuration is 2D+2P, but a different RAID level and RAID configuration may be used. A minimal sketch of such a layout follows.
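  • The following is an illustrative Python sketch, not taken from the patent, of one possible way to lay out 2D+2P stripes so that the strips of each stripe land on different drives and every drive reserves a spare area instead of using a dedicated spare drive; the rotation scheme and all names (make_layout, etc.) are assumptions for illustration only.

        # Illustrative sketch (assumption): distribute 2D+2P stripes across drives,
        # reserving the last slot of every drive as a spare area (no spare drive).

        def make_layout(num_drives=8, slots_per_drive=5):
            # layout[drive][slot] = ("D"/"P", stripe_id) or "SPARE"
            layout = [[None] * slots_per_drive for _ in range(num_drives)]
            for d in range(num_drives):
                layout[d][slots_per_drive - 1] = "SPARE"   # one spare area per drive
            stripe_id = 0
            for slot in range(slots_per_drive - 1):        # fill the non-spare slots
                for start in range(0, num_drives, 4):      # 4 strips (2D+2P) per stripe
                    drives = [(start + slot + k) % num_drives for k in range(4)]  # rotate per slot
                    for k, d in enumerate(drives):
                        kind = "D" if k < 2 else "P"
                        layout[d][slot] = (kind, stripe_id)
                    stripe_id += 1
            return layout

        if __name__ == "__main__":
            for d, slots in enumerate(make_layout()):
                print(f"drive {d}: {slots}")
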
  • FIG. 3 shows an example of programs and tables stored in the local memory 115.
  • the local memory 115 stores various programs. Examples of the program include a host I / O processing program 301, a rebuild processing program 302, a copyback processing program 303, and a parity generation program 304.
  • the host I / O processing program 301 processes an I / O request from the host 101.
  • the rebuild process program 302 executes a rebuild process.
  • the copy back processing program 303 executes copy back processing.
  • the parity generation program 304 generates the parity stored in the stripe.
  • the local memory 115 stores various tables. Examples of the table include a pool status table 305, a copyback setting management table 306, a progress management table 307, and a head address management table 308.
  • FIG. 4 shows an example of the pool status table 305 and the status of the pool corresponding to the table 305.
  • the pool state table 305 is information indicating the redundancy and state of the pool 183.
  • the pool state table 305 manages entries that hold information such as the pool number 401, the redundancy 402, and the state 403 for each pool 183.
  • Pool number 401 is a pool number.
  • the redundancy 402 indicates the redundancy of the pool 183.
  • the redundancy as the redundancy 402 is the lowest redundancy in the pool 183.
  • For example, when some stripes in the pool 183 have a redundancy of “1” and other stripes have a redundancy of “0”, the minimum value “0” is registered as the redundancy 402.
  • the state 403 indicates a state related to processing to the pool 183.
  • An example of processing for the pool 183 is copy back processing.
  • Values of the state 403 include “copying back”, which means that copyback processing is in progress and the data copy processing described later is being performed; “stopped”, which means that copyback processing is in progress but the data copy processing is stopped; and “none”, which means that copyback processing is not in progress.
  • According to FIG. 4, the redundancy and status of each of the pools 0 to 2 are as follows. In pool 0, two drives have failed and the redundancy has dropped from 2 to 0; for pool 0, the data copy processing within the copyback processing is being performed. In pool 1, a failure has occurred in one drive and the redundancy has dropped from 2 to 1; copyback processing is in progress for pool 1, but the data copy processing within it is stopped. In pool 2, no failure has occurred in any drive. A minimal sketch of such a pool state table follows.
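  • The following is an illustrative Python sketch, not taken from the patent, of how the pool state table 305 might be held in memory; the field names and example values mirror FIG. 4, while the class and function names are assumptions.

        # Illustrative sketch (assumption): in-memory form of the pool state table 305.
        from dataclasses import dataclass

        @dataclass
        class PoolState:
            pool_number: int
            redundancy: int      # lowest redundancy among the stripes in the pool
            state: str           # "copying back", "stopped", or "none"

        # Example values corresponding to FIG. 4
        pool_state_table = [
            PoolState(pool_number=0, redundancy=0, state="copying back"),
            PoolState(pool_number=1, redundancy=1, state="stopped"),
            PoolState(pool_number=2, redundancy=2, state="none"),
        ]

        def pool_redundancy(stripe_redundancies):
            # The redundancy 402 is the minimum redundancy over all stripes in the pool.
            return min(stripe_redundancies)

        if __name__ == "__main__":
            print(pool_redundancy([1, 0, 2]))  # -> 0
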
  • FIG. 5 shows an example of the copyback setting management table 306.
  • the copyback setting management table 306 is a table for managing thresholds for determining whether or not to execute data copy processing.
  • The copyback setting management table 306 manages entries that hold information such as a pool number 501, an I/O waiting time 502, a write ratio 503, a CPU usage rate 504, and a determination waiting time 505 for each pool 183.
  • Pool number 501 is the number of pool 183.
  • The I/O waiting time 502 is a threshold for the length of time during which no I/O request (hereinafter, host I/O) has arrived from the host 101 for the pool 183.
  • The write ratio 503 is a threshold for the ratio of writes in the host I/O for the pool 183.
  • The CPU usage rate 504 is a threshold for the usage rate of the CPU 113 with respect to processing for the pool 183.
  • the determination waiting time 505 indicates a waiting time from the start or stop of the data copy process to the determination.
  • the information 502 to 505 is prepared for each pool 183, but may be common to all the pools 183.
  • FIG. 6 shows an example of the progress management table 307.
  • the progress management table 307 is a table for managing the progress of the copy back process for the replaced drive (the drive replaced with the failed drive 124).
  • the progress management table 307 manages entries that hold information such as an address 601, a completion flag 602, and a data position 603 for each strip included in the drive after replacement.
  • An address 601 is a strip address (number).
  • the completion flag 602 is a flag indicating whether or not the data restoration for the strip is completed. As the value of the completion flag 602, “1” is set when it is completed, and “0” is set when it is not completed.
  • a data position 603 indicates a position (a combination of the number of the drive 124 and the address of the spare area in the drive 124) where the data (data to be copied back) to be restored exists in the strip.
  • FIG. 7 shows an example of the head address management table 308.
  • the head address management table 308 is a table for managing a position where the copy back process is not completed.
  • The head address management table 308 stores the first address among the addresses 601 in the progress management table 307 whose completion flag 602 is “0”. A minimal sketch of these two tables follows.
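  • The following is an illustrative Python sketch, not taken from the patent, of the progress management table 307 and the head address management table 308; the class and function names are assumptions.

        # Illustrative sketch (assumption): progress management for one post-replacement drive.
        from dataclasses import dataclass, field
        from typing import Optional, Tuple

        @dataclass
        class StripProgress:
            address: int                                      # address 601 (strip number)
            completed: bool = False                           # completion flag 602 (True = "1")
            data_position: Optional[Tuple[int, int]] = None   # data position 603: (drive number, spare-area address)

        @dataclass
        class ProgressTable:
            entries: list = field(default_factory=list)

            def head_incomplete_address(self) -> Optional[int]:
                # Corresponds to the head address management table 308: the first
                # address 601 whose completion flag 602 is "0".
                for e in self.entries:
                    if not e.completed:
                        return e.address
                return None

        if __name__ == "__main__":
            table = ProgressTable([StripProgress(a) for a in range(4)])
            table.entries[0].completed = True
            print(table.head_incomplete_address())  # -> 1
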
  • the processing performed in this embodiment will be described by taking one pool 183 as an example.
  • the pool 183 is referred to as a “target pool 183”.
  • the meaning of each term is as follows.
  • “Problem drive” is a drive in which a problem has occurred.
  • “Problem” means a failure or a high possibility of occurrence of a failure.
  • the problem drive is a failed drive.
  • a “failed drive” is a drive in which a failure has occurred.
  • a “failure strip” is a strip in a failed drive.
  • As the problem drive, a failure candidate drive (a drive with a high possibility of failure) can also be adopted.
  • a strip in a failure candidate drive may be referred to as a “failure candidate strip”.
  • “Restoration” is a term that may be used to mean writing to a spare area in a drive or writing to a strip in a drive after replacement.
  • The “drive after replacement” (post-replacement drive) is the drive installed in place of the problem drive.
  • “Recovery processing” is a term that means processing including rebuild processing and copy back processing.
  • “Recovery” is a term that includes rebuild and copyback.
  • “Rebuild process” is a process including a rebuild. A “rebuild” is to restore data corresponding to all failure strips to a plurality of spare areas, that is, correction copy.
  • The “data corresponding to the failure strip” is typically the data in the failure strip, but it may also be updated data for the data in the failure strip.
  • When the problem drive is a failure candidate drive, a rebuild is to restore the data in all failure candidate strips to a plurality of spare areas, that is, dynamic sparing.
  • “Copy back processing” is processing including copy back.
  • “Copy back” is a process corresponding to a data copy process to be described later, and is to copy data from a spare area to a strip in a drive after replacement. In this embodiment, data may be restored to the strip in the drive after replacement by copy back, or data may be restored according to host I / O instead of copy back.
  • Data XX is data (data element or parity) in the strip XX (XX is a number).
  • FIG. 8 shows the flow of recovery processing.
  • The recovery process is started when, for example, the CPU 113 (for example, the rebuild processing program 302) detects that a failure has occurred in a drive 124.
  • The rebuild processing program 302 starts the rebuild process for restoring the same data as the data (data element or parity) in each strip of the failed drive 124 to spare areas of normal drives 124 (S801).
  • FIG. 9 shows an example of the details of the rebuild process.
  • the faulty drive 00 includes fault strips 0-1, 2-2, 1-1 and 2-3.
  • Based on the data 0-1 in the normal drives 05, 06, and 0B, the rebuild processing program 302 restores the data 0-1 of the failed drive 00 to a spare area in any normal drive, for example, the spare area of the normal drive 01.
  • the rebuild processing program 302 adds the position of the spare area (a combination of the drive 01 number and the address of the spare area) to the progress management table 307 corresponding to the target pool 183 as the data position 603.
  • The rebuild processing program 302 restores the other data 2-2, 1-1, and 2-3 of the failed drive 00 to, for example, the spare areas of the normal drives 02 to 04, and adds the positions of those spare areas to the progress management table 307 as data positions 603.
  • a plurality of data in the failed drive are restored to the spare areas of a plurality of normal drives, respectively.
  • In other words, data write destinations are not concentrated in a single drive such as a spare drive, but are distributed across a plurality of drives. For this reason, it is expected that the time required for the rebuild process can be shortened. A minimal sketch of such a rebuild follows.
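  • The following is an illustrative Python sketch, not taken from the patent, of restoring the strips of a failed drive into spare areas spread over several surviving drives; XOR-based reconstruction is assumed for simplicity (the embodiment uses RAID 6), and all names are assumptions.

        # Illustrative sketch (assumption): rebuild of a failed drive's strips into
        # spare areas distributed over the surviving drives. XOR parity is used
        # here for simplicity; the embodiment's RAID 6 coding is not reproduced.
        from functools import reduce

        def xor_blocks(blocks):
            return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*blocks))

        def rebuild(failed_strips, surviving_data, spare_areas, progress_table):
            """failed_strips: strip ids of the failed drive.
            surviving_data[strip_id]: surviving blocks of that stripe.
            spare_areas: list of (drive_number, spare_address, buffer) on normal drives.
            progress_table[strip_id] = (drive_number, spare_address) once restored."""
            for i, strip_id in enumerate(failed_strips):
                restored = xor_blocks(surviving_data[strip_id])
                # Spread restore destinations over many drives instead of one spare drive.
                drive_no, spare_addr, buf = spare_areas[i % len(spare_areas)]
                buf[:] = restored
                progress_table[strip_id] = (drive_no, spare_addr)  # data position 603

        if __name__ == "__main__":
            surviving = {0: [b"\x01\x02", b"\x04\x08"]}           # two surviving blocks of stripe 0
            spares = [(1, 0, bytearray(2)), (2, 0, bytearray(2))]
            progress = {}
            rebuild([0], surviving, spares, progress)
            print(spares[0][2], progress)                         # restored bytes and data position
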
  • the failed drive 124 is replaced, for example, by maintenance personnel (S802).
  • The rebuild processing program 302 determines whether the rebuild process has been completed (S803). If the rebuild has not been completed (S803: NO), the rebuild processing program 302 continues the rebuild process. On the other hand, if the rebuild process has been completed (S803: YES), the rebuild processing program 302 outputs a drive replacement completion sign (for example, by lighting a predetermined LED (Light Emitting Diode) provided in the storage system 102) (S804).
  • the copyback processing program 303 executes the copyback processing in response to, for example, an instruction for copyback processing (or in response to detection of the completion of drive replacement) (S805).
  • FIG. 10 shows an example of details of the copyback process.
  • FIG. 10 corresponds to the continuation of the rebuild process shown in FIG.
  • The copyback processing program 303 restores (copies) the data 0-1, 2-2, 1-1, and 2-3 from the spare areas of the drives 01 to 04 to the strips 0-1, 2-2, 1-1, and 2-3 of the post-replacement drive 00, respectively.
  • FIG. 11 shows the detailed flow of the copyback process.
  • the copyback processing program 303 refers to the redundancy 402 corresponding to the target pool 183 and determines whether the redundancy 402 is “0” (S1101). If the determination result in S1101 is true (S1101: YES), the copyback processing program 303 executes data copy processing (S1105). This is to increase the reliability of data protection of the target pool 183.
  • the redundancy “0” is an example of a redundancy threshold. The threshold may be greater than zero.
  • If the determination result in S1101 is false (S1101: NO), the copyback processing program 303 determines whether the time during which no host I/O has arrived for the target pool 183 is equal to or longer than the I/O waiting time 502 corresponding to the target pool 183 (S1102). If the determination result in S1102 is true (S1102: YES), the copyback processing program 303 executes data copy processing (S1105). This is because no host I/O is being performed, so the load on the CPU 113 is relatively low and it is considered efficient to use it for the data copy processing.
  • If the determination result in S1102 is false (S1102: NO), the copyback processing program 303 determines whether the write ratio in the host I/O for the target pool 183 is less than the write ratio 503 corresponding to the target pool 183 (S1103). If the determination result in S1103 is true (S1103: YES), the copyback processing program 303 executes data copy processing (S1105). This is because, if the write ratio is low, there is a low possibility that data will be restored to the strips of the post-replacement drive without data copying between drives through the write processing during copyback processing described later.
  • If the determination result in S1103 is false (S1103: NO), the copyback processing program 303 determines whether the CPU usage rate related to the target pool 183 is less than the CPU usage rate 504 corresponding to the target pool 183 (S1104). If the determination result in S1104 is true (S1104: YES), the copyback processing program 303 executes data copy processing (S1105). This is because the load on the CPU 113 is relatively low, and it is considered more efficient to use it for the data copy processing.
  • If the determination results of S1101 to S1104 are all false, the copyback processing program 303 stops the data copy process (S1106). At this time, if the state 403 corresponding to the target pool 183 is not “stopped”, the copyback processing program 303 updates the state 403 to “stopped”. If the data copy process is already stopped (the state 403 is already “stopped”), S1106 may be skipped (that is, the data copy process remains stopped).
  • the copy back processing program 303 determines whether the elapsed time from the data copy processing stop time is equal to or longer than the determination waiting time 505 corresponding to the target pool 183 (S1107).
  • If the elapsed time is equal to or longer than the determination waiting time 505 (S1107: YES), the copyback processing program 303 determines whether the copyback process has been completed (S1108).
  • The determination in S1108 is a determination as to whether the completion flags 602 corresponding to all the strips in the post-replacement drive are “1” in the progress management table 307 corresponding to the target pool 183. If the determination result in S1108 is false (S1108: NO), the process returns to S1101. On the other hand, if the determination result in S1108 is true (S1108: YES), the copyback process is terminated.
  • The determinations of S1101, S1102, S1103, and S1104 are listed in descending order of priority, but they may be made in a different order. Further, the data copy process (S1105) may be executed when at least one of the determination results of S1101 to S1104 is true, or the data copy process (S1105) may be stopped when at least one of the determination results of S1101 to S1104 is false. For example, if the redundancy 402 of the target pool 183 is larger than a threshold (for example, “0”) (S1101: NO), the data copy process may be stopped regardless of the results of the other determinations. Likewise, if the write ratio is equal to or higher than the write ratio 503 (S1103: NO), the data copy process may be stopped regardless of the results of the other determinations. A minimal sketch of this decision flow follows.
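  • The following is an illustrative Python sketch, not taken from the patent, of the S1101 to S1106 decision (execute or stop the data copy process) against the thresholds in the copyback setting management table 306; the function and field names are assumptions.

        # Illustrative sketch (assumption): decide whether to run or stop the data
        # copy process, following the order S1101 -> S1102 -> S1103 -> S1104 of FIG. 11.
        from dataclasses import dataclass

        @dataclass
        class CopybackSettings:          # thresholds from the copyback setting management table 306
            io_wait_threshold_s: float   # I/O waiting time 502
            write_ratio_threshold: float # write ratio 503
            cpu_usage_threshold: float   # CPU usage rate 504

        @dataclass
        class PoolStatus:                # observed status of the target pool 183
            redundancy: int
            seconds_since_last_host_io: float
            write_ratio: float
            cpu_usage: float

        def should_run_data_copy(status: PoolStatus, cfg: CopybackSettings) -> bool:
            if status.redundancy == 0:                                        # S1101
                return True   # raise data-protection reliability first
            if status.seconds_since_last_host_io >= cfg.io_wait_threshold_s:  # S1102
                return True   # no host I/O, CPU load is relatively low
            if status.write_ratio < cfg.write_ratio_threshold:                # S1103
                return True   # little chance of restoring strips via write processing
            if status.cpu_usage < cfg.cpu_usage_threshold:                    # S1104
                return True   # CPU load is relatively low
            return False      # S1106: stop (or keep stopped) the data copy process

        if __name__ == "__main__":
            cfg = CopybackSettings(io_wait_threshold_s=60, write_ratio_threshold=0.3, cpu_usage_threshold=0.5)
            print(should_run_data_copy(PoolStatus(1, 5.0, 0.6, 0.8), cfg))  # -> False
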
  • FIG. 12 shows the detailed flow of the data copy process.
  • the copyback processing program 303 identifies the head address for which data copying has not been completed from the head address management table 308 (S1201).
  • the copyback processing program 303 identifies the data position 603 corresponding to the address 601 that matches the address identified in S1201 (S1202).
  • The copyback processing program 303 copies the copyback target data from the spare area indicated by the data position 603 identified in S1202 to the strip (the strip in the post-replacement drive) identified in S1201 (S1203).
  • The copyback processing program 303 updates the completion flag 602 corresponding to the copy destination strip to “1” (completed), and updates the head address in the head address management table 308 to the first address 601 whose completion flag 602 is “0” (S1204). In S1204, when the minimum value of the redundancies corresponding to the stripes in the target pool 183 has increased, the copyback processing program 303 also updates the redundancy 402 corresponding to the target pool 183.
  • the copy back processing program 303 determines whether or not the copy back processing is completed (S1205). This determination is the same as the determination in S1108 of FIG. If the determination result in S1205 is true (S1205: YES), the copyback process ends.
  • If the determination result in S1205 is false (S1205: NO), the copyback processing program 303 determines whether the elapsed time from the start time of the data copy process is equal to or longer than the determination waiting time 505 corresponding to the target pool 183 (S1206). If the determination result in S1206 is false (S1206: NO), the process returns to S1201 (that is, the data copy process continues). On the other hand, if the determination result in S1206 is true (S1206: YES), the process returns to S1101 (at this time, the data copy process may be temporarily ended (stopped)). A minimal sketch of this loop follows.
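  • The following is an illustrative Python sketch, not taken from the patent, of the data copy loop of FIG. 12 (S1201 to S1206); the dictionary layout, the copy_block callback, and the time budget handling are assumptions.

        # Illustrative sketch (assumption): the data copy loop of FIG. 12.
        # entries: strip address -> {"completed": bool, "data_position": (drive, spare_addr)}
        import time

        def head_incomplete_address(entries):
            for addr in sorted(entries):                      # head address management table 308
                if not entries[addr]["completed"]:
                    return addr
            return None

        def data_copy_process(entries, copy_block, wait_budget_s):
            start = time.monotonic()
            while True:
                addr = head_incomplete_address(entries)       # S1201
                if addr is None:
                    return "copyback complete"                # S1205: YES
                src = entries[addr]["data_position"]          # S1202
                copy_block(src, addr)                         # S1203: spare area -> strip in new drive
                entries[addr]["completed"] = True             # S1204 (strips already restored by
                                                              # host I/O are simply never selected)
                if time.monotonic() - start >= wait_budget_s: # S1206
                    return "re-evaluate execution condition"  # back to S1101

        if __name__ == "__main__":
            table = {a: {"completed": a == 1, "data_position": (a % 4 + 1, 0)} for a in range(4)}
            print(data_copy_process(table, lambda src, dst: None, wait_budget_s=1.0))
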
  • FIG. 13 shows an example of write processing during copy back processing.
  • When the host I/O processing program 301 receives a write request from the host 101 during the copyback process, it writes the data according to the write request to the strip in the post-replacement drive. As a result, the data is restored to that strip. The details are, for example, as follows.
  • When the host I/O processing program 301 receives a write request whose write destination is the strip 2-2 in the post-replacement drive 00 during the copyback process (S14-1), the host I/O processing program 301 reads the data 2-2 from the strips 2-2 of the normal drives 01, 06, and 0B (S14-2). Next, the host I/O processing program 301 calculates a parity from the plurality of read data 2-2 and the updated data 2-2 from the host 101 (S14-3). Next, the host I/O processing program 301 writes the updated data 2-2 to the strip 2-2 in the post-replacement drive 00 (S14-4). The host I/O processing program 301 then returns a write completion response to the host 101 (S14-5).
  • The host I/O processing program 301 updates the completion flag 602 corresponding to the strip 2-2 to “1” (S14-6). As described above, when a write request is received during the copyback process, the data can be restored to the strip in the post-replacement drive by piggybacking on the processing of the write request, as sketched below.
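  • The following is an illustrative Python sketch, not taken from the patent, of write processing during copyback: the old data of the stripe is read from the surviving strips, the parity is recomputed (XOR is used here for simplicity; the embodiment uses RAID 6), the updated data is written to the post-replacement drive, and the strip is marked as restored; all names are assumptions.

        # Illustrative sketch (assumption): write processing during copyback (FIG. 13).
        from functools import reduce

        def xor_blocks(blocks):
            return bytes(reduce(lambda a, b: a ^ b, group) for group in zip(*blocks))

        def write_during_copyback(new_data, strip_addr, read_peer_strips, write_strip,
                                  write_parity, completion_flags):
            peers = read_peer_strips(strip_addr)          # S14-2: read the other data of the stripe
            parity = xor_blocks(peers + [new_data])       # S14-3: recompute parity (XOR for simplicity)
            write_strip(strip_addr, new_data)             # S14-4: write to the post-replacement drive
            write_parity(strip_addr, parity)
            completion_flags[strip_addr] = True           # S14-6: restored, so data copy will skip it
            return "write completed"                      # S14-5: respond to the host

        if __name__ == "__main__":
            flags = {2: False}
            store = {}
            print(write_during_copyback(
                b"\x0f", 2,
                read_peer_strips=lambda a: [b"\x01"],
                write_strip=lambda a, d: store.update({("strip", a): d}),
                write_parity=lambda a, p: store.update({("parity", a): p}),
                completion_flags=flags))
            print(store, flags)
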
  • FIG. 14 shows an example of the transition of the progress management table 307 according to the progress of the copyback process and the write process.
  • In FIG. 14, the progress management table 307 is schematically expressed as a bitmap, and the addresses are assumed to be arranged in the order indicated by the broken arrows.
  • In the illustrated state, the head address for which copying is not completed points to the ninth strip, while the completion flags 602 corresponding to the 16th and 20th strips are already “1”.
  • When the head address for which copying is not completed reaches the 16th strip, that strip is skipped because its data has already been restored (its completion flag 602 is “1”). That is, in the copyback process, the data copy process is skipped for strips whose data was restored by piggybacking on write processing. This reduces the load of the copyback process.
  • In this embodiment, a distributed RAID configuration is adopted. Instead of providing a spare drive, each drive 124 is provided with a spare area, and the plurality of data in a failed drive are restored to the spare areas of a plurality of normal drives. In other words, data write destinations are not concentrated in a single drive such as a spare drive, but are distributed across a plurality of drives. For this reason, the time required for the rebuild process can be shortened. As a result, the time during which the redundancy is lowered can be shortened, and the decrease in I/O processing performance can be reduced.
  • In a configuration in which a spare drive is adopted, the restoration destination in the rebuild process is aggregated into the spare drive, that is, all the data in the failed drive is restored to the spare drive. Since the spare drive then becomes a member of the group in place of the failed drive, a “copyback-less” operation, in which the copyback processing appears to be completed without copying data between drives, can be expected.
  • In this embodiment, in the write processing during copyback processing (processing according to a write request from the host 101), the host I/O processing program 301 restores the updated data according to the write request to the strip in the post-replacement drive. In other words, the data is restored to the strip in the post-replacement drive by piggybacking on the write process.
  • In the copyback process, the data copy process is skipped for strips that have already been restored. This reduces the load of the copyback process. As a result, the decrease in I/O processing performance can be reduced.
  • In this embodiment, thresholds are provided for the redundancy, the write ratio, and the CPU usage rate, and the execution and stopping of the data copy process are controlled according to comparisons between the status during the copyback process and those thresholds. For example, when the load on the CPU 113 is low (specifically, when no host I/O has been received for a certain period of time or the CPU usage rate is low), the data copy process is executed; when the write ratio is high, the data copy process is stopped. By finely controlling the execution and stopping of the data copy process within the copyback process, the drive can be restored while suppressing a decrease in host I/O processing performance.
  • Example 2 will be described. At that time, differences from the first embodiment will be mainly described, and description of common points with the first embodiment will be omitted or simplified.
  • FIG. 15 illustrates an example of read processing during copy back processing according to the second embodiment.
  • When the host I/O processing program 301 receives a read request from the host 101 during the copyback process, it reads the data according to the read request from the corresponding spare area, returns it to the host 101, and also writes the data to the strip in the post-replacement drive.
  • The data written to the strip is the data that would otherwise be copied back, so as a result the data is restored to the strip in the post-replacement drive. The details are, for example, as follows.
  • the host I / O processing program 301 receives a read request with the strip 2-2 in the drive 00 after replacement as the read source (S13-1).
  • Since the completion flag 602 corresponding to the address 601 of the strip 2-2 is “0”, the host I/O processing program 301 reads the data 2-2 from the spare area indicated by the data position 603 corresponding to that address 601 (in the example of FIG. 15, the spare area of the normal drive 02) (S13-2).
  • The host I/O processing program 301 returns the read data 2-2 to the host 101 (S13-3), and writes the data 2-2 to the strip 2-2 in the post-replacement drive (S13-4).
  • the host I / O processing program 301 updates the completion flag 602 corresponding to the strip 2-2 to “1” (S13-5).
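  • The following is an illustrative Python sketch, not taken from the patent, of read processing during copyback in Example 2: if the target strip has not been restored yet, the data is read from the spare area recorded in the progress management table, returned to the host, written to the post-replacement drive, and the strip is marked as restored; all names are assumptions.

        # Illustrative sketch (assumption): read processing during copyback (FIG. 15).
        def read_during_copyback(strip_addr, progress, read_spare, read_strip, write_strip):
            """progress[strip_addr] = {"completed": bool, "data_position": (drive, spare_addr)}"""
            entry = progress[strip_addr]
            if entry["completed"]:
                return read_strip(strip_addr)              # already restored: read the strip itself
            data = read_spare(entry["data_position"])      # S13-2: read from the spare area
            write_strip(strip_addr, data)                  # S13-4: restore to the post-replacement drive
            entry["completed"] = True                      # S13-5: data copy will skip this strip
            return data                                    # S13-3: returned to the host

        if __name__ == "__main__":
            progress = {2: {"completed": False, "data_position": (2, 0)}}
            spare = {(2, 0): b"data 2-2"}
            new_drive = {}
            out = read_during_copyback(2, progress,
                                       read_spare=lambda pos: spare[pos],
                                       read_strip=lambda a: new_drive[a],
                                       write_strip=lambda a, d: new_drive.update({a: d}))
            print(out, new_drive, progress)
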
  • a host I / O frequency threshold value may be employed instead of or in addition to the write ratio 503 threshold value.
  • The copyback processing program 303 may execute a determination (hereinafter, determination A) as to whether the host I/O frequency related to the target pool 183 is less than the threshold, instead of or in addition to the determination in S1103. If the result of determination A is true, the copyback processing program 303 may execute data copy processing. This is because, when the host I/O frequency is low, there is a low possibility that data will be restored to the strips of the post-replacement drive by piggybacking on host I/O processing (write processing or read processing) during the copyback process.
  • a read ratio threshold value may be used instead of or in addition to at least one of the write ratio 503 and the host I / O frequency threshold value.
  • Instead of or in addition to at least one of the determination in S1103 and the above determination A, the copyback processing program 303 may determine whether the read ratio of the target pool 183 (the ratio of read requests in the host I/O related to the target pool 183) is less than the read ratio threshold (hereinafter, determination B). If the result of determination B is true, the copyback processing program 303 may execute data copy processing. This is because, when the read ratio is low, there is a low possibility that data will be restored to the strips of the post-replacement drive by piggybacking on host I/O processing (write processing or read processing) during the copyback process.
  • At least one of the determinations of S1101 to S1104, determination A, and determination B described above (for example, the determination of S1101, the determination of S1102 or S1104, and at least one of the determination of S1103, determination A, and determination B) is included in the determination as to whether the execution condition of the data copy process is satisfied.
  • The copyback processing program 303 may determine, regularly or irregularly, whether the execution condition of the data copy process is satisfied. If the determination result is true, the copyback processing program 303 can execute (or continue) the data copy process; if the determination result is false, it can stop the data copy process. One way of combining these determinations is sketched below.
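  • The following is an illustrative Python sketch, not taken from the patent, of one way the execution condition of the data copy process could combine the determinations of S1101 to S1104 with determinations A and B; which determinations are included, and their order, are assumptions.

        # Illustrative sketch (assumption): a combined execution condition for the data copy process.
        def data_copy_execution_condition(redundancy, idle_seconds, write_ratio, host_io_freq,
                                          read_ratio, cpu_usage, thresholds):
            checks = [
                redundancy == 0,                                      # S1101
                idle_seconds >= thresholds["io_wait"],                # S1102
                write_ratio < thresholds["write_ratio"],              # S1103
                host_io_freq < thresholds["host_io_freq"],            # determination A
                read_ratio < thresholds["read_ratio"],                # determination B
                cpu_usage < thresholds["cpu_usage"],                  # S1104
            ]
            return any(checks)   # run the data copy if any included determination is true

        if __name__ == "__main__":
            th = {"io_wait": 60, "write_ratio": 0.3, "host_io_freq": 100, "read_ratio": 0.2, "cpu_usage": 0.5}
            print(data_copy_execution_condition(1, 10, 0.5, 200, 0.4, 0.7, th))  # -> False
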

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In this storage system, each of a plurality of storage devices has a plurality of strips and one or more spare regions. In a RAID group formed by these storage devices, at least one of two or more storage devices relating to a first stripe differs from at least one of two or more storage devices relating to a second stripe. The storage system performs: a rebuild process in which data for the plurality of strips of a failed storage device are restored respectively to a plurality of spare regions of two or more storage devices; and a copy-back process including a data copy process in which, after the failed storage device has been replaced by another storage device, data in at least one spare region of said plurality of spare regions are copied to at least one strip of the plurality of strips of the other storage device, wherein said at least one spare region is associated with said at least one strip, and no data have yet been restored to said at least one strip. During the copy-back process, the storage system skips the data copy process for a strip if data have already been written to the strip in a write process or a read process.

Description

Storage system and recovery control method
 The present invention generally relates to storage system recovery control.
 A storage system having a RAID (Redundant Array of Independent (or Inexpensive) Disks) group composed of a plurality of disks and provided with a spare disk is known. In such a storage system, the data in the failed disk in the RAID group is restored to the spare disk (rebuild process), and the data is copied from the spare disk to the disk installed after the failed disk is replaced (copyback process) (for example, Patent Document 1).
JP 2006-260236 A
 Generally, a storage device having a large storage capacity is adopted as a storage device (for example, a disk) constituting a RAID group.
 If the storage capacity of the spare storage device is large, both the rebuild process to the spare storage device and the copyback process from the spare storage device take a long time. For this reason, the I/O processing performance (the performance of I/O processing in accordance with I/O requests) may be lowered.
 The storage system includes a plurality of storage devices that provide one or more logical RAID groups, and a processor unit that is one or more processors connected to the plurality of storage devices. Each RAID group is composed of two or more stripes. Each stripe is composed of two or more strips. Each of the plurality of storage devices has a plurality of strips and one or more spare areas. For each RAID group, at least one of the two or more storage devices that provide the two or more strips constituting one stripe differs from at least one of the two or more storage devices that provide the two or more strips constituting any other stripe. When receiving an I/O request, the processor unit executes I/O processing according to the I/O request. The processor unit executes a rebuild process for restoring data corresponding to the plurality of strips of a problem storage device among the plurality of storage devices to a plurality of spare areas of two or more storage devices. After the rebuild process is completed and the problem storage device has been replaced, the processor unit executes a copyback process that includes a data copy process for copying data from spare areas to those strips of the post-replacement storage device to which data has not yet been restored. The processor unit executes at least one of the following (W) and (R):
(W) When a write request whose write destination is a strip in the post-replacement storage device is received during the copyback process, the data according to the write request is written to the write destination strip in the write process according to the write request.
(R) When a read request whose read source is a strip in the post-replacement storage device is received during the copyback process, data is read from the spare area corresponding to the read source, and the read data is written to the read source strip in the read process according to the read request.
The processor unit skips the data copy process for any strip of the post-replacement storage device to which data has already been written in either the write process or the read process during the copyback process.
 The storage device can be restored while reducing the decrease in I/O processing performance.
FIG. 1 shows an example of the configuration of a computer system according to Example 1.
FIG. 2 shows an example of the logical configuration and physical configuration of a pool.
FIG. 3 shows an example of programs and tables stored in a local memory.
FIG. 4 shows an example of a pool state table.
FIG. 5 shows an example of a copyback setting management table.
FIG. 6 shows an example of a progress management table.
FIG. 7 shows an example of a head address management table.
FIG. 8 shows the flow of recovery processing.
FIG. 9 shows an example of the details of rebuild processing.
FIG. 10 shows an example of the details of copyback processing.
FIG. 11 shows the detailed flow of copyback processing.
FIG. 12 shows the detailed flow of data copy processing.
FIG. 13 shows an example of write processing during copyback processing.
FIG. 14 shows an example of the transition of a progress management table according to the progress of copyback processing and write processing.
FIG. 15 shows an example of read processing during copyback processing according to Example 2.
 Several embodiments will be described with reference to the drawings. The embodiments described below do not limit the invention according to the claims, and not all of the elements and combinations described in the embodiments are necessarily essential to the solution of the invention.
 In the following description, information may be described using an expression such as “xxx table”, but the information may be expressed in any data structure. That is, to show that the information does not depend on the data structure, the “xxx table” can be referred to as “xxx information”. In the following description, the configuration of each table is an example; one table may be divided into two or more tables, and all or part of two or more tables may be combined into a single table.
 In the following description, the “interface unit” includes one or more interfaces. The one or more interfaces may be one or more interface devices of the same type (for example, one or more NICs (Network Interface Cards)) or two or more interface devices of different types (for example, an NIC and an HBA (Host Bus Adapter)).
 In the following description, the “storage unit” includes one or more memories. At least one memory of the storage unit may be a volatile memory. The storage unit is mainly used during processing by the processor unit.
 In the following description, the “processor unit” includes one or more processors. At least one processor is typically a microprocessor such as a CPU (Central Processing Unit). Each of the one or more processors may be single-core or multi-core. A processor may include a hardware circuit that performs part or all of the processing.
 In the following description, processing may be described using a “program” as the subject. A program is executed by the processor unit (for example, a CPU (Central Processing Unit)) and performs the defined processing while appropriately using a storage unit (for example, a memory) and/or an interface device (for example, a communication port); therefore, the subject of the processing may be described as the processor unit (or an apparatus or system having the processor unit). The processor unit may include a hardware circuit that performs part or all of the processing. A program may be installed into a device such as a computer from a program source. The program source may be, for example, a program distribution server or a computer-readable (for example, non-transitory) recording medium. In the following description, two or more programs may be realized as one program, and one program may be realized as two or more programs.
 In the following description, the “host system” may be one or more physical host computers (for example, a cluster of host computers), or may include at least one virtual host computer (for example, a VM (Virtual Machine)). Hereinafter, the host system is simply referred to as the “host”.
 In the following description, the “storage system” may be one or more storage devices. The “storage device” may be any device having a function of storing data in a storage medium; for this reason, the storage device may be a computer (for example, a general-purpose computer) such as a file server. For example, at least one physical storage device may execute a virtual computer (for example, a VM (Virtual Machine)) or may execute SDx (Software-Defined anything). As SDx, for example, SDS (Software Defined Storage) (an example of a virtual storage device) or SDDC (Software-defined Datacenter) can be adopted. For example, at least one storage device (computer) may have a hypervisor. The hypervisor may generate a server VM (Virtual Machine) that operates as a server and a storage VM that operates as storage. The server VM may operate as a host that issues I/O requests, and the storage VM may operate as a storage controller that performs I/O to the drives in response to I/O requests from the server VM.
 In the following description, when elements of the same kind are described without being distinguished, a reference sign (or a common part of the reference sign) is used; when elements of the same kind are described separately, the element ID (or the full reference sign) may be used.
 In the following description, numbers are used as element identifiers, but other types of identifiers may be used instead of or in addition to numbers.
 FIG. 1 shows an example of the configuration of a computer system according to the embodiment.
 The computer system has a host 101 and a storage system 102. The host 101 and the storage system 102 are connected to each other via a communication network 152.
 The host 101 transmits an I/O (Input/Output) request to the storage system 102. The I/O request includes I/O destination information indicating the location of the I/O destination. The I/O destination information includes, for example, the LUN (Logical Unit Number) of the LU (Logical Unit) of the I/O destination and the LBA (Logical Block Address) of the area in the LU. An LU is a logical volume (logical storage device) provided from the storage system 102. Based on the I/O destination information, the logical area of the I/O destination is identified, and the drive 124 that underlies the logical area is identified.
 The storage system 102 includes a storage controller 103 and a drive box 121. The drive box 121 includes a plurality of (or one) pools 183. Each pool 183 includes a plurality of drives 124. A drive 124 is an example of a storage device (typically a non-volatile storage device), and is, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive).
 The storage controller 103 includes a host I/F 111, a cache memory (CM) 112, a CPU (Central Processing Unit) 113, a drive I/F 114, and a local memory (LM) 115. The host I/F 111 and the drive I/F 114 are examples of the interface unit. The cache memory 112 and the local memory 115 are examples of the storage unit. The CPU 113 is an example of the processor unit.
 The host I/F 111 is an example of an interface device of the storage controller 103 and communicates with the host 101. The cache memory 112 temporarily stores data written from the host 101 to the drives 124 (write data) and data read from the drives 124 (read data). The CPU 113 executes various processes by executing programs stored in the local memory 115. The CPU 113 is connected to the host I/F 111, the cache memory 112, the drive I/F 114, and the local memory 115. The CPU 113 transmits various commands to the drives 124 of the drive box 121 via the drive I/F 114. The drive I/F 114 is an example of an interface device of the storage controller 103 and communicates with each drive 124. The local memory 115 stores various programs and various information.
 ストレージコントローラ103の動作の一例は次の通りである。すなわち、ストレージコントローラ103は、ホスト101から受信したI/O要求を処理する。具体的には、例えば、ストレージコントローラ103は、そのI/O要求のI/O先に基づきデータのI/O先となるドライブ124を特定し、特定したドライブ124に対するI/Oを実行する。その際、ストレージコントローラ103は、I/O対象のデータをキャッシュメモリ112にキャッシュする。 An example of the operation of the storage controller 103 is as follows. That is, the storage controller 103 processes the I / O request received from the host 101. Specifically, for example, the storage controller 103 identifies the drive 124 that is the data I / O destination based on the I / O destination of the I / O request, and executes I / O for the identified drive 124. At this time, the storage controller 103 caches the I / O target data in the cache memory 112.
 図2は、プール183の論理構成及び物理構成の一例を示す。 FIG. 2 shows an example of the logical configuration and physical configuration of the pool 183.
 論理構成によれば、プール183は、複数のRAIDグループ(論理的なRAIDグループ)223を有する。RAIDグループ223に基づき論理ボリュームが提供される。各RAIDグループ223は、複数のドライブ224(論理的なドライブ)を有する。各RAIDグループ223は、2D+2PのRAID構成とRAID6のRAIDレベルとに従うRAIDグループである。即ち、各RAIDグループ223において、各ストライプが、4個のドライブ224がそれぞれ有する4個のストリップ(単位記憶領域)で構成されている。各ストライプでは、2個のストリップにそれぞれ2個のデータ要素(D)が格納され、2個のストリップに、それぞれ、その2個のデータ要素に基づく2個のパリティ(P)が格納される。プール183において、複数のRAIDグループ223にそれぞれ対応した複数のRAID種別(RAIDレベル及びRAID構成)は同じでもよい。 According to the logical configuration, the pool 183 has a plurality of RAID groups (logical RAID groups) 223. A logical volume is provided based on the RAID group 223. Each RAID group 223 has a plurality of drives 224 (logical drives). Each RAID group 223 is a RAID group according to a 2D + 2P RAID configuration and a RAID 6 RAID level. That is, in each RAID group 223, each stripe is composed of four strips (unit storage areas) that the four drives 224 respectively have. In each stripe, two data elements (D) are stored in two strips, respectively, and two parities (P) based on the two data elements are stored in two strips. In the pool 183, the plurality of RAID types (RAID level and RAID configuration) respectively corresponding to the plurality of RAID groups 223 may be the same.
 一方、物理構成によれば、各プール183は、複数のドライブグループ123を有する。本実施例では、ドライブグループ123の数は、RAIDグループ223の数と同じであるが、異なっていてもよい。また、本実施例では、各ドライブグループ123を構成するドライブ124の数は、RAIDグループ223を構成するドライブ224の数と同じあるが、異なっていてもよい。本実施例では、各ドライブグループ123は、4個のドライブ124(物理的なドライブ)を有する。各ドライブグループ123は、それ自体ではRAIDグループを構成しない。プール183が、論理的に、上述の複数のRAIDグループ223を構成する。 On the other hand, according to the physical configuration, each pool 183 has a plurality of drive groups 123. In this embodiment, the number of drive groups 123 is the same as the number of RAID groups 223, but may be different. In the present embodiment, the number of drives 124 constituting each drive group 123 is the same as the number of drives 224 constituting the RAID group 223, but may be different. In the present embodiment, each drive group 123 has four drives 124 (physical drives). Each drive group 123 does not constitute a RAID group by itself. The pool 183 logically configures the plurality of RAID groups 223 described above.
 In this embodiment, a distributed RAID configuration is adopted. Specifically, each RAID group 223 is composed of two or more stripes, and each stripe is composed of two or more strips. Each drive 124 has a plurality of strips. For each RAID group 223, at least one of the two or more drives 124 that provide the two or more strips constituting a first stripe differs from at least one of the two or more drives 124 that provide the two or more strips constituting a second stripe. For each RAID group 223, the first stripe is any one of its stripes, and the second stripe is any stripe other than the first stripe. More specifically, in this embodiment, the strips constituting the same stripe are distributed across different drive groups, and the drive positions (positions according to physical addresses within the drives) corresponding to those strips differ from one another. In FIG. 2, the four strips constituting the same stripe are given the same number. According to the physical configuration, for example, the four strips 0-0 constituting the same stripe are distributed across a plurality of drive groups 123.
 また、物理構成によれば、各ドライブ124は、複数のストリップの他に、1以上のスペア領域(S)を有する。本実施例では、各ドライブ124は、1個のスペア領域を有する。「スペア領域」とは、予備の記憶領域である。論理構成と物理構成の比較によれば、各ドライブ124について、いずれのストリップも、RAIDグループ223の構成要素となるが、スペア領域は、RAIDグループ223の構成要素にならない。障害ドライブが生じた場合、後述するように、障害ドライブ内のストリップにおけるデータ(データ要素又はパリティ)がスペア領域に復元される。各スペア領域のサイズは、ストリップサイズ以上である。 Further, according to the physical configuration, each drive 124 has one or more spare areas (S) in addition to a plurality of strips. In this embodiment, each drive 124 has one spare area. The “spare area” is a spare storage area. According to the comparison between the logical configuration and the physical configuration, for each drive 124, any strip is a component of the RAID group 223, but the spare area is not a component of the RAID group 223. When a failed drive occurs, data (data element or parity) in the strip in the failed drive is restored to the spare area, as will be described later. The size of each spare area is not less than the strip size.
 以上のように、本実施例では、分散RAID構成が採用されており、且つ、スペアドライブが設けられることに代えて、各ドライブ124にスペア領域が設けられている。なお、プール183によって、RAIDレベル及びRAID構成のうちの少なくとも1つは異なっていてもよい。本実施例では、説明を簡単にするために、いずれのプール183も、RAIDレベルはRAID6であり、RAID構成は2D+2Pであるとする。 As described above, in this embodiment, a distributed RAID configuration is adopted, and a spare area is provided in each drive 124 instead of providing a spare drive. Note that, depending on the pool 183, at least one of the RAID level and the RAID configuration may be different. In this embodiment, to simplify the description, it is assumed that the RAID level of any pool 183 is RAID 6 and the RAID configuration is 2D + 2P.
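 The layout described above can be pictured with a small sketch. The drive count, strip count, and placement rule below are arbitrary assumptions chosen only to show that the strips of one stripe land on different drives and that the spare slot never joins a stripe; they are not the embodiment's actual mapping.

```python
from collections import defaultdict

DRIVES = 16          # physical drives in the pool (assumed for illustration)
STRIPS_PER_DRIVE = 8 # strips per drive, excluding the spare area
STRIPE_WIDTH = 4     # 2D + 2P

def build_layout():
    """Assign each stripe's strips to distinct drives at rotating in-drive positions."""
    layout = defaultdict(list)                              # stripe_id -> [(drive, strip_index), ...]
    spare = {d: STRIPS_PER_DRIVE for d in range(DRIVES)}    # last slot of each drive = spare area
    stripes = DRIVES * STRIPS_PER_DRIVE // STRIPE_WIDTH
    for s in range(stripes):
        for k in range(STRIPE_WIDTH):
            i = s * STRIPE_WIDTH + k
            layout[s].append((i % DRIVES, (i // DRIVES) % STRIPS_PER_DRIVE))
    return layout, spare

layout, spare = build_layout()
# Strips of the same stripe sit on different drives, and no stripe member occupies the spare slot.
assert all(len({d for d, _ in members}) == STRIPE_WIDTH for members in layout.values())
assert all(pos < STRIPS_PER_DRIVE for members in layout.values() for _, pos in members)
```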
 図3は、ローカルメモリ115に格納されるプログラム及びテーブルの一例を示す。 FIG. 3 shows an example of programs and tables stored in the local memory 115.
 ローカルメモリ115は、各種プログラムを格納する。プログラムとして、例えば、ホストI/O処理プログラム301と、リビルド処理プログラム302と、コピーバック処理プログラム303と、パリティ生成プログラム304とがある。ホストI/O処理プログラム301は、ホスト101からのI/O要求を処理する。リビルド処理プログラム302は、リビルド処理を実行する。コピーバック処理プログラム303は、コピーバック処理を実行する。パリティ生成プログラム304は、ストライプに格納されるパリティを生成する。 The local memory 115 stores various programs. Examples of the program include a host I / O processing program 301, a rebuild processing program 302, a copyback processing program 303, and a parity generation program 304. The host I / O processing program 301 processes an I / O request from the host 101. The rebuild process program 302 executes a rebuild process. The copy back processing program 303 executes copy back processing. The parity generation program 304 generates the parity stored in the stripe.
 また、ローカルメモリ115は、各種テーブルを格納する。テーブルとして、例えば、プール状態テーブル305と、コピーバック設定管理テーブル306と、進捗管理テーブル307と、先頭アドレス管理テーブル308とがある。 The local memory 115 stores various tables. Examples of the table include a pool status table 305, a copyback setting management table 306, a progress management table 307, and a head address management table 308.
 図4は、プール状態テーブル305の一例と、そのテーブル305に対応したプールの状態とを示す。 FIG. 4 shows an example of the pool status table 305 and the status of the pool corresponding to the table 305.
 プール状態テーブル305は、プール183の冗長度及び状態を示す情報である。プール状態テーブル305は、プール183毎に、プール番号401と、冗長度402と、状態403といった情報を保持するエントリを管理する。 The pool state table 305 is information indicating the redundancy and state of the pool 183. The pool state table 305 manages entries that hold information such as the pool number 401, the redundancy 402, and the state 403 for each pool 183.
 The pool number 401 is the number of the pool. The redundancy 402 indicates the redundancy of the pool 183. The value registered as the redundancy 402 is the lowest redundancy in the pool 183. That is, because the pool 183 has a distributed RAID configuration, the fact that there are N failed drives (N being an integer of 1 or more) does not mean that the stripes involving those N failed drives all have the same redundancy. Some stripes may have a redundancy of "1" while others have a redundancy of "0". In that case, the minimum value "0" is registered as the redundancy 402.
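 A minimal sketch of how the value registered as the redundancy 402 could be computed, assuming a map from each stripe to the drives holding its strips; the helper name and arguments are illustrative, not part of the embodiment.

```python
def pool_redundancy(stripe_members, failed_drives, parity_per_stripe=2):
    """Return the lowest per-stripe redundancy in the pool (registered as redundancy 402)."""
    worst = parity_per_stripe
    for members in stripe_members.values():           # members: drives holding the stripe's strips
        lost = sum(1 for drive in members if drive in failed_drives)
        worst = min(worst, parity_per_stripe - lost)   # RAID6 (2D+2P): redundancy starts at 2
    return max(worst, 0)

# Example: one stripe loses two members, another loses one -> pool redundancy is 0.
stripes = {0: [0, 5, 6, 11], 1: [0, 2, 7, 9]}
print(pool_redundancy(stripes, failed_drives={0, 5}))   # -> 0
```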
 The state 403 indicates a state related to processing on the pool 183. An example of processing on the pool 183 is copyback processing. Values of the state 403 include "copying back", which means that copyback processing is in progress and the data copy processing described later is being executed; "stopped", which means that copyback processing is in progress but the data copy processing has been stopped; and "none", which means that copyback processing is not in progress.
 According to the table 305 in FIG. 4, the redundancy and state of each of the pools 0 to 2 are as follows. In pool 0, two drives have failed, and the redundancy has dropped from 2 to 0. For pool 0, the data copy processing within the copyback processing is being performed. In pool 1, one drive has failed, and the redundancy has dropped from 2 to 1. For pool 1, copyback processing is in progress, but the data copy processing within it is stopped. In pool 2, no drive has failed.
 図5は、コピーバック設定管理テーブル306の一例を示す。 FIG. 5 shows an example of the copyback setting management table 306.
 The copyback setting management table 306 is a table for managing thresholds used to determine whether or not to execute the data copy processing. The copyback setting management table 306 manages, for each pool 183, an entry holding information such as a pool number 501, an I/O waiting time 502, a write ratio 503, a CPU usage rate 504, and a determination waiting time 505.
 プール番号501は、プール183の番号である。I/O待ち時間502は、プール183についてホスト101からI/O要求(以下、ホストI/O)が届いていない時間を示す。ライト割合503は、プール183についてのホストI/O中のライトの割合を示す。CPU使用率504は、プール183に対する処理に関してCPU113の使用率を示す。判定待ち時間505は、データコピー処理が開始又は停止してから判定までの待ち時間を示す。 Pool number 501 is the number of pool 183. The I / O waiting time 502 indicates a time during which an I / O request (hereinafter referred to as host I / O) has not arrived from the host 101 for the pool 183. The write ratio 503 indicates the ratio of writes in the host I / O for the pool 183. The CPU usage rate 504 indicates the usage rate of the CPU 113 with respect to processing for the pool 183. The determination waiting time 505 indicates a waiting time from the start or stop of the data copy process to the determination.
 なお、情報502~505は、プール183毎に用意されるが、全てのプール183に共通であってもよい。 The information 502 to 505 is prepared for each pool 183, but may be common to all the pools 183.
 図6は、進捗管理テーブル307の一例を示す。 FIG. 6 shows an example of the progress management table 307.
 進捗管理テーブル307は、交換後ドライブ(障害ドライブ124と交換されたドライブ)に対してのコピーバック処理の進捗を管理するテーブルである。進捗管理テーブル307は、交換後ドライブが有するストリップ毎に、アドレス601と、完了フラグ602と、データ位置603といった情報を保持するエントリを管理する。アドレス601は、ストリップのアドレス(番号)である。完了フラグ602は、ストリップに対してデータの復元が完了したか否かを示すフラグである。完了フラグ602の値として、完了のときは「1」、未完了のときは「0」が設定される。データ位置603は、ストリップに復元されるべきデータ(コピーバック処理対象のデータ)が存在する位置(ドライブ124の番号とそのドライブ124におけるスペア領域のアドレスとの組合せ)を示す。 The progress management table 307 is a table for managing the progress of the copy back process for the replaced drive (the drive replaced with the failed drive 124). The progress management table 307 manages entries that hold information such as an address 601, a completion flag 602, and a data position 603 for each strip included in the drive after replacement. An address 601 is a strip address (number). The completion flag 602 is a flag indicating whether or not the data restoration for the strip is completed. As the value of the completion flag 602, “1” is set when it is completed, and “0” is set when it is not completed. A data position 603 indicates a position (a combination of the number of the drive 124 and the address of the spare area in the drive 124) where the data (data to be copied back) to be restored exists in the strip.
 図7は、先頭アドレス管理テーブル308の一例を示す。 FIG. 7 shows an example of the head address management table 308.
 先頭アドレス管理テーブル308は、コピーバック処理が未完了の位置を管理するテーブルである。先頭アドレス管理テーブル308には、進捗管理テーブル307において完了フラグ602「0」に対応したアドレス601のうちの先頭のアドレスが格納される。 The head address management table 308 is a table for managing a position where the copy back process is not completed. The head address management table 308 stores the head address of the addresses 601 corresponding to the completion flag 602 “0” in the progress management table 307.
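 The progress management table 307 and the head address management table 308 can be modeled with the small structures below; the field and function names are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ProgressEntry:
    address: int                                 # strip address (601)
    done: bool = False                           # completion flag (602): True = restored
    data_pos: Optional[Tuple[int, int]] = None   # (drive number, spare-area address) (603)

def head_address(entries):
    """First address whose completion flag is 0; None when copyback is complete."""
    for e in entries:
        if not e.done:
            return e.address
    return None

entries = [ProgressEntry(a) for a in range(4)]
entries[0].done = True
print(head_address(entries))   # -> 1
```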
 以下、1つのプール183を例に取り、本実施例で行われる処理を説明する。なお、以下の説明では、そのプール183を、「対象プール183」と呼ぶ。また、本実施例の説明では、各用語の意味は、以下の通りとする。
・「問題ドライブ」は、問題の発生したドライブである。「問題」とは、障害、又は、障害の発生可能性が高いこと、である。本実施例では、問題ドライブは、障害ドライブである。「障害ドライブ」は、障害の発生したドライブである。「障害ストリップ」は、障害ドライブ内のストリップである。なお、問題ドライブの別の例として、障害候補ドライブ(障害の発生する可能性の高いドライブ)を採用することができる。障害候補ドライブ内のストリップを、「障害候補ストリップ」と呼ぶことができる。
・「復元」は、ドライブにおけるスペア領域に対する書込み、又は、交換後ドライブにおけるストリップに対する書込みを意味する際に、使用されることがある用語である。
・「交換後ドライブ」とは、問題ドライブと交換されたドライブである。
・「復旧処理」とは、リビルド処理とコピーバック処理とを含む処理を意味する用語である。
・「復旧」とは、リビルドとコピーバックとを含む用語である。
・「リビルド処理」とは、リビルドを含んだ処理である。「リビルド」とは、全ての障害ストリップに対応したデータをそれぞれ複数のスペア領域に復元すること、つまり、コレクションコピーのことである。なお、「障害ストリップに対応したデータ」とは、典型的には、障害ストリップ内のデータであるが、障害ストリップ内のデータの更新後データも該当してもよい。また、問題ドライブが障害候補ドライブの場合、リビルドは、全ての障害候補ストリップ内のデータをそれぞれ複数のスペア領域に復元すること、つまり、ダイナミックスペアリングのことである。
・「コピーバック処理」とは、コピーバックを含んだ処理である。「コピーバック」とは、後述のデータコピー処理に相当する処理であり、スペア領域から交換後ドライブ内のストリップにデータをコピーすることである。本実施例では、交換後ドライブ内のストリップには、コピーバックによりデータが復元されることもあれば、コピーバックに代えてホストI/Oに従ってデータが復元されることもある。
・「データXX」とは、ストリップXX内のデータ(データ要素又はパリティ)のことである(XXは番号)。
Hereinafter, the processing performed in this embodiment will be described by taking one pool 183 as an example. In the following description, the pool 183 is referred to as a “target pool 183”. In the description of this embodiment, the meaning of each term is as follows.
“Problem drive” is a drive in which a problem has occurred. “Problem” means a failure or a high possibility of occurrence of a failure. In this embodiment, the problem drive is a failed drive. A “failed drive” is a drive in which a failure has occurred. A “failure strip” is a strip in a failed drive. As another example of the problem drive, a failure candidate drive (a drive with a high possibility of failure) can be adopted. A strip in a failure candidate drive may be referred to as a “failure candidate strip”.
“Restoration” is a term that may be used to mean writing to a spare area in a drive or writing to a strip in a drive after replacement.
“Drive after replacement” is a drive that has been replaced with a problem drive.
“Recovery processing” is a term that means processing including rebuild processing and copy back processing.
“Recovery” is a term that includes rebuild and copyback.
"Rebuild processing" is processing that includes a rebuild. A "rebuild" is restoring the data corresponding to all failure strips to a plurality of spare areas, that is, a correction copy. The "data corresponding to a failure strip" is typically the data in the failure strip, but may also be post-update data of the data in the failure strip. When the problem drive is a failure candidate drive, the rebuild is restoring the data in all failure candidate strips to a plurality of spare areas, that is, dynamic sparing.
“Copy back processing” is processing including copy back. “Copy back” is a process corresponding to a data copy process to be described later, and is to copy data from a spare area to a strip in a drive after replacement. In this embodiment, data may be restored to the strip in the drive after replacement by copy back, or data may be restored according to host I / O instead of copy back.
“Data XX” is data (data element or parity) in the strip XX (XX is a number).
 図8は、復旧処理の流れを示す。復旧処理は、例えば、ドライブ124に障害(故障)が発生したことが、例えばCPU113(例えばリビルド処理プログラム302)により検出された場合に開始される。 FIG. 8 shows the flow of recovery processing. The recovery process is started when, for example, the CPU 113 (for example, the rebuild process program 302) detects that a failure (failure) has occurred in the drive 124.
 The rebuild processing program 302 starts rebuild processing that restores the same data as the data (data elements or parity) in the strips of the failed drive 124 to spare areas of normal drives 124 (S801).
 図9は、リビルド処理の詳細の一例を示す。 FIG. 9 shows an example of the details of the rebuild process.
 対象プール183内のドライブ00に障害が発生したとする。障害ドライブ00には、障害ストリップ0-1、2-2、1-1及び2-3がある。 Suppose that a failure has occurred in the drive 00 in the target pool 183. The faulty drive 00 includes fault strips 0-1, 2-2, 1-1 and 2-3.
 The rebuild processing program 302 restores data 0-1 of the failed drive 00, based on data 0-1 in the normal drives 05, 06, and 0B, to a spare area in one of the normal drives, for example, the spare area of the normal drive 01. The rebuild processing program 302 then records the position of that spare area (the combination of the number of the drive 01 and the address of the spare area) as the data position 603 in the progress management table 307 corresponding to the target pool 183.
 同様に、リビルド処理プログラム302は、障害ドライブ00内の他のデータ2-2、1-1及び2-3を、それぞれ、例えば、正常ドライブ02~04におけるスペア領域に復元し、且つ、それらのスペア領域の位置をそれぞれデータ位置603として進捗管理テーブル307に追記する。 Similarly, the rebuild processing program 302 restores the other data 2-2, 1-1 and 2-3 in the failed drive 00 to, for example, spare areas in the normal drives 02 to 04, and The positions of the spare areas are added to the progress management table 307 as data positions 603, respectively.
 As described above, in this embodiment, the plurality of pieces of data in the failed drive are restored to the spare areas of a plurality of normal drives, respectively. In other words, the write destinations of the data are not concentrated on a single drive such as a spare drive, but are distributed across a plurality of drives. For this reason, the time required for the rebuild processing can be expected to be shortened.
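 As a rough sketch of this distribution of restore destinations, the code below reconstructs each lost strip and writes it to a spare area chosen round-robin from the normal drives. The reconstruction is simplified to XOR (the single-failure case); decoding two lost strips under RAID6 requires Reed-Solomon-style arithmetic and is omitted, and all helper names are assumptions rather than the embodiment's interfaces.

```python
from functools import reduce

def xor_blocks(blocks):
    """Single-failure reconstruction: XOR of the surviving strips of a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, t) for t in zip(*blocks))

def rebuild(failed_drive, normal_drives, stripes, read_strip, write_spare, progress):
    """Restore each strip of the failed drive to a spare area on a rotating normal drive.

    stripes maps stripe_id -> {drive: strip_address}; read_strip/write_spare are
    I/O callbacks supplied by the caller; progress records the data position (603).
    """
    rr = 0
    for members in stripes.values():
        if failed_drive not in members:
            continue
        survivors = [read_strip(d, a) for d, a in members.items() if d != failed_drive]
        data = xor_blocks(survivors)
        # Spread writes over many drives; a real implementation would also avoid
        # drives that already hold a member strip of this stripe.
        target = normal_drives[rr % len(normal_drives)]
        rr += 1
        spare_addr = write_spare(target, data)
        progress[members[failed_drive]] = (target, spare_addr)
```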
 図8に戻る。リビルド処理の最中又は完了後に、例えば保守員により、障害ドライブ124が交換される(S802)。 Return to FIG. During or after the rebuild process, the failed drive 124 is replaced, for example, by maintenance personnel (S802).
 障害ドライブの交換が、例えばCPU113(例えばリビルド処理プログラム302)により検出された場合、リビルド処理プログラム302が、リビルド処理が完了しているか否かを判定する(S803)。リビルドが完了していない場合(S803:NO)には、リビルド処理プログラム302は、リビルド処理を継続する。一方、リビルド処理が完了している場合(S803:YES)には、リビルド処理プログラム302が、ドライブ交換完了サインを出力する(例えば、ストレージシステム102に設けられている所定のLED(Light Emitting Diode)を点灯する)(S804)。そのサインを見た者(例えば保守員)が、例えば、コピーバック処理の指示を、保守端末のような入出力インターフェース経由で、ストレージシステムに出す。 When replacement of the failed drive is detected by, for example, the CPU 113 (for example, the rebuild process program 302), the rebuild process program 302 determines whether the rebuild process has been completed (S803). If the rebuild is not completed (S803: NO), the rebuild process program 302 continues the rebuild process. On the other hand, if the rebuild process has been completed (S803: YES), the rebuild process program 302 outputs a drive replacement completion sign (for example, a predetermined LED (Light Emitting Diode) provided in the storage system 102). (S804). A person who sees the sign (for example, a maintenance staff) issues a copyback processing instruction to the storage system via an input / output interface such as a maintenance terminal.
 次いで、コピーバック処理プログラム303が、例えばコピーバック処理の指示に応答して(又はドライブ交換完了の検出に応答して)、コピーバック処理を実行する(S805)。 Next, the copyback processing program 303 executes the copyback processing in response to, for example, an instruction for copyback processing (or in response to detection of the completion of drive replacement) (S805).
 図10は、コピーバック処理の詳細の一例を示す。図10は、図8に示したリビルド処理の続きに相当する。 FIG. 10 shows an example of details of the copyback process. FIG. 10 corresponds to the continuation of the rebuild process shown in FIG.
 When the failed drive 00 has been replaced and the rebuild processing has been completed, the copyback processing is started. In the copyback processing, the copyback processing program 303 restores (copies) the data 0-1, 2-2, 1-1, and 2-3 in the spare areas of the drives 01 to 04 to the strips 0-1, 2-2, 1-1, and 2-3 of the replaced drive 00, respectively.
 図11は、コピーバック処理の詳細の流れを示す。 FIG. 11 shows the detailed flow of the copyback process.
 コピーバック処理プログラム303が、対象プール183に対応した冗長度402を参照し、冗長度402が「0」であるか否かを判定する(S1101)。S1101の判定結果が真の場合(S1101:YES)、コピーバック処理プログラム303が、データコピー処理を実行する(S1105)。対象プール183のデータ保護の信頼性を高めるためである。なお、冗長度「0」は、冗長度の閾値の一例である。閾値は、0より大きくてもよい。また、S1105の実行の際、対象プール183に対応した状態403が「コピーバック中」でなければ、コピーバック処理プログラム303は、その状態403を「コピーバック中」に更新する。 The copyback processing program 303 refers to the redundancy 402 corresponding to the target pool 183 and determines whether the redundancy 402 is “0” (S1101). If the determination result in S1101 is true (S1101: YES), the copyback processing program 303 executes data copy processing (S1105). This is to increase the reliability of data protection of the target pool 183. The redundancy “0” is an example of a redundancy threshold. The threshold may be greater than zero. When the state 403 corresponding to the target pool 183 is not “copying back” when executing S1105, the copyback processing program 303 updates the state 403 to “copying back”.
 If the determination result of S1101 is false (S1101: NO), the copyback processing program 303 determines whether the elapsed time since host I/O was last performed on the target pool 183 is equal to or greater than the I/O waiting time 502 corresponding to the target pool 183 (S1102). If the determination result of S1102 is true (S1102: YES), the copyback processing program 303 executes the data copy processing (S1105). This is because, since no host I/O is being performed, the load on the CPU 113 is relatively low, and it is considered efficient to use that capacity for the data copy processing.
 If the determination result of S1102 is false (S1102: NO), the copyback processing program 303 determines whether the ratio of writes in the host I/O to the target pool 183 is less than the write ratio 503 corresponding to the target pool 183 (S1103). If the determination result of S1103 is true (S1103: YES), the copyback processing program 303 executes the data copy processing (S1105). This is because, if the write ratio is low, it is unlikely that data will have been restored to strips in the replaced drive, without inter-drive data copy, through the write processing (described later) during the copyback processing.
 S1103の判定結果が偽の場合(S1103:NO)、コピーバック処理プログラム303が、対象プール183に関するCPU使用率が、対象プール183に対応したCPU使用率504未満であるか否かを判定する(S1104)。S1104の判定結果が真の場合(S1104:YES)、コピーバック処理プログラム303が、データコピー処理を実行する(S1105)。CPU113の負荷が比較的低く、その分、データコピー処理に使用することが効率的であると考えられるためである。 When the determination result in S1103 is false (S1103: NO), the copyback processing program 303 determines whether or not the CPU usage rate related to the target pool 183 is less than the CPU usage rate 504 corresponding to the target pool 183 ( S1104). If the determination result in S1104 is true (S1104: YES), the copyback processing program 303 executes data copy processing (S1105). This is because the load on the CPU 113 is relatively low, and it is considered that it is more efficient to use it for data copy processing.
 S1104の判定結果が偽の場合(S1104:NO)、コピーバック処理プログラム303が、データコピー処理を停止する(S1106)。このとき、対象プール183に対応した状態403が「停止中」でなければ、コピーバック処理プログラム303は、その状態403を「停止中」に更新する。また、このとき、データコピー処理が既に停止の場合は(状態403が既に「停止中」の場合は)、S1106がスキップされてよい(すなわち、データコピー処理は停止したままとなる)。 If the determination result in S1104 is false (S1104: NO), the copyback processing program 303 stops the data copy process (S1106). At this time, if the state 403 corresponding to the target pool 183 is not “stopped”, the copyback processing program 303 updates the state 403 to “stopped”. At this time, if the data copy process is already stopped (if the state 403 is already “stopped”), S1106 may be skipped (that is, the data copy process remains stopped).
 次いで、コピーバック処理プログラム303が、データコピー処理の停止時刻からの経過時間が、対象プール183に対応した判定待ち時間505以上か否かを判定する(S1107)。 Next, the copy back processing program 303 determines whether the elapsed time from the data copy processing stop time is equal to or longer than the determination waiting time 505 corresponding to the target pool 183 (S1107).
 If the determination result of S1107 is true (that is, when at least the determination waiting time 505 has elapsed since the data copy processing was stopped) (S1107: YES), the copyback processing program 303 determines whether the copyback processing has been completed (S1108). The determination in S1108 is a determination as to whether the completion flags 602 corresponding to all the strips in the replaced drive are "1" in the progress management table 307 corresponding to the target pool 183. If the determination result of S1108 is false (S1108: NO), the processing returns to S1101. On the other hand, if the determination result of S1108 is true (S1108: YES), the copyback processing ends.
 S1101、S1102、S1103及びS1104は、判定の優先度が高い順である。異なる順序で判定が行われてよい。また、S1101~S1104のうちの少なくとも1つの判定結果が真の場合にデータコピー処理(S1105)が行われてもよいし、S1101~S1104のうち少なくとも1つの判定結果が偽の場合にデータコピー処理(S1105)が停止されてもよい。例えば、対象プール183の冗長度402が閾値(例えば「0」)より大きければ(S1101:NO)、他の判定の結果に関わらず、データコピー処理が停止されてもよい。また、例えば、対象プール183について、最終ホストI/Oからの経過時間がI/O待ち時間未満であれば(S1102:NO)、他の判定の結果に関わらず、データコピー処理が停止されてもよい。また、例えば、ライト割合がライト割合503以上(S1103:NO)であれば、他の判定の結果に関わらず、データコピー処理が停止されてもよい。 S1101, S1102, S1103, and S1104 are in descending order of priority. The determination may be made in a different order. Further, the data copy process (S1105) may be performed when at least one determination result of S1101 to S1104 is true, or the data copy process when at least one determination result of S1101 to S1104 is false. (S1105) may be stopped. For example, if the redundancy 402 of the target pool 183 is larger than a threshold (eg, “0”) (S1101: NO), the data copy process may be stopped regardless of the result of other determinations. For example, if the elapsed time from the last host I / O is less than the I / O waiting time for the target pool 183 (S1102: NO), the data copy process is stopped regardless of the result of other determinations. Also good. For example, if the write ratio is equal to or higher than the write ratio 503 (S1103: NO), the data copy process may be stopped regardless of the result of other determinations.
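 Taken together, S1101 to S1104 amount to a predicate over the pool's current statistics and the thresholds in the copyback setting management table 306. The sketch below expresses the default priority order with assumed field names; as noted above, the order and combination of the checks may differ.

```python
from dataclasses import dataclass

@dataclass
class CopybackSettings:          # per-pool thresholds (table 306)
    io_wait_sec: float           # I/O waiting time 502
    write_ratio: float           # write ratio 503 (0.0 - 1.0)
    cpu_usage: float             # CPU usage rate 504 (0.0 - 1.0)

def should_copy(redundancy, idle_sec, write_ratio, cpu_usage, s: CopybackSettings):
    """True -> run the data copy processing (S1105); False -> stop it (S1106)."""
    if redundancy == 0:                  # S1101: redundancy exhausted, copy regardless
        return True
    if idle_sec >= s.io_wait_sec:        # S1102: no host I/O for a while
        return True
    if write_ratio < s.write_ratio:      # S1103: few writes, little piggyback restoration
        return True
    if cpu_usage < s.cpu_usage:          # S1104: CPU has headroom
        return True
    return False

print(should_copy(redundancy=1, idle_sec=2, write_ratio=0.7, cpu_usage=0.9,
                  s=CopybackSettings(io_wait_sec=10, write_ratio=0.5, cpu_usage=0.8)))  # False
```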
 図12は、データコピー処理の詳細の流れを示す。 FIG. 12 shows the detailed flow of the data copy process.
 コピーバック処理プログラム303が、先頭アドレス管理テーブル308からデータコピーが未完了の先頭アドレスを特定する(S1201)。 The copyback processing program 303 identifies the head address for which data copying has not been completed from the head address management table 308 (S1201).
 次いで、コピーバック処理プログラム303が、S1201で特定したアドレスと一致するアドレス601に対応したデータ位置603を特定する(S1202)。 Next, the copyback processing program 303 identifies the data position 603 corresponding to the address 601 that matches the address identified in S1201 (S1202).
 次いで、コピーバック処理プログラム303が、S1202で特定したデータ位置603に従うスペア領域から、コピーバック処理対象データを、S1201で特定した先頭アドレスに従うストリップ(交換後ドライブ内のストリップ)にコピーする(S1203)。 Next, the copyback processing program 303 copies the copyback processing target data from the spare area according to the data position 603 specified in S1202 to the strip (strip in the drive after replacement) specified in S1201 (S1203). .
 Next, the copyback processing program 303 updates the completion flag 602 corresponding to the copy destination strip to "1" (completed), and updates the head address in the head address management table 308 to the first of the addresses 601 whose completion flag 602 is "0" (S1204). In S1204, if the minimum of the redundancies corresponding to the stripes in the target pool 183 has increased, the copyback processing program 303 also updates the redundancy 402 corresponding to the target pool 183.
 次いで、コピーバック処理プログラム303が、コピーバック処理が完了しているか否かを判定する(S1205)。この判定は、図11のS1108の判定と同じである。S1205の判定結果が真の場合(S1205:YES)、コピーバック処理が終了する。 Next, the copy back processing program 303 determines whether or not the copy back processing is completed (S1205). This determination is the same as the determination in S1108 of FIG. If the determination result in S1205 is true (S1205: YES), the copyback process ends.
 On the other hand, if the determination result of S1205 is false (S1205: NO), the copyback processing program 303 determines whether the elapsed time from the start time of the data copy processing is equal to or greater than the determination waiting time 505 corresponding to the target pool 183 (S1206). If the determination result of S1206 is false (S1206: NO), the processing returns to S1201 (that is, the data copy processing continues). On the other hand, if the determination result of S1206 is true (S1206: YES), the processing returns to S1101 (at this time, the data copy processing may be temporarily ended (stopped)).
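 A compact sketch of the S1201 to S1206 loop, reusing the progress-entry shape from the earlier sketch; the callback names and the way control is handed back to S1101 are assumptions.

```python
import time

def data_copy(entries, copy_from_spare, wait_sec, recheck):
    """One data-copy session (S1201-S1206): copy the next unrestored strip from its
    spare area, mark it complete, and after wait_sec hand control back to S1101."""
    started = time.monotonic()
    while True:
        target = next((e for e in entries if not e.done), None)  # S1201/S1202 (head address)
        if target is None:                                        # S1205: copyback complete
            return "complete"
        copy_from_spare(target.data_pos, target.address)          # S1203
        target.done = True                                        # S1204
        if time.monotonic() - started >= wait_sec:                # S1206: re-evaluate condition
            return recheck()
```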
 図13は、コピーバック処理中のライト処理の一例を示す。 FIG. 13 shows an example of write processing during copy back processing.
 コピーバック処理中にライト要求をホスト101からホストI/O処理プログラム301が受信した場合、ホストI/O処理プログラム301は、そのライト要求に従うデータを、交換後ドライブ内のストリップに書き込む。結果として、交換後ドライブ内のストリップにデータが復元されたことになる。詳細は、例えば以下の通りである。 When the host I / O processing program 301 receives a write request from the host 101 during the copy back processing, the host I / O processing program 301 writes the data according to the write request to the strip in the drive after replacement. As a result, the data is restored to the strip in the drive after replacement. Details are as follows, for example.
 When the host I/O processing program 301 receives, during the copyback processing, a write request whose write destination is the strip 2-2 in the replaced drive 00 (S14-1), the host I/O processing program 301 reads data 2-2 from each strip 2-2 of the normal drives 01, 06, and 0B (S14-2). Next, the host I/O processing program 301 calculates parity from the read data 2-2 and the post-update data 2-2 from the host 101 (S14-3). Next, the host I/O processing program 301 writes the post-update data 2-2 to the strip 2-2 in the replaced drive 00 (S14-4). The host I/O processing program 301 responds to the host 101 that the write is complete (S14-5). In addition, if the completion flag 602 corresponding to the write destination strip 2-2 in the replaced drive 00 is "0", the host I/O processing program 301 updates it to "1" (S14-6). In this way, when a write request is received during the copyback processing, data can be restored to the strip in the replaced drive by piggybacking on the processing of that write request.
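 The piggyback on a write request (S14-1 to S14-6) might be organized as below; parity generation and the writing of the updated parity strips are abstracted away, and the helper names are assumptions rather than the embodiment's actual interfaces.

```python
def handle_write(strip_addr, new_data, peer_strips, read_strip, make_parity,
                 write_strip, progress, replaced_drive):
    """Write path during copyback: write the new data straight to the replaced
    drive's strip, so the later data copy for that strip can be skipped."""
    peers = [read_strip(d, a) for d, a in peer_strips]   # S14-2: read the other members
    parity = make_parity(peers + [new_data])             # S14-3: recompute parity
    write_strip(replaced_drive, strip_addr, new_data)    # S14-4: restore via the write
    # S14-5 (host response) and writing the updated parity strips are left to the caller.
    progress[strip_addr] = True                          # S14-6: completion flag 602 -> 1
    return parity
```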
 図14は、コピーバック処理の進捗とライト処理とに応じた進捗管理テーブル307の遷移の一例を示す。図14では、進捗管理テーブル307は、模式的に(ビットマップとして)表現されている。アドレスの並びは、破線矢印の通りであるとする。 FIG. 14 shows an example of the transition of the progress management table 307 according to the progress of the copyback process and the write process. In FIG. 14, the progress management table 307 is schematically expressed (as a bitmap). It is assumed that the addresses are arranged as indicated by broken arrows.
 図14の左側のテーブル307によれば、コピーバック処理によって、1番目~8番目のストリップまでデータが復元されている(完了フラグ602は「1」となっている)。 According to the table 307 on the left side of FIG. 14, data is restored from the first to eighth strips by the copy back process (the completion flag 602 is “1”).
 Thereafter, suppose that data is restored to the 16th and 20th strips as a result of write processing during the copyback processing. In this case, as shown by the center table 307 in FIG. 14, the head address of incomplete copying still points to the 9th strip, but the completion flags 602 corresponding to the 16th and 20th strips become "1".
 Thereafter, as the copyback processing proceeds, as shown by the table 307 on the right side of FIG. 14, the address of the 16th strip, which has already been restored (its completion flag 602 is "1"), is skipped when the head address of incomplete copying is selected. That is, in the copyback processing, the data copy processing is skipped for strips whose data has been restored by piggybacking on write processing. This reduces the load of the copyback processing.
 本実施例を、下記のように総括することができる。なお、下記の総括では、上述の説明に無い事項が含まれていてもよいし、逆に上述の説明に存在する事項が含まれていなくてもよい。 This example can be summarized as follows. In the following summary, matters not included in the above description may be included, and conversely, items existing in the above description may not be included.
 分散RAID構成が採用される。スペアドライブが設けられることに代えて、各ドライブ124にスペア領域が設けられている。障害ドライブ内の複数のデータが、複数の正常ドライブのスペア領域にそれぞれ復元される。つまり、データの書込み先が、スペアドライブのように1つのドライブではなく、複数のドライブに分散している。このため、リビルド処理にかかる時間を短縮することができる。結果として、冗長度が低下している時間を短縮でき、且つ、I/O処理性能の低下を軽減することができる。 A distributed RAID configuration is adopted. Instead of providing a spare drive, each drive 124 is provided with a spare area. A plurality of data in the failed drive are restored to spare areas of a plurality of normal drives, respectively. In other words, data write destinations are not distributed to a single drive like a spare drive, but are distributed to a plurality of drives. For this reason, the time required for the rebuild process can be shortened. As a result, the time during which the redundancy is reduced can be shortened, and the decrease in I / O processing performance can be reduced.
 According to a comparative example, a spare drive is employed, and the restoration destination in the rebuild processing is consolidated on the spare drive; that is, all the data in the failed drive is restored to the spare drive. In this case, if the spare drive becomes a member of the group in place of the failed drive, a copy-back-less operation can be expected, in which the copyback processing can be made to appear complete without any copying between drives.

 In this embodiment, however, because the distributed RAID configuration described above is adopted, such a copy-back-less operation cannot be realized.
 Therefore, in this embodiment, in the write processing during the copyback processing (processing according to a write request from the host 101), the host I/O processing program 301 restores the post-update data according to the write request to the strip in the replaced drive. As a result, data is restored to the strip in the replaced drive by piggybacking on the write processing. In the copyback processing, the data copy processing is skipped for such restored strips. This reduces the load of the copyback processing, and as a result, the decrease in I/O processing performance can be reduced.
 In this embodiment, thresholds are also provided for the redundancy, the write ratio, and the CPU usage rate, and execution and stopping of the data copy processing are controlled according to the result of comparing the situation during the copyback processing with those thresholds. For example, when the load on the CPU 113 is low (as concrete examples, when no host I/O has been received for a certain time, or when the CPU usage rate is low), the data copy processing is executed. When the write ratio is high, the data copy processing is stopped. By finely controlling the execution and stopping of the data copy processing in the copyback processing, the drive can be recovered while suppressing the decrease in host I/O processing performance.
 実施例2を説明する。その際、実施例1との相違点を主に説明し、実施例1との共通点については説明を省略又は簡略する。 Example 2 will be described. At that time, differences from the first embodiment will be mainly described, and description of common points with the first embodiment will be omitted or simplified.
 図15は、実施例2に係るコピーバック処理中のリード処理の一例を示す。 FIG. 15 illustrates an example of read processing during copy back processing according to the second embodiment.
 In the second embodiment, when the host I/O processing program 301 receives a read request from the host 101 during the copyback processing, the host I/O processing program 301 reads the data according to the read request from the spare area storing that data, returns it to the host 101, and also writes the data to the strip in the replaced drive. The data written to that strip is the data targeted by the copyback processing; as a result, the data has been restored to the strip in the replaced drive. Details are, for example, as follows.
 During the copyback processing, for example, the host I/O processing program 301 receives a read request whose read source is the strip 2-2 in the replaced drive 00 (S13-1). When the completion flag 602 corresponding to the address 601 of that strip 2-2 is "0", the host I/O processing program 301 reads data 2-2 from the spare area indicated by the data position 603 corresponding to that address 601 (in the example of FIG. 15, the spare area of the normal drive 02) (S13-2). Next, the host I/O processing program 301 returns the read data 2-2 to the host 101 (S13-3) and writes the data 2-2 to the strip 2-2 in the replaced drive (S13-4). The host I/O processing program 301 updates the completion flag 602 corresponding to the strip 2-2 to "1" (S13-5).
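 A sketch of this read path (S13-1 to S13-5), with assumed helper names: the read is served from the spare area and, as a side effect, the strip in the replaced drive is restored and its completion flag set.

```python
def handle_read(strip_addr, progress, data_pos, read_spare, write_strip,
                read_strip, replaced_drive):
    """Read path during copyback: serve the read from the spare area when the strip
    is not yet restored, and restore the strip as a side effect."""
    if progress.get(strip_addr):                        # completion flag 602 already "1"
        return read_strip(replaced_drive, strip_addr)   # strip already holds valid data
    drive, spare_addr = data_pos[strip_addr]            # data position 603
    data = read_spare(drive, spare_addr)                # S13-2
    write_strip(replaced_drive, strip_addr, data)       # S13-4: piggyback restoration
    progress[strip_addr] = True                         # S13-5: completion flag -> 1
    return data                                         # S13-3: caller returns this to the host
```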
 本実施例によれば、コピーバック処理中のリード処理に便乗して、交換後ドライブ内のストリップにコピーバック処理対象データを復元することができる。なお、そのストリップについても、データコピー処理はスキップされる。 According to this embodiment, it is possible to restore the copy-back process target data to the strip in the drive after the replacement by taking advantage of the read process during the copy-back process. Note that the data copy process is also skipped for the strip.
 なお、本実施例では、ライト割合503という閾値に代えて又は加えて、ホストI/O頻度の閾値が採用されてもよい。コピーバック処理プログラム303は、S1103の判定に代えて又は加えて、対象プール183に関するホストI/O頻度がその閾値未満か否かの判定(以下、判定A)を実行してよい。判定Aの結果が真の場合、コピーバック処理プログラム303は、データコピー処理を実行してよい。コピーバック処理中のホストI/O処理(ライト処理又はリード処理)に便乗してデータが交換後ドライブ内のストリップに復元されている可能性が低いためである。 In this embodiment, a host I / O frequency threshold value may be employed instead of or in addition to the write ratio 503 threshold value. The copyback processing program 303 may execute a determination (hereinafter referred to as determination A) as to whether or not the host I / O frequency related to the target pool 183 is less than the threshold instead of or in addition to the determination in S1103. If the result of determination A is true, the copyback processing program 303 may execute data copy processing. This is because there is a low possibility that data is restored to the strip in the drive after the exchange by taking advantage of the host I / O process (write process or read process) during the copy back process.
 Further, in this embodiment, a read ratio threshold may be employed instead of or in addition to at least one of the write ratio 503 and the host I/O frequency threshold. Instead of or in addition to at least one of the determination in S1103 and the above determination A, the copyback processing program 303 may execute a determination (hereinafter, determination B) as to whether the read ratio of the target pool 183 (the ratio of read requests to the host I/O for the target pool 183) is less than a read ratio threshold. If the result of determination B is true, the copyback processing program 303 may execute the data copy processing. This is because it is then unlikely that data has been restored to strips in the replaced drive by piggybacking on host I/O processing (write processing or read processing) during the copyback processing.
 At least one of the determinations of S1101 to S1104, determination A, and determination B described above (for example, at least one of the determination of S1103, determination A, and determination B, together with the determination of S1101 and the determination of S1102 or S1104) is included in the determination as to whether the execution condition of the data copy processing is satisfied. The copyback processing program 303 may execute, regularly or irregularly, the determination as to whether the execution condition of the data copy processing is satisfied. If the determination result is true, the copyback processing program 303 can execute (which may include continuing) the data copy processing. If the determination result is false, the copyback processing program 303 can stop the data copy processing.

 Although several embodiments have been described above, these are examples for explaining the present invention and are not intended to limit the scope of the present invention only to these embodiments. The present invention can be implemented in various other forms.
101 ... Host, 102 ... Storage system

Claims (13)

  1.  論理的な1以上のRAIDグループを提供する複数の記憶デバイスと、
     前記複数の記憶デバイスに接続された1以上のプロセッサであるプロセッサ部と
    を有し、
     各RAIDグループは、2以上のストライプで構成されており、
     各ストライプは、2以上のストリップで構成されており、
     前記複数の記憶デバイスの各々は、複数のストリップと、1以上のスペア領域とを有し、
     前記各RAIDグループについて、いずれかのストライプを構成する2以上のストリップをそれぞれ提供する2以上の記憶デバイスの少なくとも1つと、別のいずれかのストライプを構成する2以上のストリップをそれぞれ提供する2以上の記憶デバイスの少なくとも1つが、異なる記憶デバイスであり、
     前記プロセッサ部は、I/O要求を受けた場合、そのI/O要求に従うI/O処理を実行するようになっており
     前記プロセッサ部は、
      (A)前記複数の記憶デバイスのうちの問題記憶デバイスの複数のストリップに対応したデータを2以上の記憶デバイスの複数のスペア領域にそれぞれ復元するリビルド処理を実行し、
      (B)前記リビルド処理の完了後、前記問題記憶デバイスの交換後記憶デバイスが有する複数のストリップのうちデータ未復元のストリップに対応したスペア領域からデータをそのストリップにコピーするデータコピー処理を含んだコピーバック処理を実行し、
     前記プロセッサ部は、以下の(W)及び(R)のうちの少なくとも1つを実行し、
      (W)前記交換後記憶デバイス内のストリップをライト先としたライト要求を前記コピーバック処理中に受けた場合、そのライト要求に従うライト処理において、そのライト要求に従うデータを、そのライト先のストリップに書き込むこと、
      (R)前記交換後記憶デバイス内のストリップをリード元としたリード要求を前記コピーバック処理中に受けた場合、そのリード要求に従うリード処理において、そのリード元に対応したスペア領域からデータを読み込み、その読み込んだデータを、そのリード元のストリップに書き込むこと、
    前記プロセッサ部は、前記交換後記憶デバイスが有する複数のストリップのうち、前記コピーバック処理中のライト処理及びリード処理のいずれかにおいてデータが既に書き込まれているストリップについては、前記データコピー処理をスキップする、
    ストレージシステム。
    A plurality of storage devices providing one or more logical RAID groups;
    A processor unit that is one or more processors connected to the plurality of storage devices;
    Each RAID group is composed of two or more stripes,
    Each stripe consists of two or more strips,
    Each of the plurality of storage devices has a plurality of strips and one or more spare areas;
    for each of the RAID groups, at least one of the two or more storage devices that respectively provide the two or more strips constituting any one stripe differs from at least one of the two or more storage devices that respectively provide the two or more strips constituting any other stripe;
    When the processor unit receives an I / O request, the processor unit executes an I / O process according to the I / O request.
    (A) executing a rebuild process for restoring data corresponding to a plurality of strips of a problem storage device among the plurality of storage devices to a plurality of spare areas of two or more storage devices;
    (B) after completion of the rebuild processing, executing copyback processing including data copy processing for copying data, from the spare area corresponding to each strip whose data has not yet been restored among the plurality of strips of the storage device after replacement of the problem storage device, to that strip,
    The processor unit executes at least one of the following (W) and (R):
    (W) when a write request whose write destination is a strip in the storage device after replacement is received during the copyback processing, writing the data according to the write request to that write destination strip in the write processing according to the write request,
    (R) when a read request whose read source is a strip in the storage device after replacement is received during the copyback processing, reading data from the spare area corresponding to that read source and writing the read data to the read source strip in the read processing according to the read request,
    and the processor unit skips the data copy processing for any strip, among the plurality of strips of the storage device after replacement, to which data has already been written in either the write processing or the read processing during the copyback processing,
    Storage system.
  2.  前記プロセッサ部は、定期的に又は不定期的に、(B)において、
      (b1)前記データコピー処理の実行条件を満たすか否かを判定し、
      (b2)(b1)の判定結果が真の場合、前記データコピー処理を実行する、
    請求項1記載のストレージシステム。
    The processor unit periodically or irregularly in (B),
    (B1) It is determined whether or not an execution condition of the data copy process is satisfied,
    (B2) If the determination result of (b1) is true, the data copy process is executed.
    The storage system according to claim 1.
  3.  前記プロセッサ部は、(W)を実行するようになっており、
     (b1)の判定は、前記コピーバック処理中のプールに関するI/O要求に対するライト要求の割合が所定割合未満か否かの判定を含む、
    請求項2記載のストレージシステム。
    The processor unit is adapted to execute (W),
    The determination in (b1) includes a determination as to whether or not the ratio of write requests to I / O requests related to the pool during the copyback process is less than a predetermined ratio.
    The storage system according to claim 2.
  4.  前記プロセッサ部は、(R)を実行するようになっており、
     (b1)の判定は、前記コピーバック処理中のプールに関するI/O要求に対するリード要求の割合が所定割合未満か否かの判定を含む、
    請求項2記載のストレージシステム。
    The processor unit is adapted to execute (R),
    The determination of (b1) includes a determination of whether or not the ratio of the read request to the I / O request regarding the pool during the copyback process is less than a predetermined ratio.
    The storage system according to claim 2.
  5.  前記プロセッサ部は、(W)及び(R)のいずれも実行するようになっており、
     (b1)の判定は、前記コピーバック処理中のプールに関するI/O要求の頻度が所定頻度未満か否かの判定を含む、
    請求項2記載のストレージシステム。
    The processor unit is configured to execute both (W) and (R).
    The determination of (b1) includes a determination of whether or not the frequency of I / O requests regarding the pool during the copyback process is less than a predetermined frequency.
    The storage system according to claim 2.
  6.  (b1)の判定は、前記コピーバック処理中のプールにおける複数のストライプにそれぞれ対応した複数の冗長度の最低値が所定値か否かの判定を含む、
    請求項2記載のストレージシステム。
    The determination of (b1) includes a determination as to whether or not the minimum values of the plurality of redundancy levels respectively corresponding to the plurality of stripes in the pool during the copyback process are predetermined values.
    The storage system according to claim 2.
  7.  (b1)の判定は、前記コピーバック処理中のプールに関するI/O要求の最終時刻からの経過時間が所定時間以上か否かの判定を含む、
    請求項2記載のストレージシステム。
    The determination of (b1) includes a determination of whether or not the elapsed time from the last time of the I / O request regarding the pool during the copyback process is a predetermined time or more.
    The storage system according to claim 2.
  8.  (b1)の判定は、前記コピーバック処理中のプールに関するプロセッサ使用率が所定使用率未満か否かの判定を含む、
    請求項2記載のストレージシステム。
    The determination of (b1) includes a determination of whether or not a processor usage rate related to the pool during the copyback process is less than a predetermined usage rate.
    The storage system according to claim 2.
  9.  前記プロセッサ部は、
      (b3)(b1)の判定結果が偽の場合、前記データコピー処理を停止する、
    請求項2記載のストレージシステム。
    The processor unit is
    (B3) If the determination result of (b1) is false, the data copy process is stopped.
    The storage system according to claim 2.
  10.  (b1)の判定は、(b11)乃至(b14)のうちの少なくとも1つの判定を含む、
      (b11)前記コピーバック処理中のプールにおける複数のストライプにそれぞれ対応した複数の冗長度の最低値が所定値か否かの判定、
      (b12)前記コピーバック処理中のプールに関するI/O要求の最終時刻からの経過時間が所定時間以上か否かの判定
      (b13)(b13-1)乃至(b13-3)のうちの少なくとも1つの判定、
        (b13-1)前記コピーバック処理中のプールに関するI/O要求に対するライト要求の割合が所定割合未満か否かの判定、
        (b13-2)前記コピーバック処理中のプールに関するI/O要求に対するリード要求の割合が所定割合未満か否かの判定、
        (b13-3)前記コピーバック処理中のプールに関するI/O要求の頻度が所定頻度未満か否かの判定、
      (b14)前記コピーバック処理中のプールに関するプロセッサ使用率が所定使用率未満か否かの判定、
    請求項9記載のストレージシステム。
    The determination of (b1) includes at least one determination of (b11) to (b14).
    (B11) Determining whether or not the minimum value of the plurality of redundancy levels respectively corresponding to the plurality of stripes in the pool during the copy back process is a predetermined value;
    (B12) A determination as to whether or not the elapsed time from the last I/O request regarding the pool during the copyback process is a predetermined time or more,
    (B13) At least one of the following determinations (b13-1) to (b13-3):
    (B13-1) Determination of whether or not the ratio of write requests to I / O requests related to the pool during the copyback process is less than a predetermined ratio;
    (B13-2) Determining whether the ratio of the read request to the I / O request regarding the pool during the copyback process is less than a predetermined ratio;
    (B13-3) Determining whether the frequency of I / O requests regarding the pool during the copyback process is less than a predetermined frequency,
    (B14) Determining whether or not the processor usage rate relating to the pool during the copyback process is less than a predetermined usage rate;
    The storage system according to claim 9.
  11.  (b1)の判定結果は、
      (b11)の判定結果が真であれば、真であり、
      (b11)の判定結果が偽でも、(b12)、(b13)及び(b14)のいずれかの判定結果が真であれば、真である、
    請求項10記載のストレージシステム。
    The determination result of (b1) is
    If the determination result of (b11) is true, it is true.
    Even if the determination result of (b11) is false, it is true if the determination result of any of (b12), (b13) and (b14) is true.
    The storage system according to claim 10.
  12.  (b1)の判定は、
      (b11)の判定と、
      (b13)の判定と、
      (b12)又は(b14)の判定と
    を含む、
    請求項11記載のストレージシステム。
    The determination of (b1)
    The determination of (b11);
    The determination of (b13);
    Including the determination of (b12) or (b14),
    The storage system according to claim 11.
  13.  I/O要求を受けた場合にそのI/O要求に従うI/O処理を実行するようになっているストレージシステムにおける復旧制御方法であって、
     論理的な1以上のRAIDグループを提供する複数の記憶デバイスのうちの問題記憶デバイスの複数のストリップに対応したデータを2以上の記憶デバイスの複数のスペア領域にそれぞれ復元するリビルド処理を実行し、
        各RAIDグループは、2以上のストライプで構成されており、
        各ストライプは、2以上のストリップで構成されており、
        前記複数の記憶デバイスの各々は、複数のストリップと、1以上のスペア領域とを有し、
        前記各RAIDグループについて、いずれかのストライプを構成する2以上のストリップをそれぞれ提供する2以上の記憶デバイスの少なくとも1つと、別のいずれかのストライプを構成する2以上のストリップをそれぞれ提供する2以上の記憶デバイスの少なくとも1つが、異なる記憶デバイスであり、
     前記リビルド処理の完了後、前記問題記憶デバイスの交換後記憶デバイスが有する複数のストリップのうちデータ未復元のストリップに対応したスペア領域からデータをそのストリップにコピーするデータコピー処理を含んだコピーバック処理を実行し、
     以下の(W)及び(R)のうちの少なくとも1つを実行し、
        (W)前記交換後記憶デバイス内のストリップをライト先としたライト要求を前記コピーバック処理中に受けた場合、そのライト要求に従うライト処理において、そのライト要求に従うデータを、そのライト先のストリップに書き込むこと、
        (R)前記交換後記憶デバイス内のストリップをリード元としたリード要求を前記コピーバック処理中に受けた場合、そのリード要求に従うリード処理において、そのリード元に対応したスペア領域からデータを読み込み、その読み込んだデータを、そのリード元のストリップに書き込むこと、
      前記コピーバック処理において、前記交換後記憶デバイスが有する複数のストリップのうち、前記コピーバック処理中のライト処理及びリード処理のいずれかにおいてデータが既に書き込まれているストリップについては、前記データコピー処理をスキップする、
    復旧制御方法。
    A recovery control method in a storage system configured to execute I / O processing according to an I / O request when an I / O request is received,
    Executing a rebuild process for restoring data corresponding to a plurality of strips of a problem storage device among a plurality of storage devices that provide one or more logical RAID groups, respectively, to a plurality of spare areas of the two or more storage devices;
    Each RAID group is composed of two or more stripes,
    Each stripe consists of two or more strips,
    Each of the plurality of storage devices has a plurality of strips and one or more spare areas;
    for each of the RAID groups, at least one of the two or more storage devices that respectively provide the two or more strips constituting any one stripe differs from at least one of the two or more storage devices that respectively provide the two or more strips constituting any other stripe;
    after completion of the rebuild processing, executing copyback processing including data copy processing for copying data, from the spare area corresponding to each strip whose data has not yet been restored among the plurality of strips of the storage device after replacement of the problem storage device, to that strip,
    Execute at least one of the following (W) and (R):
    (W) When a write request with the strip in the storage device after the exchange as a write destination is received during the copyback process, in the write process according to the write request, the data according to the write request is transferred to the write destination strip. Writing,
    (R) When a read request with the strip in the storage device after replacement as a read source is received during the copyback process, in the read process according to the read request, data is read from a spare area corresponding to the read source, Write the read data to the strip from which it was read,
    In the copy back process, among the plurality of strips of the storage device after replacement, the data copy process is performed for a strip in which data has already been written in either the write process or the read process during the copy back process. skip,
    Recovery control method.
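
    For illustration only (not part of the claims): a minimal sketch, in Python, of the copyback behaviour recited in claim 13. The storage-device API (`write_strip`), the `spare_area_of` mapping from strips to spare areas, and the `written` bookkeeping set are assumptions introduced for this sketch, not the patented implementation.

        # Sketch only: the device API and bookkeeping are assumptions.

        def handle_write_during_copyback(replacement_dev, strip_no, data, written):
            # (W): write the host data directly to the write-destination strip
            # of the post-replacement storage device.
            replacement_dev.write_strip(strip_no, data)
            written.add(strip_no)   # data copy for this strip can later be skipped

        def handle_read_during_copyback(replacement_dev, spare_area_of, strip_no, written):
            # (R): read the data from the spare area corresponding to the read-source
            # strip, write it back to that strip, and return it to the host.
            data = spare_area_of[strip_no].read()
            replacement_dev.write_strip(strip_no, data)
            written.add(strip_no)
            return data

        def copyback(replacement_dev, spare_area_of, strip_count, written):
            # Data copy processing: restore each not-yet-restored strip from its
            # corresponding spare area, skipping strips already written by the
            # write or read processing above during the copyback process.
            for strip_no in range(strip_count):
                if strip_no in written:
                    continue
                data = spare_area_of[strip_no].read()
                replacement_dev.write_strip(strip_no, data)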
PCT/JP2017/007015 2017-02-24 2017-02-24 Storage system and recovery control method WO2018154697A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/007015 WO2018154697A1 (en) 2017-02-24 2017-02-24 Storage system and recovery control method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/007015 WO2018154697A1 (en) 2017-02-24 2017-02-24 Storage system and recovery control method

Publications (1)

Publication Number Publication Date
WO2018154697A1 (en) 2018-08-30

Family

ID=63252491

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/007015 WO2018154697A1 (en) 2017-02-24 2017-02-24 Storage system and recovery control method

Country Status (1)

Country Link
WO (1) WO2018154697A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08221217A (en) * 1995-02-17 1996-08-30 Hitachi Ltd Disk array subsystem data reconstruction method
JP2005099995A (en) * 2003-09-24 2005-04-14 Fujitsu Ltd Disk sharing method and system for magnetic disk device
JP2016038767A (en) * 2014-08-08 2016-03-22 富士通株式会社 Storage control device, storage control program, and storage control method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111124263A (en) * 2018-10-31 2020-05-08 伊姆西Ip控股有限责任公司 Method, electronic device, and computer program product for managing a plurality of discs
CN111124263B (en) * 2018-10-31 2023-10-27 伊姆西Ip控股有限责任公司 Method, electronic device and computer program product for managing a plurality of discs
TWI709042B (en) * 2018-11-08 2020-11-01 慧榮科技股份有限公司 Method and apparatus for performing mapping information management regarding redundant array of independent disks, and associated storage system
US11221773B2 (en) 2018-11-08 2022-01-11 Silicon Motion, Inc. Method and apparatus for performing mapping information management regarding redundant array of independent disks

Similar Documents

Publication Publication Date Title
US11163472B2 (en) Method and system for managing storage system
US10459814B2 (en) Drive extent based end of life detection and proactive copying in a mapped RAID (redundant array of independent disks) data storage system
US10656849B2 (en) Storage system and control method thereof
US9152332B2 (en) Storage system and method for reducing energy consumption
US9378093B2 (en) Controlling data storage in an array of storage devices
JP6009095B2 (en) Storage system and storage control method
US9400618B2 (en) Real page migration in a storage system comprising a plurality of flash packages
US20070011579A1 (en) Storage system, management server, and method of managing application thereof
WO2011108027A1 (en) Computer system and control method therefor
US20140075240A1 (en) Storage apparatus, computer product, and storage control method
US8812779B2 (en) Storage system comprising RAID group
CN111104055B (en) Method, apparatus and computer program product for managing a storage system
US10579540B2 (en) Raid data migration through stripe swapping
CN113934367A (en) Storage device, method for operating storage device and storage system
US9760296B2 (en) Storage device and method for controlling storage device
US9400723B2 (en) Storage system and data management method
WO2018154697A1 (en) Storage system and recovery control method
WO2018142622A1 (en) Computer
US8880939B2 (en) Storage subsystem and method for recovering data in storage subsystem
US20170038993A1 (en) Obtaining additional data storage from another data storage system
CN110413197B (en) Method, apparatus and computer program product for managing a storage system
US20230214134A1 (en) Storage device and control method therefor
JP2005055963A (en) Volume control method, program performing it, and storage device
KR20210137922A (en) Systems, methods, and devices for data recovery using parity space as recovery space
JP2020086554A (en) Storage access control device, storage access control method, and storage access control program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17897750

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17897750

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP
