WO2009033971A1 - System and method for splitting data and data control information
- Publication number
- WO2009033971A1 (PCT/EP2008/061464)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- array
- control information
- bus driver
- array controller
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0659—Command handling arrangements, e.g. command buffers, queues, command scheduling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0611—Improving I/O performance in relation to response time
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0613—Improving I/O performance in relation to throughput
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0656—Data buffering arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0683—Plurality of storage devices
- G06F3/0689—Disk arrays, e.g. RAID, JBOD
Definitions
- the present invention relates to the field of mass storage solutions with multiple storage units.
- exemplary embodiments of the present invention relate to a method and apparatus for splitting data and data control information on storage devices.
- High-speed data recording in the workflow for digital cinematography may involve a process data rate of 2.5 Gbps or more. To achieve this bandwidth using conventional drives such as hard disk drives or solid state disks, it may be necessary to stream to multiple storage units in parallel. Additionally, the nature of audiovisual (AV) streams involves the processing of data in real time, i.e. with limited delays in a data path.
- Known systems fulfill these requirements by separating data and control path functions at a higher level. In other words, known systems separate data processing, maintenance and/or user interface functions in order to free the data path from processing as much control information as possible. However, known approaches still require the file system/addressing tasks to be implemented either completely in the data path or to be shared with the control path.
- One known approach presupposes that the data is mapped into the address space of a control processor, with the control processor setting up all I/O operations to and from the storage devices and to and from the communication network.
- U.S. Patent Application Publication No. 20060112219 to Chawla, et al. purports to disclose a modular data storage system with a control path and a data path.
- the storage system includes three modular components linked and adapted for independent removal and insertion within the modular data storage system.
- a service processor is positioned in the control path
- a data services platform is positioned in the data path and the control path
- a storage array controller is positioned in the data path and the control path.
- the data services platform has a host interface interfacing with storage application hosts and includes a control path block linked to the service processor.
- the platform includes a data path block including data path functions that may be functions partitioned for performance only by the data services platform.
- the storage array controller includes a control path block linked to the service processor and including control interfaces.
- the controller includes a data path block including data path functions.
- U.S. Patent No. 5,802,366 to Row et al. purports to disclose a file server architecture that comprises as separate processors a network controller unit, a file controller unit and a storage processor unit. These units incorporate their own processors, and operate in parallel with a local Unix host processor. All networks are connected to the network controller unit, which performs all protocol processing up through the NFS layer.
- the virtual file system is implemented in the file control unit, and the storage processor provides high-speed multiplexed access to an array of mass storage devices.
- the file controller unit controls file information caching through its own local cache buffer, and controls disk data caching through a large system memory which is accessible on a bus by any of the processors.
- U.S. Patent No. 5,555,390 to Judd, et al. purports to disclose a data storage subsystem and method for transferring data from a storage subsystem to a connected host data processing system.
- the subsystem comprises a device controller connected to one or more direct access storage devices, e.g. disk drives.
- the host data processing system issues data transfer commands to the subsystem to initiate transfer of data between the host processing system and the device or devices associated with the data storage subsystem.
- Read/write data is transferred directly from device to host via a buffer controller.
- the read command from the host data processing system specifies the data to be transferred and the start address in host memory to which the data should be sent.
- the device controller of the data storage subsystem is capable of respecifying or amending the start address specified by the host in the read command. This provides a performance benefit for split data transfers.
- the device controller of the data storage subsystem can specify the host address to which the replacement data should be sent.
- Under real-time requirements, the data path and control path desirably include a high degree of redundancy or performance reserve, since their construction does not allow a good degree of predictability due to estimated worst-case delays that may occur while processing file system related tasks. This leads to inefficient utilization of hardware resources.
- An improved system and method that facilitates separation of a data path from a control path in a high-speed data transfer system is desired.
- a data storage apparatus in accordance with the present invention is recited in claim 1.
- the data storage apparatus comprises a cache that buffers data received from a data path and an array controller that multiplexes an input stream of data received from the cache.
- the data storage apparatus additionally comprises a bus driver module that is adapted to associate control information with a portion of an output stream of data received from the array controller.
- a data storage apparatus in accordance with the present invention may additionally comprise a control path that is adapted to provide the control information to the bus driver module.
- the control information may identify a sector on an array of storage devices or may more generally relate to an address on the array of storage devices.
- the stream of data may be transferred in a sequence of burst transfers, for which the bus driver module is adapted to receive a start address only for an initial one of the sequence of burst transfers.
- the bus driver may be adapted to write the portion of the output stream of data received from the array controller to an array of storage devices at locations as identified by the control information.
- a method of transferring data in accordance with the present invention is set forth in claim 6. The method comprises buffering data in a cache, delivering the data buffered in the cache to an array controller and delivering data from the array controller to a bus driver module.
- the method additionally comprises associating control information with a portion of the data received from the array controller.
- a method in accordance with the present invention may comprise writing the portion of data received from the array controller to an array of storage devices at a location that corresponds to the control information.
- the control information may identify a sector on an array of storage devices or may relate more generally to an address on an array of storage devices.
- Data received from the array controller may be transferred to an array of storage devices as a sequence of burst transfers.
- Fig. 1 is a block diagram of a data recording system in accordance with an exemplary embodiment of the present invention.
- Fig. 2 is a state diagram that is useful in explaining the operation of an exemplary embodiment of the present invention.
- Fig. 3 is a process flow diagram that shows a method in accordance with an exemplary embodiment of the present invention.
- Fig. 1 is a block diagram of a recording system having a system controller module to perform storage functionality on an array of storage units in accordance with an exemplary embodiment of the present invention.
- the recording system shown in Fig. 1 is generally referred to by the reference number 100.
- the recording system 100 includes a user interface 102, which allows a user to control the overall operation of the recording system 100 and to view information about the system status and the like.
- the user interface includes an LCD touchpad display.
- the recording system 100 includes a system controller module 104.
- the system controller module 104 is adapted to transfer data to and receive data from an HDD array 114, which comprises a plurality of individual HDDs. Transfers of data clusters to or from the disks of the HDD array 114 are initiated by writing appropriate values into registers, indicating the cluster size, the cluster start address and the related command, such as "read" or "write".
- the system controller module 104 includes an embedded software processor system, which is shown in Fig. 1 as PPC 106.
- PPC is an acronym for Power PC.
- the PPC 106 communicates via an external control path 116 with external modules.
- Controlling tasks performed by the PPC 106 may include executing file system maintenance algorithms, communicating with external clients, and controlling the user access via an LCD touchpad or other user interface (UI).
- File system tasks may relate to, for example, an interface between logical addressing and physical addressing of the storage, access rights management, memory allocation, garbage collection, defragmentation, version management, load balancing or the like.
- data processing in accordance with an exemplary embodiment of the invention is preferably performed without overhead of file system/addressing tasks.
- the PPC 106 configures the hardware of the system controller module 104 via an internal control path 118.
- the PPC 106 configures a cache 108, an array control such as a RAID control 110 and a bus driver module 112.
- the PPC 106 may be adapted to associate control information with data by providing control information to the bus driver module 112 via the internal control path 118.
- the cache 108 is adapted to transfer data received via a data path 120.
- real-time data received via the data path 120 are buffered in the cache 108 before being transferred to the RAID control 110.
- the RAID control 110 then delivers the data to the bus driver module 112 to provide data streaming to the attached devices in the HDD array 114.
- the recording system 100 may additionally be adapted to stream data from the HDD array 114.
- the RAID controller 110 multiplexes incoming data streams. Multiplexing an incoming data stream is the process of separating a unitary incoming data stream into a plurality of parallel streams, each of which is intended to be written to a different storage device in an array of storage devices.
- the skilled person will appreciate that the RAID controller 110 may also be adapted to de-multiplex outgoing data streams, and optionally add redundancy for recovering data from up to two failing HDDs without data loss.
- the bus driver module 112 splits the data stream word by word into n chunks, each assigned to a corresponding DMA engine serving the attached standard ATA-HDD.
- the skilled person will appreciate that storage devices operating according to any protocol may be employed in an exemplary embodiment of the present invention. Parallel or serial ATA-HDDs are just two examples of protocols that may be employed.
- the addressing task, that is, the assignment of data chunks to sector addresses on storage devices, is also done in the bus driver module 112.
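The word-by-word split described above can be illustrated with a minimal sketch. It assumes a simple round-robin assignment (word i goes to device i mod n), which the text does not spell out; the function names `split_words` and `merge_words` are hypothetical and chosen only for this illustration.

```python
from typing import List, Sequence

def split_words(stream: Sequence[int], n_devices: int) -> List[List[int]]:
    """Distribute words round-robin: word i goes to device i % n_devices (assumed scheme)."""
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    return [list(stream[d::n_devices]) for d in range(n_devices)]

def merge_words(chunks: List[List[int]]) -> List[int]:
    """Inverse of split_words, as would be needed when streaming back from the array."""
    n = len(chunks)
    total = sum(len(c) for c in chunks)
    # Word i was stored on device i % n at position i // n.
    return [chunks[i % n][i // n] for i in range(total)]

words = list(range(10))          # stand-in for ten data words
chunks = split_words(words, 4)   # one chunk per DMA engine / device
restored = merge_words(chunks)
```

In this sketch each inner list stands for the chunk handed to one DMA engine; redundancy, which the text mentions as optional, is not modelled.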
- An exemplary embodiment of the present invention provides an efficient method and apparatus for distributing a high-speed data stream onto multiple devices performed by a "late binding" of data to data control information via the internal control path 118.
- the expression "late binding" refers to associating data with control information such as an intended sector on an array of storage devices and/or other address information at a time just before the data is to be written to the HDD array.
- the internal control path 118 may be adapted to provide addressing information directly to the bus driver module 112 and not to the cache 108 or the RAID control 110. The idea is to associate the incoming/outgoing data with assigned clusters on storage devices at a very low level, that is, in the last controller instance that directly communicates with the single storage units or storage devices.
- the late association of data with control information simplifies the design of the recording system 100 because the control information does not need to be processed until just before the associated data is written to the HDD array 114.
- An exemplary embodiment of the present invention takes the known principle of DMA one step further.
- DMA generally allows a processor to be liberated from the resource-consuming task of orchestrating every single memory cell access in a block data transfer. This is achieved by providing dedicated hardware that performs specific functionality.
- An exemplary embodiment of the present invention extends this concept to situations in which sequences of plural DMA accesses to contiguous memory address ranges occur.
- an exemplary embodiment of the present invention provides dedicated hardware to perform even the task of setting up, supervising and influencing such sequences of DMA calls. This frees the PPC 106 even further from brute-force routine tasks.
- An exemplary embodiment of the present invention increases the predictability of a storage system by almost completely relieving the data path 120 of control tasks. This results in a better utilization of hardware resources, which in turn allows redundant components to be eliminated and enables the use of a low-power control processor.
- Fig. 2 is a state diagram that is useful in explaining the operation of an exemplary embodiment of the present invention.
- the state diagram is generally referred to by the reference number 200.
- the state diagram 200 shows, in state-diagram format, an exemplary method of operation for a state machine that improves the performance of a data storage system.
- a state machine that operates according to Fig. 2 provides late binding of data and control information in accordance with an exemplary embodiment of the present invention.
- Prior to beginning operation, a system controller, which is denoted as Power PC or PPC in Fig. 2, is in an idle state 202.
- the PPC initiates a transfer by delivering, via registers, values for the cluster size of the next transfer, the sector start address to be used, and whether the related command is for a read operation or a write operation.
- cluster sizes may range up to 16 MB per device. These values may be provided using, for example, a serial communication bus protocol like I2C.
- the registers are denoted as "PPC registers" in Fig. 2.
- the register values are read, as shown at state 204.
- An interrupt is delivered to the system controller to signal the initialisation of the storage devices involved in the transfer.
- the interrupt signals that the PPC can read the status of the previously finished data transmission, and may initiate the next transfer.
- "Small" or "large" data clusters may be addressed, but read/write transfers are generally performed as 64-kB bursts per device unless the cluster size per device is smaller than 64 kB. In other words, the burst size is derived from the cluster size and equals it, subject to an upper limit of 64 kB.
- the bus driver module 112 may include logic to calculate the specific sector start address and/or other address information for the ATA devices. In an exemplary embodiment of the present invention, 64-kB bursts are initiated together or simultaneously for all DMA engines.
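The burst-size rule stated above reduces to a minimum operation. The following sketch assumes that 64 kB means 65,536 bytes and that 16 MB means 16 × 1024 × 1024 bytes; the names `burst_size` and `bursts_per_cluster` are hypothetical.

```python
BURST_LIMIT = 64 * 1024  # 64 kB upper limit per device (binary interpretation assumed)

def burst_size(remaining_cluster_bytes: int) -> int:
    """Burst size equals the (remaining) cluster size, capped at the 64 kB limit."""
    return min(remaining_cluster_bytes, BURST_LIMIT)

def bursts_per_cluster(cluster_bytes: int) -> int:
    """Number of bursts needed for one cluster on one device (ceiling division)."""
    return -(-cluster_bytes // BURST_LIMIT)

small = burst_size(4 * 1024)                      # cluster smaller than the limit
large = burst_size(16 * 1024 * 1024)              # largest cluster size per device
count = bursts_per_cluster(16 * 1024 * 1024)      # bursts for the largest cluster
```

Under these assumptions a maximum-size cluster is moved in 256 bursts per device, all of which the bus driver module sequences without further addressing input.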
- the lower half of the state diagram 200 exhibits a loop structure, where the transfer of the cluster is performed in the form of a sequence of burst transfers.
- At state 206, the cluster size and address are determined. This determination involves, for example, incrementing the previously used address or start address by the burst length and decrementing the cluster size by the burst length.
- If the remaining cluster size is zero, the system controller returns to the idle state 202. If, at state 206, the cluster size is determined to be greater than zero, as shown at state 210, this is taken as an indication that there is data still to be transferred to complete the transaction. In that case, a new DMA burst transfer is initiated at state 212. During the DMA transaction, the state machine is in a DMA_control state 214. After the burst transfer, a new iteration occurs whenever the attached group of storage devices is ready as a whole. This is symbolically indicated as an HDD group ready state 216 in Fig. 2.
- the state machine iterates the loop by again calculating the cluster size remaining to be transferred and the address, as shown at state 206.
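The loop of the state diagram 200 can be simulated in a few lines. This is a sketch under stated assumptions, not the hardware implementation: addresses are counted in bytes rather than sectors, each DMA burst is modelled as completing immediately, and the names `State`, `PpcRegisters` and `run_cluster_transfer` are hypothetical. The state numbers in the comments are those given in the text.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Tuple

BURST_LIMIT = 64 * 1024  # 64 kB per device, binary interpretation assumed

class State(Enum):
    IDLE = auto()            # state 202
    READ_REGISTERS = auto()  # state 204
    CALCULATE = auto()       # state 206
    START_DMA = auto()       # state 212
    DMA_CONTROL = auto()     # state 214
    GROUP_READY = auto()     # state 216

@dataclass
class PpcRegisters:
    cluster_size: int   # bytes per device (unit is an assumption)
    start_address: int  # byte address here; the text speaks of sector addresses
    command: str        # "read" or "write"

def run_cluster_transfer(regs: PpcRegisters) -> List[Tuple[str, int, int]]:
    """Return the (command, address, length) of every burst issued for one cluster."""
    state = State.READ_REGISTERS
    remaining = address = burst = 0
    command = ""
    issued: List[Tuple[str, int, int]] = []
    while state is not State.IDLE:
        if state is State.READ_REGISTERS:
            remaining, address, command = regs.cluster_size, regs.start_address, regs.command
            state = State.CALCULATE
        elif state is State.CALCULATE:
            # Advance by the previous burst (zero on the first pass).
            address += burst
            remaining -= burst
            if remaining > 0:
                burst = min(remaining, BURST_LIMIT)
                state = State.START_DMA
            else:
                state = State.IDLE
        elif state is State.START_DMA:
            issued.append((command, address, burst))
            state = State.DMA_CONTROL
        elif state is State.DMA_CONTROL:
            state = State.GROUP_READY   # burst modelled as finishing at once
        elif state is State.GROUP_READY:
            state = State.CALCULATE     # all devices ready, iterate the loop
    return issued

bursts = run_cluster_transfer(PpcRegisters(150 * 1024, 0, "write"))
```

Only the first address comes from the registers; every later address in `bursts` is computed inside the loop, which mirrors the division of work between the PPC and the bus driver module.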
- streaming data is not mapped into an address space of a control processor.
- the PPC 106 shown in Fig. 1 only delivers successive cluster addresses for the accessed file. For larger cluster sizes, the hardware modules need a longer time slot to fill the cluster with streaming data. With sufficiently large clusters and an appropriate data rate, periodic requests for new cluster addresses from the bus driver module 112 to the PPC 106 arrive at relatively long intervals, thus allowing the processor to operate at a very low clock frequency.
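A rough calculation shows how long these intervals can be. The cluster size of 16 MB per device and the 2.5 Gbps data rate are taken from the text; the array width of eight devices is an assumption made only for this example, 16 MB is read as 16 × 1024 × 1024 bytes, and redundancy overhead is ignored. The function name is hypothetical.

```python
def address_request_interval_s(cluster_bytes_per_device: int,
                               n_devices: int,
                               stream_bps: float) -> float:
    """Seconds the stream needs to fill one cluster on every device of the array."""
    total_bits = cluster_bytes_per_device * n_devices * 8
    return total_bits / stream_bps

# 16 MB per device and 2.5 Gbps are from the text; 8 devices is an assumed value.
interval = address_request_interval_s(16 * 1024 * 1024, 8, 2.5e9)
print(f"{interval:.3f} s between cluster address requests")
```

Under these assumptions the processor would be asked for a new cluster address only a few times per second, which is consistent with the claim that a slowly clocked control processor suffices.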
- a start address is provided by the PPC 106 to the bus driver module 112 prior to the first of a sequence of burst transfers.
- the start address for subsequent transfers is determined by the bus driver module 112, which increments the value of the initial start address.
- no additional addressing information is provided by the PPC 106 to the bus driver module 112 via the internal control path 118 for the remaining sequence of burst transfers.
- the bus driver module 112 may be adapted to notify the processor of a completion status of a previously completed data cluster transfer.
- the bus driver module 112 may be adapted to request from the PPC 106 a next start address to be used in a next cluster transfer to be performed after the current cluster transfer.
- Fig. 3 is a process flow diagram that shows a method in accordance with an exemplary embodiment of the present invention.
- the method is generally referred to by the reference number 300.
- the skilled person will appreciate that the method 300 may be desirably performed by the data storage system 100.
- An exemplary method in accordance with the present invention may, in addition, be implemented according to the state diagram 200.
- data is buffered in a cache such as the cache 108.
- data buffered in the cache is delivered to an array controller such as the RAID controller 110.
- Data is delivered from the array controller to a bus driver module such as the bus driver module 112 at step 306.
- the bus driver module associates control information with a portion of data received from the array controller. Finally, the portion of data received from the array controller is written by the bus driver module to an array of storage devices at a location that corresponds to the control information.
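The steps of the method 300 can be sketched as a small pipeline in which the data carries no address until it reaches the bus driver. All class and method names are hypothetical, the striping is done per byte for brevity, and the address is modelled as a simple slot counter; none of these details are specified in the text.

```python
from collections import deque
from typing import Deque, Dict, List

class Cache:
    """Buffers incoming data blocks in arrival order."""
    def __init__(self) -> None:
        self._fifo: Deque[bytes] = deque()

    def buffer(self, block: bytes) -> None:      # buffering (302)
        self._fifo.append(block)

    def deliver(self) -> bytes:                  # delivering to the array controller (304)
        return self._fifo.popleft()

class ArrayController:
    """Splits one block into one portion per device; no addresses involved."""
    def __init__(self, n_devices: int) -> None:
        self.n_devices = n_devices

    def multiplex(self, block: bytes) -> List[bytes]:   # output goes to the bus driver (306)
        return [block[d::self.n_devices] for d in range(self.n_devices)]

class BusDriver:
    """Last stage: binds an address to the data just before writing it."""
    def __init__(self, n_devices: int) -> None:
        self.devices: List[Dict[int, bytes]] = [{} for _ in range(n_devices)]
        self.next_address = 0

    def set_start_address(self, address: int) -> None:  # supplied via the control path
        self.next_address = address

    def write(self, portions: List[bytes]) -> int:
        address = self.next_address                     # associating control information (308)
        for device, portion in zip(self.devices, portions):
            device[address] = portion                   # write at the bound location
        self.next_address += 1                          # later addresses derived locally
        return address

cache, controller, driver = Cache(), ArrayController(2), BusDriver(2)
driver.set_start_address(100)
cache.buffer(b"ABCDEFGH")
used_address = driver.write(controller.multiplex(cache.deliver()))
```

Neither `Cache` nor `ArrayController` ever sees an address in this sketch, which is the point of the late association: only `BusDriver` combines the data with its control information.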
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Memory System Of A Hierarchy Structure (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
Abstract
The present invention relates to a device and method for transferring data. An exemplary data storage device comprises a cache that buffers data received from a data path and an array controller that multiplexes an input stream of data received from the cache. The exemplary data storage device additionally comprises a bus driver module that is adapted to associate control information with a portion of an output stream of data received from the array controller. An exemplary method (300) of transferring data comprises buffering (302) data in a cache, delivering (304) the data buffered in the cache to an array controller and delivering (306) data from the array controller to a bus driver module. The exemplary method (300) additionally comprises associating (308) control information with a portion of the data received from the array controller.
Description
System and Method for Splitting Data and Data Control Information
The present invention relates to the field of mass storage solutions with multiple storage units. In particular, exemplary embodiments of the present invention relate to a method and apparatus for splitting data and data control information on storage devices.
High-speed data recording in the workflow for digital cinematography may involve a process data rate of 2.5 Gbps or more. To achieve this bandwidth using conventional drives such as hard disk drives or solid state disks, it may be necessary to stream to multiple storage units in parallel. Additionally, the nature of audiovisual (AV) streams involves the processing of data in real time, i.e. with limited delays in a data path. Known systems fulfill these requirements by separating data and control path functions at a higher level. In other words, known systems separate data processing, maintenance and/or user interface functions in order to free the data path from processing as much control information as possible. However, known approaches still require the file system/addressing tasks to be implemented either completely in the data path or to be shared with the control path. One known approach presupposes that the data is mapped into the address space of a control processor, with the control processor setting up all I/O operations to and from the storage devices and to and from the communication network.
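The need for parallel streaming follows from simple arithmetic. The sketch below uses the 2.5 Gbps figure from the text; the sustained per-drive rate of 50 MB/s is an assumed value chosen only for illustration, and the function name `drives_needed` is hypothetical.

```python
import math

def drives_needed(stream_gbps: float, per_drive_mb_s: float) -> int:
    """Smallest number of parallel drives whose combined sustained rate covers the stream."""
    stream_mb_s = stream_gbps * 1e9 / 8 / 1e6  # Gbps to MB/s, decimal units
    return math.ceil(stream_mb_s / per_drive_mb_s)

# 2.5 Gbps is quoted in the text; 50 MB/s per drive is an assumption.
n = drives_needed(2.5, 50.0)
print(n)
```

A 2.5 Gbps stream corresponds to 312.5 MB/s, so under the assumed per-drive rate a single drive cannot keep up and the stream has to be spread over several units, before any redundancy is added.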
U.S. Patent Application Publication No. 20060112219 to Chawla, et al. purports to disclose a modular data storage system with a control path and a data path. The storage system includes three modular components linked and adapted for independent removal and insertion within the modular data storage system. A service processor is positioned in the control path, a data services platform is positioned in the data path and the control path, and a storage array controller is positioned in the data path and the control path. The data services platform has a host interface interfacing with storage application hosts and includes a control path block linked to the service processor. The platform includes a data path block including data path functions that may be functions partitioned for performance only by the data services platform. The storage array controller includes a control path block linked to the service processor and including control interfaces. The controller includes a data path block including data path functions.
U.S. Patent No. 5,802,366 to Row et al. purports to disclose a file server architecture that comprises as separate processors a network controller unit, a file controller unit and a storage processor unit. These units incorporate their own processors, and operate in parallel with a local Unix host processor. All networks are connected to the network controller unit, which performs all protocol processing up through the NFS layer. The virtual file system is implemented in the file control unit, and the storage processor provides high-speed multiplexed access to an array of mass storage devices. The file controller unit controls file information caching through its own local cache buffer, and controls disk data caching through a large system memory which is accessible on a bus by any of the processors.
U.S. Patent No. 5,555,390 to Judd, et al. purports to disclose a data storage subsystem and method for transferring data from a storage subsystem to a connected host data processing system. The subsystem comprises a device controller connected to one or more direct access storage devices, e.g. disk drives. The host data processing system issues data transfer commands to the subsystem to initiate transfer of data between the host processing system and the device or devices associated with the data storage subsystem. Read/write data is transferred directly from device to host via a buffer controller. For a read operation, the read command from the host data processing system specifies the data to be transferred and the start address in host memory to which the data should be sent. The device controller of the data storage subsystem is capable of respecifying or amending the start address specified by the host in the read command. This provides a performance benefit for split data transfers. In addition, if an error occurs during a read operation, the device controller of the data storage subsystem can specify the host address to which the replacement data should be sent.
Moreover, known methods of separating a control path from a data path require either a powerful architecture, which may necessitate a complex data path, or a very fast control processor. Under real-time requirements, the data path and control path desirably include a high degree of redundancy or performance reserve, since their construction does not allow a good degree of predictability due to estimated worst-case delays that may occur while processing file system related tasks. This leads to inefficient utilization of hardware resources.
An improved system and method that facilitates separation of a data path from a control path in a high-speed data transfer system is desired.
A data storage apparatus in accordance with the present invention is recited in claim 1. The data storage apparatus comprises a cache that buffers data received from a data path and an array controller that multiplexes an input stream of data received from the cache. The data storage apparatus additionally comprises a bus driver module that is adapted to associate control information with a portion of an output stream of data received from the array controller.
A data storage apparatus in accordance with the present invention may additionally comprise a control path that is adapted to provide the control information to the bus driver module. The control information may identify a sector on an array of storage devices or may more generally relate to an address on the array of storage devices. In accordance with the present invention, the stream of data may be transferred in a sequence of burst transfers, for which the bus driver module is adapted to receive a start address only for an initial one of the sequence of burst transfers. The bus driver may be adapted to write the portion of the output stream of data received from the array controller to an array of storage devices at locations as identified by the control information.
A method of transferring data in accordance with the present invention is set forth in claim 7. The method comprises buffering data in a cache, delivering the data buffered in the cache to an array controller and delivering data from the array controller to a bus driver module. The method additionally comprises associating control information with a portion of the data received from the array controller.
In addition, a method in accordance with the present invention may comprise writing the portion of data received from the array controller to an array of storage devices at a location that corresponds to the control information. The control information may identify a sector on an array of storage devices or may relate more generally to an address on an array of storage devices. Data received from the array controller may be transferred to an array of storage devices as a sequence of burst transfers.
A preferred embodiment of the present invention is described with reference to the accompanying drawings. The preferred embodiment merely exemplifies the invention. Plural possible modifications are apparent to the skilled person. The gist and scope of the present invention is defined in the appended claims of the present application.
Fig. 1 is a block diagram of a data recording system in accordance with an exemplary embodiment of the present invention.
Fig. 2 is a state diagram that is useful in explaining the operation of an exemplary embodiment of the present invention.
Fig. 3 is a process flow diagram that shows a method in accordance with an exemplary embodiment of the present invention.
Fig. 1 is a block diagram of a recording system having a system controller module to perform storage functionality on an array of storage units in accordance with an exemplary embodiment of the present invention. The recording system shown in Fig. 1 is generally referred to by the reference number 100. The recording system 100 includes a user interface 102, which allows a user to control the overall operation of the recording system 100 and to view information about the system status and the like. In one exemplary embodiment of the present invention, the user interface includes an LCD touchpad display.
The recording system 100 includes a system controller module 104. The system controller module 104 is adapted to transfer data to and receive data from an HDD array 114, which comprises a plurality of individual HDDs. Transfers of data clusters to or from the disks of the HDD array 114 are initiated by setting or inscribing appropriate values into registers, indicating the cluster size, the cluster start address and the related command, like "read" or "write".
The system controller module 104 includes an embedded software processor system, which is shown in Fig. 1 as PPC 106. As used herein, PPC is an acronym for Power PC. The PPC 106 communicates via an external control path 116 with external modules. Controlling tasks performed by the PPC 106 may include executing file system maintenance algorithms, communicating with external clients, and controlling user access via an LCD touchpad or other user interface (UI). File system tasks may relate to, for example, an interface between logical addressing and physical addressing of the storage, access rights management, memory allocation, garbage collection, defragmentation, version management, load balancing or the like. As set forth more fully below, data processing in accordance with an exemplary embodiment of the invention is preferably performed without the overhead of file system or addressing tasks.
Additionally, the PPC 106 configures the hardware of the system controller module 104 via an internal control path 118. In the exemplary embodiment shown in Fig. 1, the PPC 106 configures a cache 108, an array control such as a RAID control 110 and a bus driver module 112. As set forth below, the PPC 106 may be adapted to associate control information with data by providing control information to the bus driver 112 via the internal control path 118.
In the exemplary embodiment shown in Fig. 1, the cache 108 is adapted to transfer data received via a data path 120. Moreover, real-time data received via the data path 120 are buffered in the cache 108 before being transferred to the RAID control 110. The RAID control 110 then delivers the data to the bus driver module 112 to provide data streaming to the attached devices in the HDD array 114. The skilled person will appreciate that the recording system 100 may additionally be adapted to stream data from the HDD array 114.
The RAID controller 110 multiplexes incoming data streams. Multiplexing an incoming data stream is the process of separating a unitary incoming data stream into a plurality of parallel streams, each of which is intended to be written to a different storage device in an array of storage devices. The skilled person will appreciate that the RAID controller 110 may also be adapted to de-multiplex outgoing data streams, and optionally add redundancy for recovering data from up to two failing HDDs without data loss.
The bus driver module 112 splits the data stream word by word into n chunks, each assigned to a corresponding DMA engine serving the attached standard ATA-HDD. The skilled person will appreciate that storage devices operating according to any protocol may be employed in an exemplary embodiment of the present invention. Parallel or serial ATA-HDDs are just two examples of protocols that may be employed. The addressing task, that is the assigning of data chunks to sector addresses on storage devices, is also done in the bus driver module 112.
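The word-by-word split described above may be illustrated by the following minimal sketch. It assumes a round-robin assignment in which word i goes to chunk i mod n; the function names split_words and merge_words are hypothetical and do not appear in the described embodiment, and the assignment rule itself is one plausible reading rather than a statement of the actual hardware behaviour.

```python
from typing import List, Sequence


def split_words(words: Sequence[int], n: int) -> List[List[int]]:
    """Distribute a stream word by word over n chunks (round-robin).

    Assumption: word i is assigned to chunk i mod n, i.e. to the
    DMA engine serving storage device i mod n.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return [list(words[i::n]) for i in range(n)]


def merge_words(chunks: Sequence[Sequence[int]]) -> List[int]:
    """Inverse of split_words: re-interleave the chunks into one stream."""
    n = len(chunks)
    total = sum(len(chunk) for chunk in chunks)
    return [chunks[i % n][i // n] for i in range(total)]


stream = list(range(10))          # ten data words
chunks = split_words(stream, 4)   # four storage devices
restored = merge_words(chunks)    # read direction
```

With four devices, the first device receives words 0, 4 and 8, the second words 1, 5 and 9, and so on. The merge step corresponds to the read direction, in which the parallel streams are recombined into a unitary stream.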
An exemplary embodiment of the present invention provides an efficient method and apparatus for distributing a high-speed data stream onto multiple devices performed by a "late binding" of data to data control information via the internal control path 118. The expression "late binding" refers to associating data with control information such as an intended sector on an array of storage devices and/or other address information at a time just before the data is to be written to the HDD array. Moreover, the internal control path 118 may be adapted to provide addressing information directly to the bus driver module 112 and not to the cache 108 or the RAID control 110. The idea is to associate the incoming/outgoing data with assigned clusters on storage devices at a very low level, that is, in the last controller instance that directly communicates with the single storage units or storage devices. In this manner, it is possible to achieve high-speed data recording performance while using a very low controlling overhead. In particular, the late association of data with control information simplifies the design of the recording system 100 because the control information does not need to be processed until just before the associated data is written to the HDD array 114.
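The late binding described above may be sketched as follows. This is a simplified software model under stated assumptions, not the hardware of the embodiment: the class BusDriver and the functions cache_stage and raid_stage are hypothetical names, addresses are counted in words per device, and a dictionary stands in for the array of storage devices. The point of the sketch is that the cache and array-control stages never see an address; the address meets the data only in the last stage.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple


@dataclass
class BusDriver:
    n_devices: int
    # Start addresses delivered over the control path, one per cluster.
    pending_addresses: Deque[int] = field(default_factory=deque)
    # (device index, address) -> word; stands in for the storage array.
    array: Dict[Tuple[int, int], int] = field(default_factory=dict)

    def set_start_address(self, address: int) -> None:
        """Control path: deliver the start address for the next cluster."""
        self.pending_addresses.append(address)

    def write_cluster(self, streams: List[List[int]]) -> None:
        """Data path: bind the pending address to the data and write it."""
        start = self.pending_addresses.popleft()  # late binding happens here
        for device, stream in enumerate(streams):
            for offset, word in enumerate(stream):
                self.array[(device, start + offset)] = word


def cache_stage(data: List[int]) -> List[int]:
    """Buffers the data; carries no address information."""
    return list(data)


def raid_stage(data: List[int], n: int) -> List[List[int]]:
    """Multiplexes into n parallel streams; carries no address information."""
    return [data[i::n] for i in range(n)]


driver = BusDriver(n_devices=2)
driver.set_start_address(100)                     # via the control path
buffered = cache_stage([10, 11, 12, 13])          # via the data path
streams = raid_stage(buffered, driver.n_devices)
driver.write_cluster(streams)
```

In this model the control path and the data path touch only at write_cluster, which mirrors the statement that addressing information is provided to the bus driver module and not to the cache or the array control.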
An exemplary embodiment of the present invention takes a known principle of DMA one step further. In particular, DMA generally allows a processor to be liberated from the resource-consuming task of orchestrating every single memory cell access in a block data transfer. This is achieved by providing dedicated hardware that performs specific functionality. An exemplary embodiment of the present invention extends this concept to situations in which sequences of plural DMA accesses to contiguous memory address ranges occur. Moreover, an exemplary embodiment of the present invention provides dedicated hardware to perform even the task of setting up, supervising and influencing such sequences of DMA calls. This frees the PPC 106 even further from brute-force routine tasks.
An exemplary embodiment of the present invention increases the predictability of a storage system by almost completely relieving the data path 120 of control tasks. This results in a better utilization of hardware resources, which in turn allows redundant components to be eliminated and enables the use of a low-power control processor.
Fig. 2 is a state diagram that is useful in explaining the operation of an exemplary embodiment of the present invention. The state diagram is generally referred to by the reference number 200. The state diagram 200 shows, in state-diagram format, an exemplary method of operation for a state machine that improves the performance of a data storage system. In particular, a state machine that operates according to Fig. 2 provides late binding of data and control information in accordance with an exemplary embodiment of the present invention.
Prior to beginning operation, a system controller, which is denoted as Power PC or PPC in Fig. 2, is in an idle state 202. The PPC initiates a transfer by delivering, via registers, values for the cluster size of the next transfer, the sector start address to be used, and whether the related command is for a read operation or a write operation. In one exemplary embodiment of the present invention, cluster sizes may range up to 16 MB per device. These values may be provided using, for example, a serial communication bus protocol like I2C. The registers are denoted as "PPC registers" in Fig. 2. The register values are read, as shown at state 204. An interrupt is delivered to the system controller to signal the initialisation of the storage devices involved in the transfer.
The interrupt signals that the PPC can read the status of the previously finished data transmission and may initiate the next transfer. "Small" or "large" data clusters may be addressed, but read/write transfers are generally performed as 64-kB bursts per device unless the cluster size per device is smaller than 64 kB. In other words, the burst size is derived from the cluster size and equals the cluster size up to an upper limit of 64 kB. The bus driver module 112 may include logic to calculate the specific sector start address and/or other address information for the ATA devices. In an exemplary embodiment of the present invention, 64-kB bursts are initiated together or simultaneously for all DMA engines.
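The burst-size rule may be illustrated by the following short sketch. It assumes that 1 kB means 1024 bytes and 1 MB means 1024 kB, which the text does not state; the names BURST_LIMIT, burst_size and burst_count are hypothetical.

```python
BURST_LIMIT = 64 * 1024  # 64 kB per device (assuming 1 kB = 1024 bytes)


def burst_size(cluster_size: int) -> int:
    """Burst size per device: the cluster size, capped at 64 kB."""
    return min(cluster_size, BURST_LIMIT)


def burst_count(cluster_size: int) -> int:
    """Number of bursts needed for one cluster (ceiling division)."""
    return -(-cluster_size // BURST_LIMIT)


small = burst_size(4 * 1024)             # a 4-kB cluster fits in one short burst
large = burst_size(16 * 1024 * 1024)     # a 16-MB cluster is capped at 64 kB
bursts = burst_count(16 * 1024 * 1024)   # bursts per device for a 16-MB cluster
```

Under these assumptions, the largest cluster size mentioned above, 16 MB per device, is transferred as 256 bursts of 64 kB each.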
The lower half of the state diagram 200 exhibits a loop structure, where the transfer of the cluster is performed in the form of a sequence of burst transfers. At state 206, the remaining cluster size and the address are determined. This determination involves, for example, incrementing the previously-used address or start address by the burst length and decrementing the cluster size by the burst length.
If the cluster size is determined to be zero, as shown at state 208, this indicates that the DMA transaction has completed. Accordingly, the system controller returns to the idle state 202.
If, at state 206, the cluster size is determined to be greater than zero, as shown at state 210, this is taken as an indication that there is data still to be transferred to complete the transaction. Moreover, a new DMA burst transfer is initiated at state 212. During the DMA transaction, the state machine is in a DMA_control state 214. After the burst transfer, a new iteration occurs whenever the attached group of storage devices is ready as a whole. This is symbolically indicated as an HDD group ready state 216 in Fig. 2. The skilled person will appreciate that the idea of waiting until a group of storage devices is ready is applicable not only to HDD storage devices, but any suitable storage device that may be employed in a digital storage system such as the recording system 100. When the associated storage devices are ready, the state machine iterates the loop by again calculating the cluster size remaining to be transferred and the address, as shown at state 206.
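The loop of states 206 to 216 may be sketched as a generator that yields one (address, length) pair per DMA burst. This is an illustrative model only: the address is counted in bytes for simplicity, whereas the embodiment refers to sector addresses, the final burst is assumed to carry whatever remains of the cluster, and the wait for the HDD group ready state 216 is reduced to a comment. The name burst_schedule is hypothetical.

```python
from typing import Iterator, Tuple

BURST_LIMIT = 64 * 1024  # 64 kB per device (assuming 1 kB = 1024 bytes)


def burst_schedule(start_address: int, cluster_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (address, length) for each DMA burst of one cluster transfer."""
    address, remaining = start_address, cluster_size
    while remaining > 0:                        # state 210: data left to transfer
        length = min(remaining, BURST_LIMIT)
        yield address, length                   # states 212 and 214: DMA burst
        # State 216: real hardware would wait here until the whole
        # group of storage devices reports ready.
        address += length                       # state 206: advance the address
        remaining -= length                     # state 206: shrink the cluster
    # remaining == 0 corresponds to state 208; control returns to idle state 202.


schedule = list(burst_schedule(start_address=0, cluster_size=160 * 1024))
```

For a 160-kB cluster starting at address 0, the sketch produces two full 64-kB bursts followed by one 32-kB burst. Only the start address of the first burst comes from outside the loop; all later addresses are derived by incrementing.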
In an exemplary embodiment of the present invention, streaming data is not mapped into an address space of a control processor. Instead, the PPC 106, shown in Fig. 1, only delivers subsequent, consecutive or successive cluster addresses for the accessed file. For larger cluster sizes, a longer time slot is needed by hardware modules to fill the cluster with streaming data. Having large enough clusters and an appropriate data rate, periodic requests for new cluster addresses from the bus driver module 112 to the PPC 106 will come in at relatively long intervals, thus allowing the processor to operate at a very low clock frequency.
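The relationship between cluster size, data rate and the interval between address requests may be made concrete with a small calculation. The figures used below (8 devices, a stream of 256 MB per second) are purely hypothetical and are not taken from the embodiment; only the 16-MB cluster size per device comes from the text. The sketch further assumes that one cluster spans all devices of the array, so that its total size is the per-device size multiplied by the number of devices.

```python
MIB = 1024 * 1024  # assuming 1 MB = 1024 * 1024 bytes


def address_request_interval(cluster_size_per_device: int,
                             n_devices: int,
                             data_rate: float) -> float:
    """Seconds of streaming absorbed by one cluster across all devices.

    data_rate is the aggregate stream rate in bytes per second.
    """
    return cluster_size_per_device * n_devices / data_rate


# Hypothetical figures: 16 MB per device, 8 devices, 256 MB/s stream.
interval = address_request_interval(16 * MIB, 8, 256 * MIB)
requests_per_second = 1 / interval
```

With these assumed figures the control processor would have to supply a new cluster address only twice per second, which illustrates why a low clock frequency may suffice.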
In one exemplary embodiment of the present invention, a start address is provided by the PPC 106 to the bus driver module 112 prior to the first of a sequence of burst transfers. The start address for subsequent transfers is determined by the bus driver module 112, which increments the value of the initial start address. Moreover, no additional addressing information is provided by the PPC 106 to the bus driver module 112 via the internal control path 118 for the remaining sequence of burst transfers. After receiving a request for a data cluster transfer, the bus driver module 112 may be adapted to notify the processor of a completion status of a previously completed data cluster transfer. In an exemplary embodiment of the present invention, after setting up the first DMA access, the bus driver module 112 may be adapted to request from the PPC 106 a next start address to be used in a next cluster transfer to be performed after the current cluster transfer.
Fig. 3 is a process flow diagram that shows a method in accordance with an exemplary embodiment of the present invention. The method is generally referred to by the reference number 300. The skilled person will appreciate that the method 300 may be desirably performed by the data storage system 100. An exemplary method in accordance with the present invention may, in addition, be implemented according to the state diagram 200.
At step 302, data is buffered in a cache such as the cache 108. At step 304, data buffered in the cache is delivered to an array controller such as the RAID controller 110. Data is delivered from the array controller to a bus driver module such as the bus driver module 112 at step 306.
At step 308, the bus driver module associates control information with a portion of data received from the array controller. Finally, at step 310, the portion of data received from the array controller is written by the bus driver module to an array of storage devices at a location that corresponds to the control information.
The skilled person will appreciate that combining any of the above-recited features of the present invention together may be desirable.
Claims
1. Data storage device (100), comprising:
- a cache (108) that buffers data received from a data path (120);
- an array controller (110) that multiplexes an input stream of data received from the cache (108); and
- a bus driver module (112) that is adapted to associate control information with a portion of an output stream of data received from the array controller (110).
2. Data storage device (100) according to claim 1, comprising a control path (118) that is adapted to provide the control information to the bus driver module (112).
3. Data storage device (100) according to claims 1 or 2, wherein the control information identifies a sector on an array of storage devices (114).
4. Data storage device (100) according to claims 1 or 2, wherein the control information relates to an address on an array of storage devices (114).
5. Data storage device (100) according to any preceding claim, wherein the bus driver module (112) is adapted to transfer the output stream of data in a sequence of burst transfers, and wherein the bus driver module (112) is adapted to receive a start address only for an initial one of the sequence of burst transfers.
6. Data storage device (100) according to any preceding claim, wherein the bus driver (112) is adapted to write the portion of the output stream of data received from the array controller (110) to an array of storage devices at locations as identified by the control information.
7. Method (300) of transferring data, comprising:
- buffering (302) data in a cache (108);
- delivering (304) the data buffered in the cache (108) to an array controller (110);
- delivering (306) data from the array controller (110) to a bus driver module (112); and
- associating (308) control information with a portion of the data received from the array controller (110).
8. Method (300) of transferring data according to claim 7, comprising writing (310) the portion of data received from the array controller (110) to an array of storage devices at locations as identified by the control information.
9. Method (300) of transferring data according to claims 7 or 8, wherein the control information identifies a sector on an array of storage devices (114).
10. Method (300) of transferring data according to claims 7 or 8, wherein the control information relates to an address on an array of storage devices (114).
11. Method (300) of transferring data according to one of claims 7 to 10, comprising transferring the data received from the array controller (110) to an array of storage devices (114) as a sequence of burst transfers.
12. Method (300) of transferring data according to claim 11, comprising providing a start address only for an initial one of the sequence of burst transfers.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP07116321 | 2007-09-13 | ||
EP07116321.6 | 2007-09-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2009033971A1 (en) | 2009-03-19 |
Family
ID=40001386
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2008/061464 WO2009033971A1 (en) | 2007-09-13 | 2008-09-01 | System and method for splitting data and data control information |
Country Status (2)
Country | Link |
---|---|
TW (1) | TW200912728A (en) |
WO (1) | WO2009033971A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9948615B1 (en) * | 2015-03-16 | 2018-04-17 | Pure Storage, Inc. | Increased storage unit encryption based on loss of trust |
CN115496114A (en) * | 2022-11-18 | 2022-12-20 | 成都戎星科技有限公司 | TDMA burst length estimation method based on K-means clustering |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9235798B2 (en) * | 2012-07-18 | 2016-01-12 | Micron Technology, Inc. | Methods and systems for handling data received by a state machine engine |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0858025A2 (en) * | 1997-02-03 | 1998-08-12 | Matsushita Electric Industrial Co., Ltd. | Data recorder and method of access to data recorder |
WO1999026150A1 (en) * | 1997-11-14 | 1999-05-27 | 3Ware, Inc. | High-performance architecture for disk array controller |
US6349357B1 (en) * | 1999-03-04 | 2002-02-19 | Sun Microsystems, Inc. | Storage architecture providing scalable performance through independent control and data transfer paths |
US20040128444A1 (en) * | 2002-12-24 | 2004-07-01 | Sung-Hoon Baek | Method for storing data in disk array based on block division and method for controlling input/output of disk array by using the same |
-
2008
- 2008-08-15 TW TW97131049A patent/TW200912728A/en unknown
- 2008-09-01 WO PCT/EP2008/061464 patent/WO2009033971A1/en active Application Filing
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9948615B1 (en) * | 2015-03-16 | 2018-04-17 | Pure Storage, Inc. | Increased storage unit encryption based on loss of trust |
CN115496114A (en) * | 2022-11-18 | 2022-12-20 | 成都戎星科技有限公司 | TDMA burst length estimation method based on K-means clustering |
CN115496114B (en) * | 2022-11-18 | 2023-04-07 | 成都戎星科技有限公司 | TDMA burst length estimation method based on K-means clustering |
Also Published As
Publication number | Publication date |
---|---|
TW200912728A (en) | 2009-03-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10387202B2 (en) | Quality of service implementation in a networked storage system with hierarchical schedulers | |
US7162550B2 (en) | Method, system, and program for managing requests to an Input/Output device | |
CN104657316B (en) | Server | |
US7894288B2 (en) | Parallel data storage system | |
EP1896965B1 (en) | Dma descriptor queue read and cache write pointer arrangement | |
US8868809B2 (en) | Interrupt queuing in a media controller architecture | |
JP7010598B2 (en) | QoS-aware I/O management method, management system, and management device for a PCIe storage system with reconfigurable multi-ports | |
US8639898B2 (en) | Storage apparatus and data copy method | |
US20060136654A1 (en) | Method and computer program product to increase I/O write performance in a redundant array | |
US20050235072A1 (en) | Data storage controller | |
WO2007005702A2 (en) | Multi-threaded transmit transport engine for storage devices | |
CN111722786A (en) | Storage system based on NVMe equipment | |
JP2005512227A (en) | Receive data from multiple interleaved simultaneous transactions in FIFO memory | |
US8078798B2 (en) | Managing first level storage in a multi-host environment | |
US20110082950A1 (en) | Computer system and computer system input/output method | |
EP4057150A1 (en) | Systems, methods, and devices for data storage with specified data transfer rate | |
US11029847B2 (en) | Method and system for shared direct access storage | |
US20040111532A1 (en) | Method, system, and program for adding operations to structures | |
US6092140A (en) | Low latency bridging between high speed bus networks | |
US11080192B2 (en) | Storage system and storage control method | |
WO2009033971A1 (en) | System and method for splitting data and data control information | |
US7809068B2 (en) | Integrated circuit capable of independently operating a plurality of communication channels | |
EP1546855A2 (en) | Method, system, and program for returning data to read requests received over a bus | |
CN114415959B (en) | SATA disk dynamic accelerated access method and device | |
CN114662162B (en) | Multi-algorithm-core high-performance SR-IOV encryption and decryption system and method for realizing dynamic VF distribution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 08803447 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 08803447 Country of ref document: EP Kind code of ref document: A1 |