CN1997033B - A method and system for network storage - Google Patents
- Publication number
- CN1997033B (application CN2006101665830A / CN200610166583A)
- Authority
- CN
- China
- Prior art keywords
- starter
- data
- object machine
- command
- vscsi
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Storage Device Security (AREA)
Abstract
A protocol and system for network storage, in the field of computer information storage. The aim is to overcome the shortcomings of existing network storage protocols by constructing a protocol that meets requirements on price, bandwidth, and deployment complexity while reducing server load and improving I/O service performance. The protocol is built on the Virtual Interface (VI) and comprises an initiator process and a target process. The system comprises one initiator and N targets, each connected to the network through VI network interfaces and VI connection channels; data is transferred between the initiator and the targets over one or more VI connection channels. By using VI instead of the traditional TCP/IP protocol as the communication basis of the storage protocol, the invention shortens the critical path of networked storage data, improves the utilization of physical network bandwidth and the response speed of network I/O, and addresses network storage problems such as network bandwidth, storage access speed, and interoperability.
Description
Technical field
The invention belongs to the field of computer information storage and specifically relates to a protocol for network storage and a system using it.
Background
With the spread of networks and the surge in modern data volumes, many important computing applications, especially storage-intensive ones, place higher demands on information storage systems in capacity, bandwidth, I/O response time, scalability, and other performance metrics. In response, network storage architectures such as network-attached storage (NAS) and storage area networks (SAN) have been proposed and implemented, improving storage system performance at the architectural level.
The storage protocol, the foundation on which a network storage system is built, directly affects the performance of that system. Many storage protocols are available today, including the SCSI Serial protocol, Fibre Channel, and iSCSI. In industry, Fibre Channel is the first choice for building a high-performance SAN, but its high price and deployment complexity make it unsuitable for low-end and mid-range users. Ethernet, on the other hand, is ubiquitous, and with the recent development of Gigabit and even 10 Gigabit technology, building large storage systems on Ethernet is increasingly attractive and feasible. The traditional way to build an Ethernet network storage system is to use a TCP/IP-based storage protocol such as iSCSI, which encapsulates SCSI commands and data in TCP/IP packets for transmission. TCP/IP is mature, and features such as flow control and congestion control make data transfer very convenient, but in a LAN environment the complexity of TCP/IP becomes a stumbling block when higher transfer performance is required.
It is therefore necessary to design a new Ethernet-based storage protocol that exploits the low cost and ubiquity of Ethernet while bypassing the complex TCP/IP protocol, so that a storage system built on Ethernet keeps its price advantage and also gains higher bandwidth and faster I/O response. The Virtual Interface (VI) is a high-bandwidth, low-latency communication mechanism for cluster systems. Its main idea is to give each user process a protected, directly accessible virtual interface, which saves the system processing overhead of the traditional network model. Each virtual interface represents a communication endpoint, and two virtual interfaces can be logically connected for point-to-point bidirectional data transfer. Using VI instead of TCP/IP as the communication protocol yields higher throughput at low overhead, and vSCSI, the new storage protocol designed and implemented on this basis, lets a storage system achieve better performance.
Summary of the invention
The invention provides a protocol for network storage and a system using it. The aim is to overcome the price and performance shortcomings of existing network storage protocols when deploying network storage systems by constructing a network storage protocol, based on the Virtual Interface, that meets requirements on price, bandwidth, and deployment complexity while reducing server load and improving I/O service performance.
The protocol for network storage of the invention comprises an initiator process and a target process.
(1) The initiator process comprises, in order:
(1.1) Initiator initialization: according to the user-specified mode, the initiator loads either the SCSI adapter driver module or the vSCSI API module and completes the related initialization;
(1.2) Login initiation: the initiator calls the connection management module to send authentication information to one or more user-specified targets, requesting login;
(1.3) Waiting for authentication: the initiator waits for the authentication response from the target; if the response arrives within the time threshold it proceeds to the next step, otherwise it enters the authentication timeout handling step;
(1.4) VI connection channel establishment: one or more VI connection channels are established between the initiator and the one or more targets, the corresponding VI send/receive buffers are registered, and the send/receive descriptors are associated;
(1.5) Waiting for commands: once the connections are established, the initiator waits for SCSI commands from the system or an upper-layer application; on receiving a command it proceeds to the next step;
(1.6) Command parsing: the initiator parses the current SCSI command and generates one or more vSCSI requests according to the user-specified data distribution rule or policy and the type of the command; for a SCSI read command it proceeds to step (1.7), otherwise it goes to step (1.8);
(1.7) Read command preprocessing: for the one or more targets used by the user-specified data distribution rule or policy, and according to the data size of each vSCSI read request generated in step (1.6), the initiator performs PostRecv on the VI receive buffer of each target to prepare each VI receive queue, then proceeds to the next step;
(1.8) Sending requests: the initiator concurrently performs PostSend on the one or more vSCSI requests generated in step (1.6), sending them to the targets; if a write request was sent it proceeds to step (1.9), otherwise it goes to step (1.10);
(1.9) Write command preprocessing: following the rule or policy in use, the initiator computes the layout of the data to be written in the SCSI command's request_buffer to determine which data belongs to which target, encrypts each target's data at the user-specified encryption level, and finally copies the encrypted data in parallel into the corresponding VI send buffer queues to await sending, then proceeds to the next step;
(1.10) Waiting for responses: the initiator waits for responses from the targets and, on receipt, enters a handling routine chosen by response type: a response to a read command goes to step (1.11), a response to a write command to step (1.12), and a response to a status query command to step (1.13); if no response arrives within the time threshold, it enters the response timeout handling step;
(1.11) Read command processing: the initiator receives data from one or more targets and decrypts the data in each VI receive buffer at the user-specified encryption level; following the rule or policy in use, it then computes the layout of the space to be filled in the SCSI command's request_buffer to determine where the data from each target belongs, and finally copies the decrypted data into those positions in request_buffer; when filling is complete it goes to step (1.14);
(1.12) Write command processing: on receiving the responses from one or more targets, the initiator concurrently performs PostSend on each queue of data to be written, sending the data to the corresponding targets; when the data has been sent it goes to step (1.14);
(1.13) Status query command processing: the initiator receives one or more vSCSI status query responses from the one or more targets, extracts the SCSI response information, and returns it to the system or upper-layer application; it then goes to step (1.14);
(1.14) Request completion: having finished executing the current SCSI command, the initiator returns to step (1.5) and waits for the next SCSI command.
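The placement calculation in steps (1.9) and (1.11) can be illustrated with a minimal Python sketch. It assumes a simple round-robin striping rule, which is only one possible data distribution policy; the patent does not fix a specific rule. The names `stripe`, `unstripe`, and `unit` are hypothetical, and encryption and VI buffer handling are left out.

```python
def stripe(buf: bytes, n_targets: int, unit: int) -> list[bytes]:
    """Split request_buffer contents into one byte string per target (write path)."""
    parts = [bytearray() for _ in range(n_targets)]
    for start in range(0, len(buf), unit):
        # Stripe number (start // unit) decides which target gets this chunk.
        parts[(start // unit) % n_targets] += buf[start:start + unit]
    return [bytes(p) for p in parts]


def unstripe(parts: list[bytes], total: int, unit: int) -> bytes:
    """Rebuild request_buffer contents from per-target data (read path)."""
    out = bytearray()
    offsets = [0] * len(parts)          # read position within each target's data
    i = 0
    while len(out) < total:
        t = i % len(parts)
        chunk = parts[t][offsets[t]:offsets[t] + unit]
        if not chunk:
            raise ValueError("per-target data is shorter than expected")
        out += chunk
        offsets[t] += len(chunk)
        i += 1
    return bytes(out)
```

With three targets and a two-byte stripe unit, `stripe(b"abcdefghij", 3, 2)` sends `b"abgh"`, `b"cdij"`, and `b"ef"` to targets 1, 2, and 3, and `unstripe` restores the original buffer from those parts.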
(2) The target process comprises, in order:
(2.1) Target initialization: the target starts and initializes the device service thread, after which the connection management module takes over;
(2.2) Waiting for login: the target waits for a login authentication request from an initiator;
(2.3) Accepting login: on receiving a login request, the target verifies the requester's identity; an illegal login request enters the illegal-request handling step, while a verified request is accepted and an authentication response is sent to the initiator;
(2.4) Waiting for commands: the target waits for a vSCSI command from the initiator;
(2.5) Command parsing: the target parses the received vSCSI command and handles it according to its type: a vSCSI read command goes to step (2.6), a write command to step (2.7), and a status query command to step (2.10);
(2.6) Read command processing: under the control of the read-write acceleration module, the target copies data from the physical storage device or the Cache into the VI send buffer queue and performs PostSend on it, sending the data to the initiator; when all data has been sent it goes to step (2.11);
(2.7) Write command preprocessing: the target performs PostRecv according to the request size to prepare the VI receive queue, then generates a ready (write-ready) response and sends it to the initiator with PostSend, and proceeds to the next step;
(2.8) Waiting for data: the target waits for data from the initiator; if data arrives within the time threshold it proceeds to the next step, otherwise it enters the receive timeout handling step;
(2.9) Write command processing: the target receives the data from the initiator and, under the control of the read-write acceleration module, writes it directly from the VI receive buffer to the physical storage device or the Cache; when all data has been received it goes to step (2.11);
(2.10) Status query command processing: the target extracts the SCSI query command encapsulated in the vSCSI command, executes the query according to the SCSI protocol specification, encapsulates the resulting information in a vSCSI query response, and sends it to the initiator with PostSend; when the response has been sent it goes to step (2.11);
(2.11) Request completion: having successfully executed the current vSCSI command, the target returns to step (2.4) and waits for the next vSCSI command.
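The target-side dispatch in steps (2.5) to (2.10) can be sketched as a small in-memory simulation. Everything here is a hypothetical stand-in: the `Target` class, the dictionary-shaped commands, and the `recv_data` callback are illustrative assumptions, a `bytearray` plays the role of the physical storage device, and appending to `sent` stands in for PostSend. The read-write acceleration module and real VI queues are omitted.

```python
from typing import Callable, Optional


class Target:
    """In-memory stand-in for a vSCSI target (hypothetical, for illustration)."""

    def __init__(self, size: int) -> None:
        self.disk = bytearray(size)   # stands in for the physical storage device
        self.sent: list = []          # messages handed to "PostSend", in order

    def handle(self, cmd: dict,
               recv_data: Optional[Callable[[], Optional[bytes]]] = None) -> None:
        kind = cmd["type"]                                  # step 2.5: parse the command
        if kind == "read":                                  # step 2.6
            off, n = cmd["offset"], cmd["length"]
            self.sent.append(("data", bytes(self.disk[off:off + n])))
        elif kind == "write":                               # steps 2.7 to 2.9
            off, n = cmd["offset"], cmd["length"]
            self.sent.append(("ready", None))               # write-ready response
            # Step 2.8: None models a timeout; a payload of the wrong
            # length is treated the same way in this sketch.
            data = recv_data() if recv_data else None
            if data is None or len(data) != n:
                self.sent.append(("error", "receive timeout"))
                return
            self.disk[off:off + n] = data                   # step 2.9
        elif kind == "query":                               # step 2.10
            self.sent.append(("status", {"capacity": len(self.disk)}))
        else:
            self.sent.append(("error", "unknown command"))
```

A write followed by a read of the same range, for example `t.handle({"type": "write", "offset": 4, "length": 3}, lambda: b"abc")` and then `t.handle({"type": "read", "offset": 4, "length": 3})`, leaves a ready message and then the written bytes in `t.sent`.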
The protocol for network storage is further characterized in that: (1) the authentication timeout handling step returns an error status to the system or upper-layer application and terminates the initiator process; (2) the response timeout handling step reports to the system or upper-layer application that the current SCSI command failed, terminates its execution, and returns to the initiator's waiting-for-commands step to wait for the next SCSI command; (3) the receive timeout handling step reports to the initiator that the current vSCSI command failed, terminates its execution, and returns to the target's waiting-for-commands step to wait for the next vSCSI command; (4) the illegal-request handling step rejects the illegal login and returns to the target's waiting-for-login step to wait for the next initiator login.
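The time-threshold behavior of the response timeout handling step can be sketched with Python's standard `queue` module. This is an illustrative assumption about structure only: the function name `wait_for_response` and the tuple-shaped results are hypothetical, and a `queue.Queue` stands in for a VI receive queue.

```python
import queue


def wait_for_response(responses: "queue.Queue", threshold_s: float):
    """Return ("ok", response), or ("error", reason) if the threshold expires."""
    try:
        # Block for at most threshold_s seconds waiting for a response.
        return ("ok", responses.get(timeout=threshold_s))
    except queue.Empty:
        # Response timeout: the caller reports the failed command and goes
        # back to waiting for the next command instead of terminating.
        return ("error", "response timeout")
```

The authentication timeout differs only in what the caller does with the error: per the text above, it terminates the initiator process rather than returning to the command loop.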
A system for the network storage protocol of the invention comprises one initiator and N targets, where N is a natural number. The initiator and each target connect to the network through VI network interfaces and VI connection channels, and data is transferred between the initiator and the targets over one or more VI connection channels;
(1) The initiator comprises a SCSI adapter driver module, a vSCSI API module, a connection management module, a command processing module, and a data encryption/decryption module;
(1.1) The SCSI adapter driver module registers one or more virtual SCSI disk devices with the operating system, receives standard SCSI commands from the operating system's SCSI mid-layer and returns their execution status, and presents conventional disk devices to user space for access;
(1.2) The vSCSI API module gives upper-layer developers a direct call interface that bypasses the operating system's layered SCSI stack, receiving upper-layer commands directly and returning their execution status;
(1.3) The connection management module in the initiator issues one or more VI connection authentication requests to one or more targets; performs initialization work such as connection establishment, parameter negotiation, and memory registration, as well as graceful logout; and, if a VI connection drops unexpectedly, re-establishes the connection and migrates service to the new VI connection;
(1.4) The command processing module converts commands received from the SCSI adapter driver module or the vSCSI API module into vSCSI commands, calls PostSend to post the data in the VI send buffer queues, calls PostRecv to prepare the VI receive buffers, sends and receives vSCSI commands, and generates the corresponding commands or status to return to the SCSI adapter driver module or vSCSI API module above it;
(1.5) The data encryption/decryption module encrypts data to be transmitted and decrypts data that has been received; for a write request, it encrypts the data at the configured encryption level and copies it from the system buffer to the VI registered buffer; for a read request, it decrypts the data and copies it from the VI registered buffer to the system buffer;
(2) The target comprises a device management module, a connection management module, a request processing module, and a read-write acceleration module;
(2.1) The device management module creates the device service thread, collects information about the physical storage devices, initializes the local device service state and control data, and exchanges information with the devices in real time;
(2.2) The connection management module in the target listens for and verifies connection authentication requests from initiator nodes, establishes one or more VI connections with each legitimate initiator, and performs initialization work such as parameter negotiation and memory registration;
(2.3) The request processing module receives and processes the vSCSI requests sent by the initiator, including read/write requests and device status query requests, and starts the appropriate handling routine for each type of request; data exchange between read/write requests and the physical storage device is controlled by the read-write acceleration module;
(2.4) The read-write acceleration module decides, while a read/write request is processed, whether the data in the VI registered buffer is read from or written to the Cache or the physical storage device, and keeps the data in the Cache consistent with the data on the physical storage device.
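The role of the read-write acceleration module can be illustrated with a small block cache. The sketch below assumes a write-through policy, which keeps the Cache and the device consistent in the simplest way; the patent does not state which consistency policy is used, so this is one possible choice rather than the described design. The class name, the fixed `block_size`, and the `device_reads` counter are hypothetical.

```python
class ReadWriteAccelerator:
    """Write-through block cache in front of a byte-addressable device (sketch)."""

    def __init__(self, device: bytearray, block_size: int = 4) -> None:
        self.device = device                  # stands in for the physical storage device
        self.block_size = block_size
        self.cache: dict[int, bytes] = {}     # block index -> cached block contents
        self.device_reads = 0                 # counts reads that missed the cache

    def read_block(self, index: int) -> bytes:
        if index not in self.cache:           # cache miss: fetch from the device
            start = index * self.block_size
            self.cache[index] = bytes(self.device[start:start + self.block_size])
            self.device_reads += 1
        return self.cache[index]

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        start = index * self.block_size
        self.device[start:start + self.block_size] = data   # write through to the device
        self.cache[index] = bytes(data)                      # keep the cached copy in sync
```

Because every write updates both the device and the cached copy, a read after a write returns the new data without touching the device again. A write-back design would trade that simplicity for fewer device writes.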
The system is further characterized in that: the initiator and the targets running vSCSI are all equipped with network adapters that support VI transport; the VI network adapters provide at least one of the two reliability levels Reliable Delivery and Reliable Reception; the initiator and the targets may run on the same host node or be distributed across different host nodes; on the initiator's node, both the interface between the virtual SCSI device and the operating system and the vSCSI API use the SCSI interface; the physical storage devices on a target's node are disk drives or disk arrays with SCSI, FC, IDE, or SATA interfaces; and the network is an Ethernet-based local area network.
In the text above: vSCSI, the network storage protocol of the invention, stands for VI-attached SCSI, where VI is the Virtual Interface and SCSI is the Small Computer System Interface; vSCSI API is the application programming interface that vSCSI exposes to upper-layer applications; PostRecv is a callable function that the Virtual Interface (VI) provides to upper-layer applications for preparing VI receive buffers; PostSend is a callable function that VI provides to upper-layer applications for posting the data in VI send buffers; Cache is a region of memory used to cache data from the physical storage device; and request_buffer is the area in the SCSI command structure that holds the data to be read or written.
By using VI instead of the traditional TCP/IP protocol as the communication basis of the storage protocol, the invention shortens the critical path of networked storage data, improves the utilization of physical network bandwidth and the response speed of network I/O, and addresses major network storage problems such as network bandwidth, storage access speed, and interoperability.
Specifically, the invention has the following advantages:
1. It exploits VI's high bandwidth and low latency to achieve higher transfer performance than traditional TCP/IP-based storage protocols. Measurements show that, under the same conditions, vSCSI improves sequential write performance by up to 115% and sequential read performance by up to 35% over iSCSI (Intel implementation).
2. It uses VI's multi-level reliable transport to move the important task of guaranteeing transfer reliability down into VI itself, which largely spares the upper-layer storage protocol the complexity of implementing storage reliability, reduces protocol overhead, and improves protocol efficiency.
3. It exploits the behavior of VI during data transfer to encrypt data during storage without loss of transfer performance, and likewise to decrypt it without loss. Specifically, the time needed to post a packet to VI (the PostSend operation) is a step function of the amount of data VI transfers continuously; once that amount exceeds the second threshold, the posting time settles at a relatively high value, so the next packet can be encrypted while the previous packet is being posted. Decryption works the same way in reverse.
4. The protocol supports multiple VI connections between the initiator and multiple targets. These connections are highly parallel: besides parallel access operations, they allow data to be transferred over the network in parallel, which keeps the storage protocol efficient and fast.
5. The protocol supports multiple data distribution rules or policies in a multi-target environment, enabling functions such as load balancing across targets and data backup.
6. The invention can flexibly provide several storage services for different application environments and requirements: a standard SCSI device service and an API call service.
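The overlap described in advantage 3, encrypting the next packet while the previous one is being posted, can be sketched with a single worker thread. This is a minimal illustration under stated assumptions: `post_send` is a hypothetical callable standing in for VI's PostSend, and `xor_cipher` is a toy placeholder rather than a real cipher or the encryption used by the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def xor_cipher(block: bytes, key: int = 0x5A) -> bytes:
    """Toy placeholder cipher (XOR with one byte); applying it twice decrypts."""
    return bytes(b ^ key for b in block)


def pipelined_send(packets, post_send) -> None:
    """Send packets in order, encrypting packet i+1 while packet i is in flight."""
    if not packets:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        encrypted = xor_cipher(packets[0])
        for nxt in packets[1:]:
            in_flight = pool.submit(post_send, encrypted)  # post the current packet
            encrypted = xor_cipher(nxt)                    # overlaps with the post
            in_flight.result()                             # wait before posting the next
        post_send(encrypted)                               # last packet: nothing to overlap
```

Calling `pipelined_send([b"ab", b"cd"], wire.append)` with `wire = []` leaves the two encrypted packets in `wire` in their original order. Whether the overlap hides the encryption cost in practice depends on the posting time exceeding the encryption time, which is the condition the text attributes to large continuous transfers.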
Description of drawings
Fig. 1 is a schematic diagram of the system composition of the invention;
Fig. 2 is a schematic diagram of the modules in the initiator and the target of the invention and their relationships;
Fig. 3 is a schematic diagram of the protocol of the invention.
Detailed description
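Fig. 1 is a schematic diagram of the system composition of the invention. The invention comprises one initiator 100 and n targets 200.1, 200.2, …, 200.n (n a natural number), interconnected through the VI communication network 300. The initiator 100 connects to network 300 through VI network interfaces 121.1 to 121.m (m a natural number), and the targets 200.1, 200.2, …, 200.n connect to network 300 through their respective VI network interfaces 221.1, 221.2, …, 221.n. Through its m VI network interfaces, the initiator 100 establishes k VI connection channels 120.1 to 120.k (k a natural number), and the targets together establish a total of k VI connection channels 220.1 to 220.k through their respective VI network interfaces. Channels 120.1 to 120.k_1 of the initiator and channels 220.1 to 220.k_1 of target 200.1 form k_1 VI connections, which carry requests, responses, and data between initiator 100 and target 200.1. Channels 120.(k_1+1) to 120.k_2 of the initiator and channels 220.(k_1+1) to 220.k_2 of target 200.2 form k_2 - k_1 VI connections, which carry requests, responses, and data between initiator 100 and target 200.2. And so on, until channels 120.(k_(n-1)+1) to 120.k of the initiator and channels 220.(k_(n-1)+1) to 220.k of target 200.n form k - k_(n-1) VI connections, which carry requests, responses, and data between initiator 100 and target 200.n. The number of channels k in the initiator is not bounded by m, but k ≥ m; using m VI network interfaces increases the physical bandwidth with which the initiator accesses the network without affecting the logical implementation. The same holds for the targets.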
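The channel numbering in Fig. 1 assigns consecutive channel indices to each target, with cumulative boundaries k_1 < k_2 < … < k. A minimal sketch of the resulting lookup from channel index to target number follows; the function name `target_for_channel` and the list `bounds` are hypothetical, and the boundary values in the usage note are made-up example numbers.

```python
from bisect import bisect_left


def target_for_channel(j: int, bounds: list[int]) -> int:
    """Map 1-based channel index j to a 1-based target number.

    bounds holds the cumulative boundaries [k_1, k_2, ..., k]; target i owns
    channels k_(i-1)+1 through k_i, with k_0 taken as 0.
    """
    if not bounds or j < 1 or j > bounds[-1]:
        raise ValueError("channel index out of range")
    # The first boundary that is >= j identifies the owning target.
    return bisect_left(bounds, j) + 1
```

With example boundaries `[2, 5, 6]`, channels 1 and 2 belong to target 1, channels 3 to 5 to target 2, and channel 6 to target 3.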
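Fig. 2 shows the modules in the initiator and the target of the present invention and the relationships among them. The SCSI adapter driver module 150, the vSCSI API module 160, the connection management module 110, the command processing module 130, and the data encryption/decryption module 140 run in the initiator 100. The SCSI adapter driver module 150 interacts internally with the command processing module 130 and externally with the operating system (SCSI mid-layer) of the host node on which the initiator runs. The vSCSI API module 160 interacts internally with the command processing module 130 and externally with third-party programs running on the initiator's host node. The SCSI adapter driver module 150 and the vSCSI API module 160 are two parallel functional modules that provide the programming interfaces through which external software uses the invention; in general, a given initiator uses only one of them to interact with the outside at any one time.
The device management module 250, the connection management module 210, the request processing module 230, and the read/write acceleration module 240 run in the target 200. The device management module 250 interacts internally with the request processing module 230 and externally with the physical storage devices attached to the target's host node. The read/write acceleration module 240 interacts internally with the request processing module 230 and the connection management module 210, and externally with the physical storage devices attached to the target's host node. The device management module 250 mainly exchanges control information, such as status queries, with the physical storage devices, whereas the read/write acceleration module 240 mainly performs data access operations on them.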
Fig. 3 is a schematic diagram of the protocol of the present invention. It mainly shows the secure login process and the handling of the three major classes of commands (query, read, and write).
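The ordering that Fig. 3 describes (login first, then query, read, and write commands) can be illustrated with a toy target-side session. This section does not specify the message formats or the login exchange, so the class name, the shared-secret check, and the in-memory block store below are all hypothetical placeholders rather than the patent's protocol.

```python
from enum import Enum
from typing import Dict, Union


class CommandKind(Enum):
    """The three command classes named in the text."""
    QUERY = "query"
    READ = "read"
    WRITE = "write"


class TargetSession:
    """Hypothetical target-side session: commands are refused until login succeeds."""

    def __init__(self, expected_secret: str) -> None:
        self._expected_secret = expected_secret
        self._logged_in = False
        self._blocks: Dict[int, bytes] = {}  # stand-in for physical storage

    def login(self, secret: str) -> bool:
        # Placeholder check only; the real secure login exchange is not described here.
        self._logged_in = secret == self._expected_secret
        return self._logged_in

    def handle(self, kind: CommandKind, lba: int = 0,
               data: bytes = b"") -> Union[dict, bytes, int]:
        if not self._logged_in:
            raise PermissionError("login required before issuing commands")
        if kind is CommandKind.QUERY:
            return {"blocks_stored": len(self._blocks)}
        if kind is CommandKind.WRITE:
            self._blocks[lba] = data
            return len(data)
        if kind is CommandKind.READ:
            return self._blocks.get(lba, b"")
        raise ValueError(f"unsupported command: {kind!r}")
```

A session created with `TargetSession("secret")` rejects every command with `PermissionError` until `login("secret")` returns `True`; after that, a write followed by a read of the same block address returns the written bytes.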
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2006101665830A CN1997033B (en) | 2006-12-28 | 2006-12-28 | A method and system for network storage |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1997033A CN1997033A (en) | 2007-07-11 |
CN1997033B true CN1997033B (en) | 2010-11-24 |
Family
ID=38251954
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2006101665830A Expired - Fee Related CN1997033B (en) | 2006-12-28 | 2006-12-28 | A method and system for network storage |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1997033B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101335765B (en) * | 2008-07-25 | 2010-12-29 | 华中科技大学 | Storage service middleware based on mobile caching |
CN101729589B (en) * | 2008-10-14 | 2012-07-18 | 北京大学 | Method and system for improving end-to-end data transmission rate |
WO2011128936A1 (en) * | 2010-04-14 | 2011-10-20 | 株式会社日立製作所 | Storage control device and control method of storage control device |
CN102882697B (en) * | 2011-07-13 | 2015-08-26 | 北京佳讯飞鸿电气股份有限公司 | A kind of message receival method of the network management system multi-client based on callback mechanism |
CN102843435A (en) * | 2012-09-10 | 2012-12-26 | 浪潮(北京)电子信息产业有限公司 | Access and response method and access and response system of storing medium in cluster system |
CN103176751A (en) * | 2013-03-04 | 2013-06-26 | 浪潮电子信息产业股份有限公司 | Unified service system under multiple storage protocols |
US10353631B2 (en) | 2013-07-23 | 2019-07-16 | Intel Corporation | Techniques for moving data between a network input/output device and a storage device |
CN114666420B (en) * | 2022-03-29 | 2023-11-14 | 浙江大学 | An open multi-communication protocol component |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040003141A1 (en) * | 2002-05-06 | 2004-01-01 | Todd Matters | System and method for implementing virtual adapters and virtual interfaces in a network system |
WO2005060186A1 (en) * | 2003-12-17 | 2005-06-30 | Nec Corporation | Network, router device, route updating suppression method used for the same, and program thereof |
CN1761222A (en) * | 2005-11-22 | 2006-04-19 | 华中科技大学 | Storage network adapter of supporting virtual interface |
CN1761257A (en) * | 2005-11-22 | 2006-04-19 | 华中科技大学 | Memory system based on virtual interface |
Non-Patent Citations (1)
Title |
---|
US 20040003141 A1, full text. |
Also Published As
Publication number | Publication date |
---|---|
CN1997033A (en) | 2007-07-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN1997033B (en) | A method and system for network storage | |
CN105549904B (en) | A kind of data migration method and storage equipment applied in storage system | |
US10095419B2 (en) | Distributed file serving architecture system with metadata storage virtualization and data access at the data server connection speed | |
JP2021527286A (en) | Encryption for distributed file systems | |
US7808996B2 (en) | Packet forwarding apparatus and method for virtualization switch | |
US8719923B1 (en) | Method and system for managing security operations of a storage server using an authenticated storage module | |
CN108268208A (en) | A kind of distributed memory file system based on RDMA | |
WO2007141206A2 (en) | System, method and computer program product for secure access control to a storage device | |
WO2011009406A1 (en) | System and method for data processing | |
US12126548B2 (en) | Supporting communications for data storage | |
US10599356B2 (en) | Aggregating memory to create a network addressable storage volume for storing virtual machine files | |
CN101420360A (en) | A kind of stage network memory access method | |
Kim et al. | Optimizing end-to-end big data transfers over terabits network infrastructure | |
CN100409673C (en) | High performance distributed parallel storage system based on embedded IP storage technology | |
JP4948938B2 (en) | Method and apparatus for authorizing cross-partition commands | |
Liu et al. | Evaluating the Impact of RDMA on Storage I/O over Infiniband | |
CN101655773A (en) | Disk array miniature computer system interface target device and data transmission method | |
CN107613026A (en) | Distributed file management system based on cloud storage system | |
CN108509155B (en) | Method and device for remotely accessing disk | |
CN117971766A (en) | Data transmission method and system for single network card multiple GPUs based on GPUDrirect RDMA technology | |
CN102868684A (en) | Fiber channel target and realizing method thereof | |
Yang et al. | uNVMe-TCP: a user space approach to optimizing NVMe over fabrics TCP transport | |
CN114661239A (en) | Data interaction system and method based on NVME hard disk | |
Wang et al. | The meteorological cloud desktop system of cma meteorological observation center | |
KR100676674B1 (en) | Data input / output acceleration device and its operation method for high speed data input / output |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20101124; Termination date: 20201228 |