CN105787093B - A Construction Method of Log File System Based on LSM-Tree Structure

Info

Publication number: CN105787093B
Application number: CN201610152908.3A
Authority: CN
Inventors: 陈康; 武永卫; 郑纬民; 王振钊
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2016-03-17
Filing date: 2016-03-17
Publication date: 2019-07-02
Anticipated expiration: 2036-03-17
Also published as: CN105787093A

Abstract

The invention proposes a method for constructing a log file system based on an LSM-Tree structure, comprising the following steps: constructing the FUSE framework interface of the log file system based on the LSM-Tree structure; constructing multiple directory operation functions and file operation functions of the log file system; and using hash mapping functions to add record data to, and query record data in, the constructed log file system. The invention can effectively improve the read and write performance of directories and small files while keeping the read and write performance of large files unchanged.

Description

A Construction Method of Log File System Based on LSM-Tree Structure

Technical Field

The invention relates to the technical field of file systems, and in particular to a method for constructing a log file system based on an LSM-Tree structure.

Background

A file system is the mechanism by which an operating system stores and manages data on a computer's disks. The Multics time-sharing operating system, jointly developed in 1964 by Bell Labs, MIT, and General Electric, first proposed the idea of a directory tree structure, marking the origin of the modern file system.

The UNIX operating system applied this tree idea to its own file system design, forming a file system architecture with four components: the boot block, the superblock, inodes, and data blocks. Many later file systems have followed this organization.

To further improve I/O performance, the Fast File System (FFS) appeared in 1984. It introduced the concept of the cylinder group: files in the same directory are kept in the same group wherever possible, as are the data blocks of the same file, which significantly reduces total seek time and improves read and write performance. After FFS, the log-structured file system appeared in 1989. It adopts the idea of log appending: files are written as log records, and an index built over the log structure is used for reading. This greatly improves write performance and also allows fast recovery after a file system crash.

In 1994, with the birth of the Linux 1.0 kernel, the Extended File System (ext) series came into view. ext1 was the first Linux Virtual File System (Linux VFS); it could manage at most 2 GB of disk space and was still rudimentary in many respects. ext2 then gradually became popular. It offered many advantages, such as a maximum manageable disk space of 16 TB, a maximum file size of 2 TB, and a maximum file name length of 255 bytes, but it had obvious shortcomings in journaling and was unsuitable for systems with high security requirements. ext3 appeared in 2001. It added robust journaling on top of ext2, fixing ext2's critical weakness, and greatly improves the reliability of file system data: even after an abnormal shutdown, data can be recovered in only 10 seconds after reboot. To support larger files, manage larger disks, and apply several optimizations, ext4 was created. It improves performance through delayed allocation, multi-block allocation, and a no-journal mode, and supports an unlimited number of subdirectories, online defragmentation, and persistent preallocation, further optimizing ext3.

ext3 is currently the most popular Linux file system. This document takes ext3 as the representative of traditional file systems and as the object of study, and analyzes the common characteristics of this class of file system architecture.

While traditional file systems such as ext3 offer users high availability, high access speed, and support for multiple journaling modes, this kind of architecture also has shortcomings, especially for small-file storage. Specifically, there are three defects: randomized data distribution, wasted space, and a limited supply of inodes.

Summary of the Invention

The purpose of the present invention is to solve at least one of the technical defects described above.

To this end, the present invention proposes a method for constructing a log file system based on an LSM-Tree structure.

To achieve the above purpose, an embodiment of the present invention provides a method for constructing a log file system based on an LSM-Tree structure, comprising the following steps:

Step S1: construct the FUSE framework interface of the log file system based on the LSM-Tree structure, comprising the following steps:

Step S11: call the fuse_main() function to mount the fuse file system on the mount point, create a UNIX local socket, create and run the child process fusermount, and then call the fuse_new() function to allocate data storage space for the fuse file system, completing the mount;

Step S12: after the mount is complete, fuse_main() calls fuse_loop() to start session mode and provide session services to the user;

Step S13: when the fuse file system is unmounted with the command fusermount -u PATH, the session service is interrupted and the corresponding storage space is reclaimed;

Step S2: construct multiple directory operation functions and file operation functions of the log file system based on the LSM-Tree structure;

Step S3: use hash mapping functions to add record data to, and query record data in, the constructed log file system based on the LSM-Tree structure.

With the construction method according to the embodiment of the present invention, the resulting LevelFS file system can effectively improve the read and write performance of directories and small files while keeping the read and write performance of large files unchanged.

Further, the multiple directory operation functions include: the directory creation function fs_mkdir, the directory listing function fs_readdir, and the directory deletion function fs_rmdir;

The multiple file operation functions include: the file renaming function fs_rename, the file opening function fs_open, the file reading function fs_read, the file writing function fs_write, the file size setting function fs_truncate, the file permission modification function fs_chmod, the file account information modification function fs_chown, the file system information reading function fs_statvfs, the file timestamp update function fs_utimens, the function fs_symlink for creating a symbolic link file pointing to a target, the inumber path lookup function get_disk_path, and the disk file opening function open_disk_file.

Further, in step S3, adding record data to the constructed log file system based on the LSM-Tree structure using hash mapping functions comprises the following steps:

Suppose k hash mapping functions are used, each mapping a key to a number in [0, m-1], so that every key maps to k numbers. When a record is to be written, its k numbers are found through the mapping, and the values at these k positions in the byte array are each incremented by 1, indicating that such a record exists in the system.

Further, in step S3, querying record data in the constructed log file system based on the LSM-Tree structure using hash mapping functions comprises the following steps:

Find the positions corresponding to the record data through the hash mapping functions and determine whether the value at every position is greater than 0. If so, read the record data.

Further, after step S3, the method also includes the following step: when record data is deleted, the values at the positions in the array corresponding to that record are each decremented by 1.

Additional aspects and advantages of the present invention will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of the invention.

Brief Description of the Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a flowchart of the method for constructing a log file system based on an LSM-Tree structure according to an embodiment of the present invention;

FIG. 2 is a flowchart of constructing the FUSE framework interface of the log file system based on the LSM-Tree structure according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the FUSE framework interface of the log file system based on the LSM-Tree structure according to an embodiment of the present invention;

FIG. 4 is a data flow diagram of the mkdir command of the log file system based on the LSM-Tree structure according to an embodiment of the present invention;

FIG. 5 is a data structure diagram of adding records x and y according to an embodiment of the present invention;

FIG. 6 is a flowchart of the operation of querying p and q according to an embodiment of the present invention;

FIG. 7 is a data structure diagram of deleting a record according to an embodiment of the present invention.

Detailed Description

Embodiments of the present invention are described in detail below, and examples of the embodiments are illustrated in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.

To address the low efficiency of traditional file systems in accessing small files, the present invention proposes a method for constructing a log file system based on an LSM-Tree (Log-Structured Merge Tree) structure. The method builds a log file system based on the LSM-Tree structure and implements LevelFS, a file system built on logs using the above basic architecture, so that access-optimized schemes can be designed for small files and access to large numbers of small files becomes more efficient.

The construction method proposed by the present invention works as follows: a write cache is set up in memory so that random disk writes of many small files become sequential disk writes, improving write efficiency; at the same time, the storage distance between related data on disk is reduced to improve read efficiency. Together these improve the read and write performance of the file system.

As shown in FIG. 1, the method for constructing a log file system based on an LSM-Tree structure according to the embodiment of the present invention comprises the following steps:

Referring to FIG. 2 and FIG. 3, the FUSE framework interface is divided into three parts: mount, session, and unmount.

Step S11: call the fuse_main() function to mount the fuse file system on the mount point, create a UNIX local socket, create and run the child process fusermount, and then call the fuse_new() function to allocate data storage space for the fuse file system, completing the mount.

Mount: when the user runs the compiled and linked executable, fuse_main() is called to parse parameters such as the mount point and multi-threading support, marking the start of the FUSE file system life cycle. fuse_main() calls fuse_mount() to mount the fuse file system on the mount point, creates a UNIX local socket, creates and runs the child process fusermount, and then calls fuse_new() to allocate the data storage space (fuse_datastructure) for the fuse file system. The mount is then complete.

Step S12: after the mount is complete, fuse_main() calls fuse_loop() to start session mode and provide session services to the user.

Session: after the mount is complete, fuse_main() calls fuse_loop() to start session mode, providing the user with a continuous service of receiving sessions, processing sessions, and returning sessions.

Step S13: when the fuse file system is unmounted with the command fusermount -u PATH, the session service is interrupted and the corresponding storage space is reclaimed.

Unmount: the user unmounts the fuse file system with the command fusermount -u PATH; the session is interrupted and the storage space is reclaimed, marking the end of the FUSE file system life cycle.

Step S2: construct multiple directory operation functions and file operation functions of the log file system based on the LSM-Tree structure.

FIG. 4 is the data flow diagram of the mkdir command of the log file system based on the LSM-Tree structure according to the embodiment of the present invention. As shown in FIG. 4, taking the Linux mkdir command as an example, a request to create the directory "hi" under "/" is parsed layer by layer and finally passed to LevelFS's int fs_mkdir(const char *path, mode_t mode) function. The functions and key points of the directory and file operation functions are described below.

In an embodiment of the present invention, the multiple directory operation functions include: the directory creation function fs_mkdir, the directory listing function fs_readdir, and the directory deletion function fs_rmdir.

(1) fs_mkdir

Function: create a directory at the specified path.

Key points: First, parse path into the parent directory par_path and the child directory name dir_name. Use par_path to find the parent node's number par_inumber, retrieving the parent node's meta information at the same time, and combine it with the account information to determine whether the user has permission to create files under par_path; if not, return an error. If permitted, allocate i_number = cur_inumber + 1 for the directory, create an fs_inode data structure as the metadata of new_dir, fill in the account and timestamp information, set the hard link count to 1, set i_number to ++cur_inumber, and set the file type to directory. Finally, write {par_inumber}/{dir_name}->{i_number}:{fs_inode} to LevelDB.
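The fs_mkdir flow can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: a plain dict stands in for LevelDB, the permission check is reduced to a boolean flag, and the names LevelStore, FsInode, lookup, and may_write_parent are hypothetical. Only the key layout {par_inumber}/{dir_name} and the ++cur_inumber allocation come from the text.

```python
import posixpath
import stat
import time
from dataclasses import dataclass


@dataclass
class FsInode:
    """Hypothetical stand-in for the fs_inode metadata record."""
    i_number: int
    mode: int
    uid: int
    gid: int
    nlink: int
    ctime: float


class LevelStore:
    """Toy metadata store; a dict plays the role of LevelDB (assumption)."""

    ROOT_INUMBER = 0

    def __init__(self):
        self.db = {}          # key "{par_inumber}/{name}" -> FsInode
        self.cur_inumber = 0  # last allocated inode number

    def lookup(self, path):
        """Resolve a path to an inode number, one component at a time."""
        inumber = self.ROOT_INUMBER
        for name in [part for part in path.split("/") if part]:
            inumber = self.db[f"{inumber}/{name}"].i_number
        return inumber

    def mkdir(self, path, mode, uid=0, gid=0, may_write_parent=True):
        par_path, dir_name = posixpath.split(path.rstrip("/"))
        if not dir_name:
            raise ValueError("cannot create the root directory")
        par_inumber = self.lookup(par_path)   # KeyError if the parent is missing
        if not may_write_parent:              # placeholder for the real permission check
            raise PermissionError(path)
        key = f"{par_inumber}/{dir_name}"
        if key in self.db:
            raise FileExistsError(path)
        self.cur_inumber += 1                 # i_number = ++cur_inumber
        inode = FsInode(self.cur_inumber, stat.S_IFDIR | mode, uid, gid,
                        nlink=1, ctime=time.time())
        self.db[key] = inode                  # {par_inumber}/{dir_name} -> inode
        return inode.i_number
```

Creating "/hi" on an empty store writes the key "0/hi", and a later "/hi/there" resolves the parent to inode 1 and writes "1/there", so entries of one directory share a key prefix.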

(2) fs_readdir

Function: list the directory entries stored at the specified path. This corresponds to the ls command in a Linux terminal.

Key points: First, open the directory file by path (see fs_open for the procedure) and, if it opens successfully, take the directory node's i_number. Then use a LevelDB Iterator to traverse the key-value records whose keys fall in the range [i_number+"/", i_number+1). Finally, add the filename of each record to the filler list and return.
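The range scan behind fs_readdir can be illustrated with a short Python sketch. It assumes keys are plain strings of the form "{par_inumber}/{name}" and uses a sorted list in place of the ordered iteration that a LevelDB Iterator provides. With string keys, the exclusive upper bound is the prefix with "/" replaced by the next character "0"; the text's i_number+1 bound would presumably play the same role under a fixed-width key encoding.

```python
import bisect


def readdir(sorted_keys, dir_inumber):
    """Return the entry names of one directory via a prefix range scan.

    sorted_keys is a sorted list of "{par_inumber}/{name}" strings standing
    in for an ordered key-value store (assumption).
    """
    lo = f"{dir_inumber}/"
    hi = f"{dir_inumber}0"   # '0' directly follows '/' in ASCII, so hi bounds the prefix
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_left(sorted_keys, hi)
    return [key[len(lo):] for key in sorted_keys[start:end]]
```

Because all entries of a directory share the same key prefix, they are adjacent in key order and a listing touches one contiguous range rather than scattered blocks.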

(3) fs_rmdir

Function: delete the directory at the specified path.

Key points: First, parse path into the parent directory par_path and the child directory name dir_name as before, use par_path to find the parent node's number par_inumber and its meta, and check the delete permission. If permitted, simply write the record {par_inumber}/{dir_name}->Delete to LevelDB.

The multiple file operation functions include: the file renaming function fs_rename, the file opening function fs_open, the file reading function fs_read, the file writing function fs_write, the file size setting function fs_truncate, the file permission modification function fs_chmod, the file account information modification function fs_chown, the file system information reading function fs_statvfs, the file timestamp update function fs_utimens, the function fs_symlink for creating a symbolic link file pointing to a target, the inumber path lookup function get_disk_path, and the disk file opening function open_disk_file.

(4) fs_rename

Function: rename the file at the specified path.

Key points: First, read the parent directory as before, check for write permission on it, and take par_inumber. Then read the value stored under {par_path}/{old_name}, denoted r_value. Finally, write the records {par_inumber}/{old_name}->Delete and {par_inumber}/{new_name}->{r_value} to LevelDB.
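A rename therefore reduces to one delete and one put on the key-value store. The Python sketch below assumes the second record is written under the new name and uses a dict in place of LevelDB; the function name and parameters are illustrative only.

```python
def rename(db, par_inumber, old_name, new_name):
    """Rename an entry by deleting the old key and storing its value under the new key.

    db is a dict standing in for LevelDB (assumption). A real store would
    normally apply both operations in a single atomic write batch.
    """
    old_key = f"{par_inumber}/{old_name}"
    new_key = f"{par_inumber}/{new_name}"
    r_value = db[old_key]      # read the value stored under the old name
    del db[old_key]            # {par_inumber}/{old_name} -> Delete
    db[new_key] = r_value      # {par_inumber}/{new_name} -> {r_value}
    return new_key
```

Since the value carries both the inode metadata and, for small files, the file content, no data outside the two records has to move.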

(5) fs_open

Function: open the file at the specified path.

Key points: First, read the parent directory to obtain par_inumber and par_meta, and check permissions against par_meta. Then load the record fs_record whose key is {par_inumber}/{filename} into the cache, create an fs_openfile data structure, fill in the address pointer of fs_record, the user account information, and so on, set the current file position to 0, set the file's start position within r_value, and set r_count in fs_record to 1. The file is then open.

(6) fs_read

Function: read the content at the specified position of the file at the specified path.

Key points: Before a file's content can be read, the file must first be opened with fs_open, at which point the content is already in memory: because each LevelDB record holds little data, the file's data is fetched together with its meta. If the file is a large file, the large file in the local file system is opened through get_disk_path and open_disk_file. For both large and small files, the data is finally read using f_start and f_pos in the fs_openfile data structure.

(7) fs_write

Function: write data to the specified position of the file at the specified path.

Key points: As with fs_read, the file is first opened with fs_open before writing. Then f_pos in fs_openfile is moved to the specified position and writing begins; f_pos advances as data is written. Note that before each write operation the total file size after the write must be checked: if it exceeds the threshold, the file content stored in LevelDB is migrated to the local file system.
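The size check that precedes each write can be sketched in Python as follows. The threshold value, the OpenFile class, and the migrate callback are all assumptions made for illustration; the text only states that content moves from LevelDB to the local file system once the post-write size exceeds a threshold.

```python
THRESHOLD = 4096  # hypothetical small-file limit in bytes; the text gives no value


class OpenFile:
    """Hypothetical open-file record with a position cursor (f_pos)."""

    def __init__(self):
        self.data = bytearray()   # small-file content kept inline with the metadata
        self.is_large = False     # True once content has moved to the local file system
        self.f_pos = 0


def write(f, offset, buf, migrate):
    """Write buf at offset, calling migrate(f) first if the file would cross THRESHOLD."""
    new_size = max(len(f.data), offset + len(buf))
    if not f.is_large and new_size > THRESHOLD:
        migrate(f)                # move the inline content out before writing
        f.is_large = True
    f.f_pos = offset
    if len(f.data) < offset:
        f.data.extend(b"\0" * (offset - len(f.data)))   # zero-fill any gap
    # For brevity the sketch keeps using the same buffer after migration.
    f.data[offset:offset + len(buf)] = buf
    f.f_pos += len(buf)           # the cursor advances as data is written
    return len(buf)
```

Checking the size before the write, rather than after, means a record stored inline never has to hold more than the threshold.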

(8) fs_truncate

Function: set a new size for the file at the specified path.

Key points: After obtaining the file's metadata, modify the size information and the file's timestamps. Note that if the file size was below the threshold before the truncate and is above the threshold afterwards, the small file in LevelDB must be migrated to the local file system and stored as a large file. The reverse does not apply: a file that shrinks below the threshold is not migrated back.

(9) fs_chmod

Function: set new permission flags for the file at the specified path.

Key points: Obtain the file's metadata through the parent directory, read the stat it contains, change stat.st_mode to the new mode, and write the modified record back.

(10) fs_chown

Function: modify the account information of the file at the specified path.

Key points: As with fs_chmod, first obtain the file's metadata through the parent directory and read the stat it contains, then set stat.st_uid = uid and stat.st_gid = gid, and finally write the modified record back.

(11) fs_statvfs

Function: read file system information, such as used space, free space, and the number of free blocks, into statvfs. Since these are statistics of the local file system, it suffices to call the local file system's statvfs(path, stbuf) interface directly.

(12) fs_utimens

Function: update the timestamp information of the file at the specified path.

Key points: As with fs_chmod, first obtain the file's metadata through the parent directory and read the stat it contains, then set the new st_atim and st_mtim. st_ctim is managed by the operating system, so users are not allowed to modify it.

(13) fs_symlink

Function: create, at the specified path, a symbolic link file pointing to target.

Key points: First obtain the file's fs_inode through the parent directory, store the target string being linked to in value, set is_large = false, and write the resulting path record back.

(14) get_disk_path

Function: obtain the path corresponding to an inumber.

Key points: This is an internal (non-public) interface that computes the index path for an inumber; it plays an important role in mapping large files. Specifically, the mapping rules for large files include, for example:

First, consider the characteristics of i_number, the 64-bit positive integer that LevelFS assigns to a file. Although i_number has a 64-bit range, it grows upward from 0 rather than being assigned randomly like memory addresses. Consequently, when the file system is small, only the low-order bits of i_number are likely to hold data and most high-order bits are 0; as the file system grows, i_number values become larger and more of their bits are significant.

Then, split this 64-bit integer into groups of 13 bits starting from the low-order end, giving 5 groups named p0, p1, p2, p3, and p4. Except for p4, which has fewer than 13 bits, each group ranges over 0 to 2¹³ - 1.

Finally, LevelFS applies the following mapping rules:

(1) If i_number < 2²⁶ (that is, 8192²), the mapped path is /{p1}/{p0}. For example, if i_number = 9000, the corresponding storage path is /1/808;

(2) If i_number >= 2²⁶ and i_number < 2³⁹, the mapped path is /a/{p2}/{p1}/{p0};

(3) If i_number >= 2³⁹ and i_number < 2⁵², the mapped path is /b/{p3}/{p2}/{p1}/{p0};

(4) If i_number >= 2⁵² and i_number < 2⁶⁴, the mapped path is /c/{p4}/{p3}/{p2}/{p1}/{p0}.
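The four mapping rules can be written as a small Python function. This is a sketch of the rules as stated, not the original get_disk_path source; the constants GROUP_BITS and MASK are names chosen here for illustration.

```python
GROUP_BITS = 13
MASK = (1 << GROUP_BITS) - 1   # 8191, the largest value of a full 13-bit group


def get_disk_path(i_number):
    """Map a 64-bit inode number to its unbalanced multi-level index path."""
    if not 0 <= i_number < 1 << 64:
        raise ValueError("i_number must fit in 64 bits")
    # p[0] holds the lowest 13 bits; p[4] holds the top 12 bits.
    p = [(i_number >> (GROUP_BITS * k)) & MASK for k in range(5)]
    if i_number < 1 << 26:
        prefix, parts = "", [p[1], p[0]]
    elif i_number < 1 << 39:
        prefix, parts = "/a", [p[2], p[1], p[0]]
    elif i_number < 1 << 52:
        prefix, parts = "/b", [p[3], p[2], p[1], p[0]]
    else:
        prefix, parts = "/c", [p[4], p[3], p[2], p[1], p[0]]
    return prefix + "".join(f"/{part}" for part in parts)
```

For i_number = 9000 the groups are p1 = 1 and p0 = 808 (9000 = 1 × 8192 + 808), which gives /1/808 as in the example above.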

LevelFS adopts the above mapping rules mainly for the following reasons:

First, when a directory contains too many entries, searching it by file name becomes very slow. LevelFS therefore uses a hierarchical scheme with 13 bits per group, keeping the number of entries in each directory node within the order of 2¹³.

Second, if a plain multi-level index were used, with one index level per 13 bits, locating a file would take 5 levels. Even when the file system holds little data, locating file number 123 would still need 5 index levels, which is costly and unnecessary. LevelFS therefore uses an unbalanced scheme: when the file system is small, it uses the two-level index /{p1}/{p0}, which reduces the total number of index lookups compared with a plain multi-level index.

Finally, once i_number becomes very large, exceeding 10⁸, the file system is already large, and the magnitude of i_number determines whether to search under /a/, /b/, or /c/.

This unbalanced multi-level index fits the sequential growth of inode numbers well. It gives the file system a suitable number of index levels at both small and large scales, keeping the average number of index lookups as low as possible, while also bounding the number of entries in each directory and so reducing the lookup time within each directory.

(15) open_disk_file

Function: open the disk file of an fs_inode_header *iheader object.

Key points: This is an internal (non-public) interface. It calls get_disk_path with iheader.st_ino to obtain the file path, then calls the local file system's open function with flags and records the open-file information.

Adding record data to the constructed log file system based on the LSM-Tree structure using hash mapping functions includes the following steps:

Fig. 5 is a data structure diagram of adding records x and y according to an embodiment of the present invention. As shown in Fig. 5:

Suppose k hash mapping functions are used, each mapping a key to a number in [0, m-1], so that a key yields k numbers. When a record is written, its k numbers are found through the mapping, and the value at each of these k positions in the byte array is incremented by 1, indicating that such a record exists in the system. For example, with k = 3, writing records x and y with hashs(x) = [2,4,7] and hashs(y) = [4,8,11] turns the array into the one shown in Fig. 5.
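The write step in this example can be sketched as below. The hash positions are taken directly from the example; the array length m = 12 and the names counts, HASHS and add_record are assumptions made for illustration, and the real hash functions are replaced by a lookup table.

```python
M = 12                  # assumption: m = 12, just large enough for position 11
counts = bytearray(M)   # the byte array of counters, initially all zero

# Positions from the example (k = 3); a lookup table stands in for real hashes.
HASHS = {"x": [2, 4, 7], "y": [4, 8, 11]}


def add_record(key):
    """Increment the counter at each of the key's k positions."""
    for pos in HASHS[key]:
        counts[pos] += 1   # a byte counter can hold at most 255


add_record("x")
add_record("y")
print(list(counts))  # [0, 0, 1, 0, 2, 0, 0, 1, 1, 0, 0, 1]
```

Position 4 ends at 2 because both x and y map to it, which is why counters rather than single bits are kept.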

Querying record data in the constructed log file system based on the LSM-Tree structure using hash mapping functions includes the following steps:

Fig. 6 is a flow chart of querying records p and q according to an embodiment of the present invention. As shown in Fig. 6, when a record needs to be read, the corresponding positions are first found through hashs, and the values at those positions are checked to see whether they are all greater than 0. If hashs(p) = [1,4,11], the array shows that the value at position 1 is 0, so record p is definitely not in LevelDB; there is no need to search further, and a read failure is returned. If hashs(q) = [4,8,11], the values at all positions are greater than 0, so record q may be in LevelDB, and the read of q cannot be filtered out.
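The check for p and q can be sketched as follows, starting from the array state that results from writing x with hashs(x) = [2,4,7] and y with hashs(y) = [4,8,11]. The array length of 12 and the name may_contain are assumptions made for illustration.

```python
# Counter array after records x and y were written (length 12 is assumed).
counts = bytearray([0, 0, 1, 0, 2, 0, 0, 1, 1, 0, 0, 1])


def may_contain(positions):
    """True if every counter is > 0, i.e. the record may be in LevelDB."""
    return all(counts[pos] > 0 for pos in positions)


hashs_p = [1, 4, 11]
hashs_q = [4, 8, 11]

print(may_contain(hashs_p))  # False: position 1 is 0, so p is certainly absent
print(may_contain(hashs_q))  # True: q may be present, LevelDB must be consulted
```

A False result is definitive, while a True result only means the read has to go on to LevelDB, since different records can share positions.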

After step S3, the method further includes the following step: when record data is deleted, the value at each position in the array corresponding to that record is decremented by 1.

Fig. 7 is a data structure diagram of deleting record data according to an embodiment of the present invention. As shown in Fig. 7, when a record needs to be deleted, the value at each position in the array corresponding to that record is decremented by 1. The array contents after deleting record x are shown in Fig. 7.
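Deleting record x can be sketched as below, again starting from the array state after x (positions [2,4,7]) and y (positions [4,8,11]) were written. The array length of 12, the name delete_record and the guard against decrementing a zero counter are assumptions added for illustration.

```python
# Counter array after records x and y were written (length 12 is assumed).
counts = bytearray([0, 0, 1, 0, 2, 0, 0, 1, 1, 0, 0, 1])


def delete_record(positions):
    """Decrement the counter at each of the record's k positions."""
    # Added guard (not in the text): refuse to delete a record never written.
    if any(counts[pos] == 0 for pos in positions):
        raise ValueError("record was not present")
    for pos in positions:
        counts[pos] -= 1


delete_record([2, 4, 7])   # hashs(x)
print(list(counts))  # [0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
```

Position 4 drops from 2 to 1 rather than to 0, so record y, which shares that position, still passes the query check after x is removed.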

In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the present invention. Those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention without departing from its principles and spirit. The scope of the present invention is defined by the appended claims and their equivalents.

Claims

1. A construction method of a log file system based on an LSM-Tree structure, characterized by comprising the following steps:

Step S1: constructing the fuse framework interface of the log file system based on the LSM-Tree structure, including the following steps:

Step S11: calling the fuse_main() function to mount the fuse file system on the mount point, creating a UNIX local socket, creating and running the child process fusermount, and then calling the fuse_new() function to allocate data storage space for the fuse file system, thereby completing the mount;

Step S12: after the mount is completed, the fuse_main() function calls fuse_loop() to start session mode and provide session services to the user;

Step S13: unmounting the fuse file system using the fusermount -u PATH command, then interrupting the session services and reclaiming the corresponding storage space;

Step S2: constructing multiple directory operation functions and file operation functions of the log file system based on the LSM-Tree structure;

Step S3: adding record data to, and querying record data in, the constructed log file system based on the LSM-Tree structure using hash mapping functions, comprising: supposing k hash mapping functions are used, each mapping a key to a number in [0, m-1], so that a key yields k numbers; when a record needs to be written, the k corresponding numbers are found through the mapping, and the value at each of these k positions in the byte array is incremented by 1, indicating that such a record exists in the system.

2. The construction method of the log file system based on the LSM-Tree structure according to claim 1, characterized in that the multiple directory operation functions include: the directory creation function fs_mkdir, the directory listing function fs_readdi, and the directory deletion function fs_rmdir;

the multiple file operation functions include: the file renaming function fs_rename, the file opening function fs_open, the file reading function fs_read, the file writing function fs_write, the file size setting function fs_truncate, the file permission modification function fs_chmod, the file owner information modification function fs_chown, the file system information reading function fs_statvfs, the file timestamp update function fs_utimens, the function fs_symlink for creating a symbolic link file pointing to a target path, the inumber path acquisition function get_disk_path, and the disk file opening function open_disk_file.

3. The construction method of the log file system based on the LSM-Tree structure according to claim 1, characterized in that, in step S3, querying record data in the constructed log file system based on the LSM-Tree structure using the hash mapping functions includes the following steps:

finding the positions corresponding to the record data through the k hash mapping functions, and judging whether the values at the k positions are all greater than 0; if so, reading the record data.

4. The construction method of the log file system based on the LSM-Tree structure according to claim 1, characterized in that, after step S3, the method further includes the following step: deleting record data, and decrementing by 1 the value at each position in the byte array corresponding to that record.