
CN106570113B - Mass vector slice data cloud storage method and system - Google Patents


Info

Publication number
CN106570113B
Authority
CN
China
Prior art keywords
slice data
vector slice
massive
massive vector
file
Legal status
Active
Application number
CN201610939884.6A
Other languages
Chinese (zh)
Other versions
CN106570113A (en)
Inventor
马潇
王景朝
费香泽
王宪
Current Assignee
China Electric Power Research Institute Co Ltd CEPRI
State Grid Anhui Electric Power Co Ltd
State Grid Corp of China SGCC
Original Assignee
China Electric Power Research Institute Co Ltd CEPRI
State Grid Anhui Electric Power Co Ltd
State Grid Corp of China SGCC
Application filed by China Electric Power Research Institute Co Ltd CEPRI, State Grid Anhui Electric Power Co Ltd, State Grid Corp of China SGCC
Priority to CN201610939884.6A
Publication of CN106570113A
Application granted
Publication of CN106570113B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract



The invention discloses a cloud storage method for massive vector slice data. The method includes: establishing a distributed file system directory tree file; establishing all metadata nodes corresponding to the distributed file system directory tree; aggregating the massive vector slice data under the same-level directory in the distributed file system to generate massive vector slice data packets; storing the massive vector slice data packets in the metadata nodes; establishing an index for the massive vector slice data, through which the data are associated to form a mesh-structured data index table, the index table recording the path of each vector slice data item within its massive vector slice data packet; and providing the massive vector slice data indexing service through the massive vector slice data packet index table.


Description

Mass vector slice data cloud storage method and system
Technical Field
The invention relates to the field of mass data storage, and in particular to a cloud storage method and system for massive vector slice (vector tile) data.
Background
With the continuous development of science and technology, the era of massive data has arrived, and optimizing file system load and improving load balancing have become important requirements. When a data set exceeds the storage capacity of a single physical computer, it must be partitioned and stored across several separate computers. International companies such as Google, Amazon, IBM and Microsoft have invested heavily in research in this field and have proposed various innovative mass data management technologies. Current research focuses on three layers: the storage layer, the computation layer and the interface layer. In the prior art, the Hadoop project implements the Hadoop Distributed File System (HDFS) and the parallel programming framework Hadoop MapReduce. A distributed file system is built on a network and therefore inherits the complexity of network programming, which makes it more complex than an ordinary disk file system. The goal of a distributed file system is resource sharing: programs store and access remote files in much the same way as local files. Typical examples include the Google File System (GFS), the Hadoop Distributed File System (HDFS), Dynamo and TFS. Current distributed file systems usually keep nearly the same access interface and object model as local file systems, mainly to provide backward compatibility for users.
The prior art mainly uses distributed file systems to store and read very large data files (hundreds of MB, GB or TB in size). However, because of its low storage speed, such a distributed file system cannot meet the storage requirements of a large number of small files. At present, there is no technical solution for storing and reading a large number of small files on a distributed file system.
Disclosure of Invention
To solve the speed problem of storing a large number of small files on a distributed file system, the invention provides a method comprising the following steps:
establishing all metadata nodes corresponding to a directory tree of the distributed file system;
aggregating the massive vector slice data under the same-level directory in the distributed file system to generate a massive vector slice data packet;
storing the massive vector slice data packets in the metadata nodes;
establishing indexes for the massive vector slice data, and associating the massive vector slice data through the indexes to form a mesh-structured data index table of the massive vector slice data;
providing the massive vector slice data indexing service through the massive vector slice data packet index table.
Preferably, the method comprises:
the massive vector slice data index comprises the massive vector slice data path, a name, and an offset within the massive vector slice data packet;
the massive vector slice data path comprises a metadata node position, a massive vector slice data row position and a massive vector slice data column position.
Preferably, the method comprises:
presetting a metadata node in each layer, and storing the index table in the preset metadata node of each layer;
transmitting the massive vector slice data index table stored in the metadata to a client, and establishing a persistent mapping table for the massive vector slice data index table.
Preferably, the massive vector slice data packet comprises a file header and at least one record;
the file header comprises a file type, a version number, a file keyword, a file name and the position corresponding to each record;
each record corresponds to one vector slice data item, and each record comprises the length, key length, key and value of the vector slice data.
Preferably, the massive vector slice data packets are stored by a data file serialization method.
Preferably, the method further comprises: appending new data at the tail of the massive vector slice data packet.
Preferably, the method comprises: caching the massive vector slice data index table at the client, which reduces the number of accesses to the metadata node and thereby improves the efficiency of accessing the massive vector slice data.
Preferably, the method further comprises reading the massive vector slice data by:
determining, through the massive vector slice data index table, the shortest path to the metadata node corresponding to the massive vector slice data packet;
and determining the position of the vector slice data from the file header of the data packet file in the determined metadata node.
Based on the above implementation, the invention further provides a cloud storage system for massive vector slice data, the system comprising:
a first generating unit, configured to establish a directory tree file of the distributed file system;
a second generating unit, configured to establish all metadata nodes corresponding to the directory tree of the distributed file system;
an aggregation unit, configured to aggregate the massive vector slice data under the same-level directory of the distributed file system to generate a massive vector slice data packet;
a storage unit, configured to store the massive vector slice data packets in the metadata nodes;
a third generating unit, configured to generate the massive vector slice data index table, establish a mesh structure of the massive vector slice data packets through the index table, and record the path of the massive vector slice data within the massive vector slice data packet;
and an indexing unit, configured to provide the massive vector slice data indexing service through the massive vector slice data index.
The beneficial effects of the invention are as follows. The massive vector slice data under the same-level directory in the distributed file system are aggregated into a massive vector slice data packet, so the data can be stored rapidly. At the same time, indexes are established for the massive vector slice data and associate the data into a mesh-structured data index table. Through this mesh-structured index table, the corresponding metadata node is found along the shortest path, which accelerates data access.
Drawings
A more complete understanding of exemplary embodiments of the present invention may be had by reference to the following drawings in which:
FIG. 1 is a flow chart of a cloud storage method for massive vector slice data according to an embodiment of the present invention; and
FIG. 2 is a system structure diagram of a cloud storage system for massive vector slice data according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings. The invention may, however, be embodied in many different forms and is not limited to the embodiments described herein; these embodiments are provided so that this disclosure is thorough and complete and fully conveys the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to limit the invention. In the drawings, the same units/elements are denoted by the same reference numerals.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
FIG. 1 is a flow chart of a cloud storage method for massive vector slice data according to an embodiment of the present invention. The invention provides a method of storing massive vector slice data based on a distributed file system. Building on the directory tree structure of an existing distributed file system, the scheme packs the many vector slice data items in a directory into massive vector slice data packets for storage; each packet is a large data file of more than one hundred MB. The scheme also builds an index over the vector slice data, records the path of each item within its packet, and provides an interface through which the client accesses the data. The method takes full advantage of the high fault tolerance, scalability and distribution of master-slave distributed file systems, and achieves efficient storage of massive vector data on a distributed file system designed for files larger than one hundred MB. By storing the massive vector data in the distributed file system and indexing it, the method solves the current problem of slow storage of massive vector data and improves access speed.
Preferably, the method 100 starts at step 101: establish a directory tree file of the distributed file system. Constructing this directory tree structure file makes full use of the high fault tolerance, scalability and distribution of the distributed file system.
Preferably, step 102: establish all metadata nodes corresponding to the directory tree of the distributed file system. The metadata nodes are used to store data.
Preferably, step 103: aggregate the massive vector slice data under the same-level directory in the distributed file system to generate a massive vector slice data packet. The packet file is designed to consist of a file header and at least one record. The file header contains the file type, version number, file keyword, file name and the position corresponding to each record. Each record corresponds to one vector slice data item and contains the length, key length, key and value of that item. New vector slice data are appended at the tail of the packet, and the packets are stored using a data-file serialization method. The implementation is based on a distributed system framework consisting of a metadata node and several levels of hierarchical data nodes beneath it. In this embodiment, all the vector slice data under a given directory are stored in one data file under that directory, and this data file, the massive vector slice data packet, is an ordinary file in the distributed file system. The key to this aggregated storage technique lies in the design of the packet file. The packet file is a distributed file system file with a binary key/value persistent data structure, consisting of a header followed by one or more records. The first three bytes of the header are the file type, "SEQ", and the next byte is the version number of the file data structure. The header also contains other fields, including the names of the key and value types.
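The aggregation in step 103 amounts to grouping tile files by their parent directory, since each same-level directory becomes one data packet. The Python below is a minimal sketch under assumptions: the helper name `group_by_directory` and the `/tiles/<node>/<row>/<column>.pbf` path layout are hypothetical, and a real system would list files through its distributed file system client rather than from an in-memory list.

```python
from collections import defaultdict
from posixpath import basename, dirname


def group_by_directory(paths):
    """Group tile paths by parent directory (hypothetical helper).

    Each group is one same-level directory, i.e. the set of tiles that
    would be aggregated into a single massive vector slice data packet.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[dirname(path)].append(basename(path))
    # Sort tile names so the packet contents are deterministic.
    return {directory: sorted(names) for directory, names in groups.items()}


if __name__ == "__main__":
    tiles = ["/tiles/18/05/07.pbf", "/tiles/18/05/06.pbf", "/tiles/18/06/06.pbf"]
    for directory, names in group_by_directory(tiles).items():
        print(directory, names)
    # /tiles/18/05 ['06.pbf', '07.pbf']
    # /tiles/18/06 ['06.pbf']
```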
During storage, vector slice data are appended directly to the tail of the packet file. Each record represents one vector slice data item and consists of four items: record length, key length, key and value. The key is the file name of the vector slice data, and the value is its content.
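The header and record layout described above can be illustrated with a small writer and reader. This is a sketch, not the patented byte format: it assumes big-endian 4-byte length fields, UTF-8 strings, a version byte of 1 and generic type names. Only the "SEQ" magic, the version byte, the key/value type names and the (record length, key length, key, value) record fields come from the text. The layout resembles Hadoop's SequenceFile but is not byte-compatible with it. `append_record` returns each record's byte offset, which is the value the index would store.

```python
import io
import struct

MAGIC = b"SEQ"  # first three header bytes: the file type, per the description
VERSION = 1     # assumed value for the one-byte version number


def _write_str(buf, text):
    data = text.encode("utf-8")
    buf.write(struct.pack(">I", len(data)))  # assumed 4-byte length prefix
    buf.write(data)


def _read_str(buf):
    (length,) = struct.unpack(">I", buf.read(4))
    return buf.read(length).decode("utf-8")


def write_header(buf, key_type="string", value_type="bytes"):
    """Write magic, version byte, and the names of the key and value types."""
    buf.write(MAGIC)
    buf.write(bytes([VERSION]))
    _write_str(buf, key_type)
    _write_str(buf, value_type)


def read_header(buf):
    buf.seek(0)
    if buf.read(3) != MAGIC:
        raise ValueError("not a vector slice data packet")
    version = buf.read(1)[0]
    return version, _read_str(buf), _read_str(buf)


def append_record(buf, name, content):
    """Append one tile at the tail of the packet and return its byte offset."""
    buf.seek(0, io.SEEK_END)
    offset = buf.tell()
    key = name.encode("utf-8")
    # Record = record length (key + value bytes), key length, key, value.
    buf.write(struct.pack(">II", len(key) + len(content), len(key)))
    buf.write(key)
    buf.write(content)
    return offset


def read_record(buf, offset):
    """Deserialize the record that starts at `offset`."""
    buf.seek(offset)
    record_len, key_len = struct.unpack(">II", buf.read(8))
    key = buf.read(key_len).decode("utf-8")
    value = buf.read(record_len - key_len)
    return key, value


if __name__ == "__main__":
    packet = io.BytesIO()  # stands in for a packet file in the DFS
    write_header(packet)
    offset = append_record(packet, "06.pbf", b"tile bytes")
    print(read_header(packet))          # (1, 'string', 'bytes')
    print(read_record(packet, offset))  # ('06.pbf', b'tile bytes')
```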
Preferably, step 104: store the massive vector slice data packets in the metadata nodes. Packet storage is implemented on top of the distributed file system, and every access to the massive vector slice data depends on it. New vector slice data are appended at the tail of the packet, and the packets are stored using a data-file serialization method. When a client writes vector slice data to a directory, it performs a write operation on that directory's data file, and the distributed file system records an occupation permission (lease) on the file, which can be regarded as the file's write lock. If another client then needs to store its own vector slice data in the same directory, it must also apply to write the packet file in that directory. Because the packet file already holds a write lock and the distributed file system does not maintain a queue of transaction requests, the failure is returned directly to the second client. From the users' point of view, creating different files in the same directory should not cause a conflict; at the back end, however, both clients operate on the same packet file, so the locking mechanism causes write conflicts when several users write different vector slice data to the same directory. The packet files are implemented mainly by serializing and deserializing data files. Serialization converts a structured object into a byte stream for transmission over a network or for writing to disk for persistent storage.
Deserialization is the reverse process: it converts a byte stream back into a structured object.
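The write conflict described in step 104 can be modelled with a toy lease table. In this hypothetical sketch (`LeaseTable`, `LeaseConflict` and the `packet.seq` file name are illustrative, not a real distributed file system API), every tile written to a directory maps to that directory's single packet file, so a second client is rejected at once because there is no request queue.

```python
from posixpath import dirname


class LeaseConflict(Exception):
    """Raised when a packet file is already leased to another writer."""


class LeaseTable:
    """Toy model of per-file write leases with no waiting queue (hypothetical)."""

    def __init__(self):
        self._holders = {}  # packet file path -> id of the client holding the lease

    def acquire(self, path, client):
        holder = self._holders.get(path)
        if holder is not None and holder != client:
            # No transaction request queue: fail immediately, as the text describes.
            raise LeaseConflict(f"{path} is leased by {holder}")
        self._holders[path] = client

    def release(self, path, client):
        if self._holders.get(path) == client:
            del self._holders[path]


def packet_path(tile_path):
    """All tiles in one directory share that directory's packet file (assumed name)."""
    return dirname(tile_path) + "/packet.seq"


if __name__ == "__main__":
    leases = LeaseTable()
    leases.acquire(packet_path("/tiles/18/05/06.pbf"), "client-A")
    try:
        # A different tile, but the same directory and therefore the same packet file.
        leases.acquire(packet_path("/tiles/18/05/07.pbf"), "client-B")
    except LeaseConflict as err:
        print("write rejected:", err)
```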
Preferably, step 105: establish indexes for the massive vector slice data and associate the data through the indexes to form a mesh-structured data index table; the index table records the path of each vector slice data item within its massive vector slice data packet. The index comprises the vector slice data path, its name and its offset within the packet, and the path comprises a metadata node position, a row position and a column position. For example, the path <18, 0506> denotes metadata node position 18, row position 05 and column position 06. To find this item, the metadata node at position 18 is located first, then row 05, and then column 06. Using the metadata node, row and column positions of the paths in the index table, all the vector slice data form a spatial mesh index structure, which allows the embodiment to find vector slice data along the shortest path.
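The node, row, column lookup in the <18, 0506> example can be sketched as three nested dictionary lookups. The sketch assumes two-digit row and column fields, as in the example, and stores a hypothetical (packet path, offset) pair as the index entry; the text does not specify how the mesh is represented in memory.

```python
import re

_PATH = re.compile(r"<\s*(\d+)\s*,\s*(\d{2})(\d{2})\s*>")


def parse_tile_path(path):
    """Split '<18, 0506>' into (metadata node 18, row 5, column 6)."""
    match = _PATH.fullmatch(path)
    if match is None:
        raise ValueError(f"unrecognised tile path: {path!r}")
    node, row, column = (int(g) for g in match.groups())
    return node, row, column


class MeshIndex:
    """Nested node -> row -> column mapping (an assumed in-memory representation)."""

    def __init__(self):
        self._nodes = {}

    def add(self, path, entry):
        node, row, column = parse_tile_path(path)
        self._nodes.setdefault(node, {}).setdefault(row, {})[column] = entry

    def lookup(self, path):
        node, row, column = parse_tile_path(path)
        # Locate the metadata node first, then the row, then the column.
        return self._nodes[node][row][column]


if __name__ == "__main__":
    index = MeshIndex()
    index.add("<18, 0506>", ("/tiles/18/05/packet.seq", 14))
    print(index.lookup("<18, 0506>"))  # ('/tiles/18/05/packet.seq', 14)
```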
For each layer of metadata nodes, a metadata node for storing the data index table is preset, and the massive vector slice data index table is stored in the corresponding metadata node. The index table recorded in the metadata is transmitted to a directory file, and a persistent mapping table of the massive vector slice data index is established at the client.
The index of a vector slice data item records its position within a specific packet file together with its other attributes, and it must be created after the client stores the data. An index record contains the name of the vector slice data, the file path of the packet containing it, and its offset within the packet file. The number of bits occupied by the packet file name determines how many data files a directory can hold, and the number of bits occupied by the offset determines the maximum size of each data file; together they limit how much data a directory can store.
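The capacity limit can be made concrete with a little arithmetic: if a packet file identifier occupies f bits and an offset occupies o bits, one directory can address at most 2^f files of 2^o bytes each, i.e. 2^(f+o) bytes. The text gives no bit widths, so the sketch below takes them as parameters; `IndexRecord`, `pack_location`, `unpack_location` and the numeric file id are assumptions for illustration. With assumed 8-bit file ids and 32-bit offsets, a directory holds 256 files of 4 GiB, i.e. 1 TiB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRecord:
    """Fields named in the text; packing them into bits is an assumption."""
    name: str          # name of the vector slice data (tile)
    packet_path: str   # file path of the packet that holds the tile
    offset: int        # byte offset of the tile's record within the packet


def directory_capacity(file_id_bits, offset_bits):
    """Bytes one directory can address: 2**file_id_bits files of 2**offset_bits bytes."""
    return (1 << file_id_bits) * (1 << offset_bits)


def pack_location(file_id, offset, file_id_bits, offset_bits):
    """Pack (file id, offset) into one integer; fail if either field overflows."""
    if not 0 <= file_id < (1 << file_id_bits):
        raise OverflowError("directory has no room for another packet file")
    if not 0 <= offset < (1 << offset_bits):
        raise OverflowError("packet file exceeds the addressable size")
    return (file_id << offset_bits) | offset


def unpack_location(packed, offset_bits):
    return packed >> offset_bits, packed & ((1 << offset_bits) - 1)


if __name__ == "__main__":
    record = IndexRecord("06.pbf", "/tiles/18/05/packet.seq", 14)
    packed = pack_location(3, record.offset, file_id_bits=8, offset_bits=32)
    print(unpack_location(packed, offset_bits=32))  # (3, 14)
    print(directory_capacity(8, 32) // 2**40, "TiB per directory")  # 1 TiB per directory
```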
Preferably, the massive vector slice data indexes are distributed across the data nodes for management. Although the total volume of index data is huge, once it is distributed across the metadata nodes the index data on any single metadata node is relatively small, so the cluster's capacity for storing massive vector slice data depends on the cluster size. The cluster size therefore determines not only the storage capacity but also the amount of vector slice data that can be held. A metadata node maintains the indexes of its vector slice data and provides an indexing service to the client, and the index location of a vector slice data item identifies the metadata node that maintains its index.
Preferably, the massive vector slice data indexes are grouped by the parent directory in which the data reside, so that the indexes of data in the same directory are managed by metadata nodes at the same level. Accordingly, the embodiment creates an index location mapping table that records the mapping from directories to metadata nodes; this table is managed by a metadata node. When a client queries a massive vector slice data index, it needs to know the location of the metadata node that maintains that index: the client sends the path of the vector slice data to the metadata node, which looks up the index location mapping table using the parent directory of that path to find the node's location. The invention provides an index location maintenance module on the metadata node that is dedicated to allocating data nodes to directories and maintaining the index location mapping table.
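The index location mapping table is essentially a dictionary keyed by directory, queried with the parent directory of a tile path. The class name `IndexLocationTable` and the node identifiers below are hypothetical; this is a sketch of the lookup rule only, not of the network protocol between client and metadata node.

```python
from posixpath import dirname


class IndexLocationTable:
    """Directory -> metadata node mapping, looked up by a tile's parent directory."""

    def __init__(self):
        self._dir_to_node = {}

    def assign(self, directory, node):
        self._dir_to_node[directory] = node

    def locate(self, tile_path):
        parent = dirname(tile_path)
        try:
            return self._dir_to_node[parent]
        except KeyError:
            raise LookupError(f"no metadata node assigned to {parent}") from None


if __name__ == "__main__":
    table = IndexLocationTable()
    table.assign("/tiles/18/05", "meta-node-3")
    # The client sends the tile path; the node resolves it via the parent directory.
    print(table.locate("/tiles/18/05/06.pbf"))  # meta-node-3
```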
Preferably, the index location maintenance module selects from all the data nodes maintained by the metadata node and allocates them to directories. The index location mapping table is persisted to the local disk, and whenever its data change, the copy on the local disk is updated. If the module cannot find enough metadata nodes when allocating to a directory, it inserts the unallocated directory into a directory allocation waiting queue; this queue is also persisted to disk and must be updated on disk whenever a directory is added or removed. When the metadata node starts, it reads the queue data from disk into memory. The purpose of the queue is to let the index location maintenance module reallocate the queued directories when a new data node registers with the distributed file system. Every update to the queue must likewise be persisted.
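The allocation logic with its persisted waiting queue might look roughly like the following. Everything here is an assumption for illustration: the class name, the JSON state file, the per-node limit of directories, and the choice of the least-loaded node (the text does not say how a node is selected). The point it shows is that both the mapping table and the queue are written back to disk on every change and reloaded at start-up.

```python
import json
import os
import tempfile


class IndexLocationMaintainer:
    """Sketch of the index location maintenance module (names are hypothetical)."""

    def __init__(self, state_file, max_dirs_per_node=2):
        self.state_file = state_file
        self.max_dirs = max_dirs_per_node  # assumed per-node allocation limit
        self.nodes, self.mapping, self.waiting = [], {}, []
        if os.path.exists(state_file):
            # On start-up, read the persisted mapping table and waiting queue.
            with open(state_file) as f:
                state = json.load(f)
            self.nodes = state["nodes"]
            self.mapping = state["mapping"]
            self.waiting = state["waiting"]

    def _persist(self):
        # Every change to the table or the queue is written back to disk.
        with open(self.state_file, "w") as f:
            json.dump({"nodes": self.nodes, "mapping": self.mapping,
                       "waiting": self.waiting}, f)

    def _load(self, node):
        return sum(1 for assigned in self.mapping.values() if assigned == node)

    def _try_assign(self, directory):
        free = [n for n in self.nodes if self._load(n) < self.max_dirs]
        if not free:
            return False
        self.mapping[directory] = min(free, key=self._load)  # assumed: least-loaded node
        return True

    def allocate(self, directory):
        if not self._try_assign(directory):
            self.waiting.append(directory)  # no node available: queue the directory
        self._persist()

    def register_node(self, node):
        """A new node joins: retry every queued directory."""
        self.nodes.append(node)
        still_waiting = []
        for directory in self.waiting:
            if not self._try_assign(directory):
                still_waiting.append(directory)
        self.waiting = still_waiting
        self._persist()


if __name__ == "__main__":
    state = os.path.join(tempfile.mkdtemp(), "index_locations.json")
    module = IndexLocationMaintainer(state, max_dirs_per_node=1)
    module.register_node("node-1")
    module.allocate("/tiles/18/05")
    module.allocate("/tiles/18/06")      # queued: node-1 is full
    restarted = IndexLocationMaintainer(state, max_dirs_per_node=1)
    restarted.register_node("node-2")    # the queued directory gets node-2
    print(restarted.mapping)  # {'/tiles/18/05': 'node-1', '/tiles/18/06': 'node-2'}
```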
The embodiment maintains and manages the vector slice data indexes through a vector slice data index module on the data node, which provides the indexing service for the client. The module maintains the index records in memory, the index files, and the log file corresponding to each index file. The metadata nodes sort the index records with a B-tree to speed up index lookups. An update to an index record first modifies the in-memory data structure and is synchronized to the index file asynchronously; the updated content is recorded in the log file corresponding to the index file. After the data node starts, it reads the index file into memory as needed and updates the index data structure from the log; the in-memory index records are then written back to the data node, replacing the old index file, and the log is emptied. This prevents the in-memory index data from being lost if the data node suddenly loses power.
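The update and recovery cycle (modify memory, log the change, replay the log at start-up, rewrite the index file, empty the log) can be sketched as below. Two substitutions are stated plainly: a sorted key list searched with the standard `bisect` module stands in for the B-tree named in the text, and JSON files stand in for the real index and log formats. The class name `TileIndexStore` is hypothetical.

```python
import bisect
import json
import os
import tempfile


class TileIndexStore:
    """In-memory sorted index + index file + update log (hypothetical sketch).

    A sorted key list searched with bisect stands in for the B-tree named
    in the text; both keep keys ordered so that range scans are cheap.
    """

    def __init__(self, index_file, log_file):
        self.index_file, self.log_file = index_file, log_file
        self.keys, self.entries = [], {}
        self._recover()

    def _put(self, name, entry):
        if name not in self.entries:
            bisect.insort(self.keys, name)
        self.entries[name] = entry

    def _recover(self):
        """Start-up: load the index file, replay the log, rewrite, clear the log."""
        if os.path.exists(self.index_file):
            with open(self.index_file) as f:
                for name, entry in json.load(f).items():
                    self._put(name, entry)
        if os.path.exists(self.log_file):
            with open(self.log_file) as f:
                for line in f:
                    change = json.loads(line)
                    self._put(change["name"], change["entry"])
        with open(self.index_file, "w") as f:
            json.dump(self.entries, f)    # replaces the old index file
        open(self.log_file, "w").close()  # empties the log

    def update(self, name, entry):
        self._put(name, entry)                 # 1. modify the in-memory structure
        with open(self.log_file, "a") as f:    # 2. record the change in the log
            f.write(json.dumps({"name": name, "entry": entry}) + "\n")
        # The index file is rewritten later; a real system would also fsync the log.

    def lookup(self, name):
        return self.entries.get(name)

    def scan(self, low, high):
        """Tile names in [low, high), in sorted order."""
        start = bisect.bisect_left(self.keys, low)
        return self.keys[start:bisect.bisect_left(self.keys, high)]


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    paths = (os.path.join(work, "index.json"), os.path.join(work, "index.log"))
    store = TileIndexStore(*paths)
    store.update("18/05/06", ["/tiles/18/05/packet.seq", 14])
    # Simulate a restart before the index file was rewritten: the log is replayed.
    recovered = TileIndexStore(*paths)
    print(recovered.lookup("18/05/06"))  # ['/tiles/18/05/packet.seq', 14]
```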
Preferably, the massive vector slice data index table is cached at the client to reduce the number of accesses to the metadata node and thereby improve the efficiency of accessing massive vector slice data. In this embodiment, the indexes that a user frequently uses are cached at the client, which reduces how often the client contacts the metadata node and improves access efficiency.
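A client-side cache of frequently used index entries can be sketched with `collections.OrderedDict` as an LRU cache. The LRU eviction policy, the capacity, and the `fetch_from_metadata_node` callable are assumptions; the text only says that commonly used indexes are cached at the client. The `remote_calls` counter shows how many lookups actually reached the metadata node.

```python
from collections import OrderedDict


class CachingIndexClient:
    """Client-side LRU cache in front of a metadata-node index lookup (sketch)."""

    def __init__(self, fetch_from_metadata_node, capacity=1024):
        self._fetch = fetch_from_metadata_node  # callable: tile name -> index entry
        self._cache = OrderedDict()
        self.capacity = capacity
        self.remote_calls = 0  # how often the metadata node was actually contacted

    def get(self, name):
        if name in self._cache:
            self._cache.move_to_end(name)  # mark as most recently used
            return self._cache[name]
        self.remote_calls += 1
        entry = self._fetch(name)
        self._cache[name] = entry
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return entry


if __name__ == "__main__":
    client = CachingIndexClient(lambda name: ("/tiles/18/05/packet.seq", 14), capacity=2)
    for name in ["06.pbf", "06.pbf", "07.pbf", "06.pbf"]:
        client.get(name)
    print(client.remote_calls)  # 2
```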
Preferably, step 106: provide the massive vector slice data indexing service through the massive vector slice data packet index table. The index table is used to determine the shortest path to the metadata node corresponding to the massive vector slice data packet, and the position of the vector slice data is then determined from the file header of the data packet file in that metadata node.
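The read path of step 106 (index table, then metadata node, then record position from the packet header, then the tile bytes) can be sketched as follows. Assumptions: the index table maps a tile name directly to a (node, packet path) pair, so the shortest-path search itself is not modelled; nodes are plain dictionaries; and the header is a length-prefixed JSON map of record positions, a simplification chosen for readability rather than the patented header format. `build_packet` and `read_tile` are hypothetical names.

```python
import json
import struct


def build_packet(tiles):
    """Packet = 4-byte header length, JSON header of record positions, record bodies.

    The JSON header is a simplification for illustration only.
    """
    chunks, positions, offset = [], {}, 0
    for name, content in tiles.items():
        positions[name] = [offset, len(content)]  # offset within the body, length
        chunks.append(content)
        offset += len(content)
    header = json.dumps(positions).encode("utf-8")
    return struct.pack(">I", len(header)) + header + b"".join(chunks)


def read_tile(index_table, nodes, name):
    node, packet_path = index_table[name]             # 1. index table -> metadata node
    packet = nodes[node][packet_path]                 # 2. packet file stored on that node
    (header_len,) = struct.unpack(">I", packet[:4])
    positions = json.loads(packet[4:4 + header_len])  # 3. record positions from the header
    offset, length = positions[name]
    start = 4 + header_len + offset
    return packet[start:start + length]               # 4. the tile's bytes


if __name__ == "__main__":
    packet = build_packet({"06.pbf": b"tile-06", "07.pbf": b"tile-07"})
    nodes = {"meta-node-18": {"/tiles/18/05/packet": packet}}
    index_table = {name: ("meta-node-18", "/tiles/18/05/packet") for name in ("06.pbf", "07.pbf")}
    print(read_tile(index_table, nodes, "07.pbf"))  # b'tile-07'
```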
FIG. 2 is a system structure diagram of a cloud storage system for massive vector slice data according to an embodiment of the present invention. The system 200 includes:
a first generating unit 201, configured to establish a directory tree file of a distributed file system;
a second generating unit 202, configured to establish all metadata nodes corresponding to the directory tree of the distributed file system;
an aggregation unit 203, configured to aggregate the massive vector slice data under the same-level directory of the distributed file system to generate a massive vector slice data packet;
a storage unit 204, configured to store the massive vector slice data packets in the metadata nodes;
a third generating unit 205, configured to generate a massive vector slice data index table, establish a mesh structure of the massive vector slice data packets through the index table, and record the path of the massive vector slice data within the massive vector slice data packet;
and an indexing unit 206, configured to provide a massive vector slice data indexing service through the massive vector slice data index.
The cloud storage system 200 for massive vector slice data according to this embodiment corresponds to the cloud storage method 100 for massive vector slice data according to another embodiment of the present invention, so the details are not repeated here.
The invention has been described with reference to a few embodiments. However, other embodiments of the invention than the one disclosed above are equally possible within the scope of the invention, as would be apparent to a person skilled in the art from the appended patent claims.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the [device, component, etc.]" are to be interpreted openly as referring to at least one instance of said device, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.
In addition, as will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (6)

1. A cloud storage method for massive vector slice (tile) data, the method comprising:
establishing all metadata nodes corresponding to the directory tree of a distributed file system;
aggregating the massive vector slice data located in the same directory level of the distributed file system into massive vector slice data packages, each package comprising a file header and at least one record, wherein
the file header comprises the file type, version number, file keyword, file name, and the position of each record, and
each record corresponds to one vector slice and comprises the length of the vector slice data, the key length, the key, and the value;
storing the massive vector slice data packages in the metadata nodes, the packages being stored using a data file serialization method;
building an index for the massive vector slice data, the vector slice data being associated through the index to form a mesh-structured data index table of the massive vector slice data;
providing a massive vector slice data indexing service through the massive vector slice data package index table; and
reading the massive vector slice data by:
determining, from the massive vector slice data index table, the shortest path to the metadata node corresponding to the massive vector slice data package; and
determining the position of the vector slice data from the file header of the package file in the determined metadata node.

2. The method of claim 1, wherein:
the massive vector slice data index comprises the path and name of the vector slice data and its offset within the massive vector slice data package; and
the path comprises the metadata node location, the row position of the vector slice data, and the column position of the vector slice data.

3. The method of claim 1, comprising:
presetting one metadata node for each layer and storing the index table in the pre-designed metadata node of that layer; and
transmitting the massive vector slice data index table stored in the metadata node to the client and establishing a persistent mapping table of the massive vector slice data index table.

4. The method of claim 1, further comprising appending data at the tail of the massive vector slice data package.

5. The method of claim 1, comprising caching the massive vector slice data index table on the client.

6. A cloud storage system for massive vector slice data, the system comprising:
a first generating unit, configured to establish a directory tree file of a distributed file system;
a second generating unit, configured to establish all metadata nodes corresponding to the directory tree of the distributed file system;
an aggregation unit, configured to aggregate the massive vector slice data in the same directory level of the distributed file system into massive vector slice data packages, each package comprising a file header and at least one record, wherein
the file header comprises the file type, version number, file keyword, file name, and the position of each record, and
each record corresponds to one vector slice and comprises the length of the vector slice data, the key length, the key, and the value;
a storage unit, configured to store the massive vector slice data packages in the metadata nodes, the packages being stored using a data file serialization method;
a third generating unit, configured to generate the massive vector slice data index table, establish a mesh structure of the massive vector slice data packages through the index table, and record the path of the massive vector slice data within the massive vector slice data packages; and
an indexing unit, configured to provide the massive vector slice data indexing service through the massive vector slice data index, the massive vector slice data being read by:
determining, from the massive vector slice data index table, the shortest path to the metadata node corresponding to the massive vector slice data package; and
determining the position of the vector slice data from the file header of the package file in the determined metadata node.
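Claim 1 lists what a package file holds (a header with type, version, keyword, name and record positions; records with length, key length, key and value) but not how the bytes are laid out. The minimal sketch below assumes one possible layout: big-endian integers packed with Python's `struct`, a size-prefixed header, and record positions stored relative to the start of the record area. It shows how the header's position table lets a reader jump straight to one tile and how the tail append of claim 4 can work. `FILE_TYPE`, `build_package`, `append_tile`, `read_tile` and all field widths are hypothetical, not taken from the patent.

```python
import struct
from typing import Dict, List, Tuple

# Hypothetical 4-byte file-type tag; the patent names the field but not its value.
FILE_TYPE = b"VTPK"


def encode_record(key: bytes, value: bytes) -> bytes:
    """Record = [record length][key length][key][value], lengths as big-endian uint32."""
    body = struct.pack(">I", len(key)) + key + value
    return struct.pack(">I", len(body)) + body


def encode_header(version: int, keyword: str, name: str, offsets: List[int]) -> bytes:
    """Header = type, version, keyword, name, record count, one offset per record.

    Offsets are relative to the start of the record area (an assumption of
    this sketch), so they stay valid when the header grows after an append.
    """
    kw, nm = keyword.encode("utf-8"), name.encode("utf-8")
    head = FILE_TYPE + struct.pack(">H", version)
    head += struct.pack(">H", len(kw)) + kw
    head += struct.pack(">H", len(nm)) + nm
    head += struct.pack(">I", len(offsets))
    head += b"".join(struct.pack(">Q", off) for off in offsets)
    # Prefix the header with its own size so readers can find the record area.
    return struct.pack(">I", len(head)) + head


def decode_header(package: bytes) -> Tuple[Dict, int]:
    """Parse the header; return its fields and the byte position where records start."""
    (head_size,) = struct.unpack_from(">I", package, 0)
    pos = 4
    file_type = package[pos:pos + 4]
    pos += 4
    (version,) = struct.unpack_from(">H", package, pos)
    pos += 2
    (kw_len,) = struct.unpack_from(">H", package, pos)
    pos += 2
    keyword = package[pos:pos + kw_len].decode("utf-8")
    pos += kw_len
    (nm_len,) = struct.unpack_from(">H", package, pos)
    pos += 2
    name = package[pos:pos + nm_len].decode("utf-8")
    pos += nm_len
    (count,) = struct.unpack_from(">I", package, pos)
    pos += 4
    offsets = list(struct.unpack_from(f">{count}Q", package, pos))
    header = {"type": file_type, "version": version, "keyword": keyword,
              "name": name, "offsets": offsets}
    return header, 4 + head_size


def build_package(version: int, keyword: str, name: str,
                  tiles: List[Tuple[bytes, bytes]]) -> bytes:
    """Aggregate (key, value) tiles from one directory level into a single package."""
    records, offsets, pos = [], [], 0
    for key, value in tiles:
        rec = encode_record(key, value)
        offsets.append(pos)
        records.append(rec)
        pos += len(rec)
    return encode_header(version, keyword, name, offsets) + b"".join(records)


def append_tile(package: bytes, key: bytes, value: bytes) -> bytes:
    """Append a record at the tail; the header is re-encoded, old records are copied unchanged."""
    header, data_start = decode_header(package)
    data = package[data_start:]
    offsets = header["offsets"] + [len(data)]
    new_header = encode_header(header["version"], header["keyword"], header["name"], offsets)
    return new_header + data + encode_record(key, value)


def read_tile(package: bytes, index: int) -> Tuple[bytes, bytes]:
    """Locate record `index` through the header's position table and decode it."""
    header, data_start = decode_header(package)
    pos = data_start + header["offsets"][index]
    rec_len, key_len = struct.unpack_from(">II", package, pos)
    key = package[pos + 8:pos + 8 + key_len]
    value = package[pos + 8 + key_len:pos + 4 + rec_len]
    return key, value
```

Storing offsets relative to the record area is a design choice of this sketch: an append adds one entry to the header but leaves every existing offset valid, so the old record bytes never need to be rewritten.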
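Claims 2, 3 and 5 describe an index entry (a path made of the metadata node location, row and column; a name; an offset within the package) and a per-layer index table that is transmitted to and cached on the client. The minimal sketch below assumes each layer's table is a dictionary keyed by (row, column) and that a caller-supplied `fetch_table` function stands in for the transfer from the metadata node; `TilePath`, `IndexEntry`, `ClientIndexCache` and `fetch_table` are hypothetical names. It does not model the mesh structure or the shortest-path step of claim 1, which the claims do not specify in enough detail to implement.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class TilePath:
    """Path of a tile as listed in claim 2: metadata node location, row, column."""
    node: str  # hypothetical string id for the metadata node (e.g. one per layer)
    row: int
    col: int


@dataclass(frozen=True)
class IndexEntry:
    """One row of the index table: tile path, tile name, byte offset inside its package."""
    path: TilePath
    name: str
    offset: int


# Index table of one layer, keyed by (row, col); an assumption of this sketch.
LayerTable = Dict[Tuple[int, int], IndexEntry]


class ClientIndexCache:
    """Client-side copy of per-layer index tables (claims 3 and 5).

    `fetch_table` stands in for transferring a layer's index table from its
    metadata node; the patent does not specify how that transfer happens.
    """

    def __init__(self, fetch_table: Callable[[str], LayerTable]):
        self._fetch_table = fetch_table
        self._tables: Dict[str, LayerTable] = {}
        self.remote_fetches = 0  # counts round trips to metadata nodes

    def lookup(self, node: str, row: int, col: int) -> IndexEntry:
        table = self._tables.get(node)
        if table is None:
            # First access to this layer: fetch once, then serve from the cache.
            table = self._fetch_table(node)
            self._tables[node] = table
            self.remote_fetches += 1
        return table[(row, col)]

    def locate(self, node: str, row: int, col: int) -> Tuple[str, int]:
        """Return (metadata node, offset in package) for the requested tile."""
        entry = self.lookup(node, row, col)
        return entry.path.node, entry.offset
```

Once a layer's table is cached, repeated tile requests for that layer resolve locally, and only the package read itself goes to the metadata node.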
CN201610939884.6A 2016-10-25 2016-10-25 Mass vector slice data cloud storage method and system Active CN106570113B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610939884.6A CN106570113B (en) 2016-10-25 2016-10-25 Mass vector slice data cloud storage method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610939884.6A CN106570113B (en) 2016-10-25 2016-10-25 Mass vector slice data cloud storage method and system

Publications (2)

Publication Number Publication Date
CN106570113A 2017-04-19
CN106570113B 2022-04-01

Family

ID=58536334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610939884.6A Active CN106570113B (en) 2016-10-25 2016-10-25 Mass vector slice data cloud storage method and system

Country Status (1)

Country Link
CN (1) CN106570113B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107291889A (en) * 2017-06-20 2017-10-24 郑州云海信息技术有限公司 A kind of date storage method and system
CN108172277B (en) * 2017-12-19 2020-07-07 浙江大学 Method and system for storing and browsing multiple-magnification digital slice image
CN109767274B (en) * 2018-12-05 2023-04-25 航天信息股份有限公司 Method and system for carrying out associated storage on massive invoice data
CN111459882B (en) * 2020-03-30 2023-08-29 北京百度网讯科技有限公司 Distributed file system namespace transaction processing method and device
CN111782663B (en) * 2020-05-21 2023-09-01 浙江邦盛科技股份有限公司 Aggregation index structure and aggregation index method for improving aggregation query efficiency
CN114564290A (en) * 2022-02-23 2022-05-31 中国农业银行股份有限公司 Vector data slicing method and device, electronic equipment and storage medium
CN115373645B (en) * 2022-10-24 2023-02-03 济南新语软件科技有限公司 Complex data packet operation method and system based on dynamic definition

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102332027A (en) * 2011-10-15 2012-01-25 西安交通大学 A method for associative storage of massive non-independent small files based on Hadoop
CN102385623A (en) * 2011-10-25 2012-03-21 曙光信息产业(北京)有限公司 Catalogue access method in DFS (distributed file system)
CN102541985A (en) * 2011-10-25 2012-07-04 曙光信息产业(北京)有限公司 Organization method of client directory cache in distributed file system
CN103020315A (en) * 2013-01-10 2013-04-03 中国人民解放军国防科学技术大学 Method for storing mass of small files on basis of master-slave distributed file system
CN103473287A (en) * 2013-08-30 2013-12-25 中国科学院信息工程研究所 Method and system for automatically distributing, running and updating executable programs
CN103577123A (en) * 2013-11-12 2014-02-12 河海大学 Small file optimization storage method based on HDFS
US8825652B1 (en) * 2012-06-28 2014-09-02 Emc Corporation Small file aggregation in a parallel computing system
CN104731921A (en) * 2015-03-26 2015-06-24 江苏物联网研究发展中心 Method for storing and processing small log type files in Hadoop distributed file system
CN105404691A (en) * 2015-12-14 2016-03-16 曙光信息产业股份有限公司 File storage method and apparatus

Also Published As

Publication number Publication date
CN106570113A (en) 2017-04-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant