WO2013046667A1 - 情報システム、その管理方法およびプログラム、データ処理方法およびプログラム、ならびに、データ構造 - Google Patents
情報システム、その管理方法およびプログラム、データ処理方法およびプログラム、ならびに、データ構造 Download PDFInfo
- Publication number
- WO2013046667A1 WO2013046667A1 PCT/JP2012/006152 JP2012006152W WO2013046667A1 WO 2013046667 A1 WO2013046667 A1 WO 2013046667A1 JP 2012006152 W JP2012006152 W JP 2012006152W WO 2013046667 A1 WO2013046667 A1 WO 2013046667A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- range
- destination
- logical identifier
- nodes
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims description 60
- 238000003672 processing method Methods 0.000 title claims description 11
- 238000007726 management method Methods 0.000 claims description 61
- 238000006243 chemical reaction Methods 0.000 claims description 59
- 238000012545 processing Methods 0.000 claims description 46
- 230000001186 cumulative effect Effects 0.000 claims description 31
- 238000012546 transfer Methods 0.000 claims description 27
- 230000008569 process Effects 0.000 claims description 20
- 238000004364 calculation method Methods 0.000 claims description 12
- 238000005315 distribution function Methods 0.000 claims description 12
- 238000004590 computer program Methods 0.000 claims description 11
- 230000008859 change Effects 0.000 claims description 3
- 238000013500 data storage Methods 0.000 abstract description 48
- 230000006870 function Effects 0.000 description 30
- 238000012217 deletion Methods 0.000 description 29
- 230000037430 deletion Effects 0.000 description 29
- 238000007781 pre-processing Methods 0.000 description 19
- 238000010586 diagram Methods 0.000 description 17
- 239000002131 composite material Substances 0.000 description 7
- 238000004422 calculation algorithm Methods 0.000 description 5
- 230000007704 transition Effects 0.000 description 5
- 230000004044 response Effects 0.000 description 4
- 238000004891 communication Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 239000000284 extract Substances 0.000 description 2
- 229930091051 Arenine Natural products 0.000 description 1
- 230000001174 ascending effect Effects 0.000 description 1
- 230000005540 biological transmission Effects 0.000 description 1
- 150000001875 compounds Chemical class 0.000 description 1
- 230000007423 decrease Effects 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000005192 partition Methods 0.000 description 1
Images
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/104—Peer-to-peer [P2P] networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
- G06F3/0635—Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration
Definitions
- the present invention relates to an information system, a management method and program thereof, a data processing method and program, and a data structure, and in particular, an information system for managing distributed data, a management method and program thereof, a data processing method and program, and data Concerning structure.
- Patent Document 1 discloses a distributed database system in which each record of data is divided and stored in a plurality of storage devices (first processors).
- first processors the range in which the key values of all the records of the table data constituting the data are distributed is divided into a plurality of sections.
- the number of records in each section is made equal, and a plurality of first processors are assigned to the plurality of sections.
- a central processor accesses the first processor.
- the key values of a plurality of records in each part of the database held by the first processor and information indicating the storage position of the records are transferred to the second processor to which the section of the key value to which the record belongs is assigned.
- the key value of the record held by them and the information indicating the storage position of the record are transferred to the first processor to which the section to which the key value belongs is assigned.
- the second processor sorts a plurality of transferred key values, and generates a key value table in which information indicating the storage location of the received record is registered together with the key values as a sorting result.
- the overlay management system described in Patent Document 2 is composed of space filling curve conversion processing means, distribution function processing means, and message transfer processing means.
- the overlay management system having such a configuration operates as follows. When registering or deleting data, the system selects a plurality of attributes (composite indexed attributes) designated in advance for efficient search. Then, the multi-dimensional value is acquired and made into a one-dimensional value by the space filling curve processing means, and this is inputted to the distribution function processing means to obtain a logical identifier as a uniformed one-dimensional value.
- attributes composite indexed attributes
- This logical identifier is used to determine the data storage destination and request information transfer destination.
- the message transfer processing means transmits the request information with the obtained logical identifier as the destination.
- the message transfer processing means transmits the message to the peer that bears the logical identifier, and registers or deletes the data in the peer.
- the distribution function is applied to the attribute value, and the attribute value data is stored using the logical identifier that is probabilistically uniformly distributed like the logical identifier assigned to the data storage destination node.
- the attribute value data is stored using the logical identifier that is probabilistically uniformly distributed like the logical identifier assigned to the data storage destination node.
- a range conditional expression of a plurality of attributes indexed with a composite index is acquired from the search expression, and the multidimensional range is converted into a plurality of one-dimensional values by a space filling curve processing means. Get the range.
- the distribution function processing means is executed to obtain a logical identifier, and this is performed for all of the plurality of one-dimensional values to obtain a plurality of logical identifier ranges.
- the message transfer processing means transmits a search request with the plurality of logical identifier ranges obtained in this way as destinations, and acquires data stored in a plurality of peers corresponding to the destinations.
- Non-Patent Document 2 extends to Chord that supports multi-dimensional attributes and range queries using multi-dimensional attributes in a P2P (Peer-to-Peer) system such as Distributed Hash Table (DHT).
- DHT Distributed Hash Table
- MAAN A Multi-Attribute Addressable Network Grid Information Services
- Chord is one of algorithms for realizing a distributed hash table.
- the P2P network is a technique for searching for contents at high speed without using a server and routing messages from one node to another.
- the distributed hash table is a technique in which access requests to a hash table are routed as a P2P network, among techniques managing a hash table with a plurality of peers.
- N nodes are divided by 1 / N in order to make the data amount strictly uniform, and then the number of nodes is increased by 1 and divided by 1 / (N + 1). In this case, data movement occurs in almost all nodes, and nodes that move almost all data appear. On the other hand, if data movement is performed with only one selected from the N units, data is stored unevenly, and only half of the data of other nodes is stored in a certain node.
- An object of the present invention is to solve the above-mentioned problems, an information system with less movement data when changing a data storage destination computer while keeping the load between nodes moderately uniform, its management method and program, data processing method and program As well as providing a data structure.
- the information system of the present invention is It has multiple nodes that distribute and manage data groups, The plurality of nodes each have a destination address identifiable on the network; Identifier assigning means for assigning a logical identifier to the plurality of nodes on a logical identifier space; Range determination means for associating the logical identifier space with the distribution of data in the data group, and determining a range of values of the data corresponding to the logical identifier of each of the nodes; When searching for the destination of the node of the storage destination of data of an attribute value or attribute range, based on the correspondence relationship between the range of the data value of each of the nodes, the logical identifier, and the destination address, Destination determination means for determining the logical identifier corresponding to the data range that matches at least part of the attribute value or the attribute range, and determining the destination address of the node corresponding to the logical identifier as the destination .
- the information system management method of the present invention includes: An information system management method for managing a plurality of nodes for managing data groups in a distributed manner, The plurality of nodes each have a destination address identifiable on the network;
- the information system includes a management device and a storage device,
- the management device is A logical identifier is assigned to the plurality of nodes on a logical identifier space, Correlating the logical identifier space with the distribution of data in the data group, determining a range of values of the data corresponding to the logical identifier of each of the nodes;
- searching for the destination of the node of the storage destination of data of an attribute value or attribute range based on the correspondence relationship between the range of the data value of each of the nodes, the logical identifier, and the destination address,
- the logical identifier corresponding to the data range that matches at least part of the attribute value or the attribute range is obtained, and the destination address of the node corresponding to the logical identifier is determined as the destination.
- the program of the present invention A computer program that implements a management device that manages a plurality of nodes that distribute and manage data groups, The plurality of nodes each have a destination address identifiable on the network;
- the management device has a storage device, In a computer that realizes the management device, A procedure for assigning a logical identifier to the plurality of nodes on a logical identifier space; Associating the logical identifier space with the distribution of data in the data group, and determining a range of values of the data corresponding to the logical identifier of each of the nodes; When searching for the destination of the node that is the storage destination of data of an attribute value or attribute range, the attribute value is based on the correspondence relationship between the data value range of each of the nodes, the logical identifier, and the destination address.
- the data processing method of the present invention includes: A data processing method of a terminal device connected to the management device of the management method of the information system and accessing the data via the management device,
- the terminal device is Notifying the management device of an access request to data having an attribute value or attribute range;
- Via the management device the access request based on a correspondence relationship between a destination address of the plurality of nodes, a logical identifier assigned to each node, and a value range of the data managed by each node
- the data is manipulated by accessing a destination of the node that manages the data in a range in which at least a part of the attribute value or the attribute range matches.
- the computer program of the present invention is: A computer program for realizing a client terminal connected to a server for managing a plurality of nodes for managing a data group in a distributed manner, The plurality of nodes each have a destination address identifiable on the network; In a computer that realizes the client terminal, A procedure for accepting an access request to data having an attribute value or attribute range; A procedure for notifying the server of the received access request; Based on the correspondence relationship between the destination address of the plurality of nodes, the logical identifier assigned to each node, and the value range of the data managed by each node, the attribute value requested for access or the Obtaining the logical identifier corresponding to the range of the data that matches at least a part of the attribute range, and receiving the destination address of the node corresponding to the logical identifier determined as the destination from the server; Accessing the node at the destination address received from the server to execute a procedure for operating the data in the attribute value or the attribute range.
- the data structure of the present invention is: A data structure of a destination table that is referred to when determining destinations of a plurality of nodes that manage data groups in a distributed manner, The plurality of nodes each have a destination address identifiable on the network;
- the destination table includes destination addresses of a plurality of nodes that manage the data group in a distributed manner, logical identifiers assigned to the nodes in a logical identifier space, and ranges of data values managed by the nodes. Including correspondence The range of data values of each node is associated with the logical identifier space and the distribution of data in the data group, and the range of data values corresponding to the logical identifier of each node is assigned to each node. Allocated.
- a plurality of components are formed as a single member, and a single component is formed of a plurality of members. It may be that a certain component is a part of another component, a part of a certain component overlaps with a part of another component, or the like.
- the plurality of procedures of the method and computer program of the present invention are not limited to being executed at different timings. For this reason, another procedure may occur during the execution of a certain procedure, or some or all of the execution timing of a certain procedure and the execution timing of another procedure may overlap.
- an information system capable of performing scalable data storage location management while maintaining a uniform load between nodes according to the data distribution of the data group, its management method and program, data processing method and program, and A data structure is provided.
- FIG. 1 is a functional block diagram showing a configuration of an information system 1 according to an embodiment of the present invention.
- An information system 1 according to an embodiment of the present invention includes a plurality of computers connected to each other via a network 3, for example, a plurality of schema management servers 102 (in FIG. 1, indicated as schema management servers A1 to An.
- n. Is a natural number, and may take different values.
- a plurality of data operation clients 104 shown as data operation clients B1 to Bn in FIG. 1
- a plurality of data storage servers 106 data in FIG. 1).
- Storage servers C1 to Cn) and a plurality of operation request relay servers 108 shown as operation request relay servers D1 to Dn in FIG. 1).
- An information system 1 includes a CPU (Central Processing Unit), a memory, a program that realizes the components shown in the figure loaded in the memory, a storage unit such as a hard disk that stores the program, and a network connection interface. It is realized by any combination of hardware and software of any computer provided. It will be understood by those skilled in the art that there are various modifications to the implementation method and apparatus. Each figure described below shows a functional unit block, not a hardware unit configuration. In addition, in each figure, it has abbreviate
- Each server and client constituting the information system 1 according to the present embodiment in FIG. 1 includes, for example, a CPU, a memory (or processor), a hard disk, and a communication device (not shown), an input device such as a keyboard and a mouse, a display, It can be realized by a server computer or a personal computer connected to an output device such as a printer, or a data processing device corresponding to them. Then, the CPU reads out a program stored in the hard disk to the memory and executes it, thereby realizing each function of each unit described later.
- each server and client constituting the information system 1 of the present embodiment may be a virtual computer such as a virtual machine, or a server group that provides services to users over a network such as a cloud. .
- the information system 1 of the present invention is a database that provides access functions to various application software by making data stored in different distributed computers into a table structure that allows at least one-dimensional attribute range search. Applicable to usage. Also, for messages and events sent to distributed computers, by specifying conditions related to the range of multi-dimensional attributes, it can also be used for message transmission and reception forms such as Publish / Subscribe that sets detection and notification of data generation. Applicable.
- the pre-stored range conditional expression is set as a 2D-dimensional attribute value.
- Data that is handled and registered may be treated as a 2D dimensional attribute range.
- the one-dimensional attribute range (25, 40) and the one-dimensional attribute range (35, 40) are stored as two-dimensional attribute values.
- the registered attribute value 30 searches for a two-dimensional range (( ⁇ , 30), (30, ⁇ )). As a result, (25, 40) is acquired as a range including this attribute value, and (35, 40) is not acquired. Notification is performed on the acquired result.
- this correspondence can be taken for the stream processing.
- At least one-dimensional attribute data is data having a plurality of different attributes.
- These data are assumed to be stored in a relational database that can be referenced and operated by a computer.
- a relational database there are rows (tuples) made up of a plurality of columns (attributes).
- an index is previously attached to a plurality of attribute pairs called composite indexes.
- the plurality of attributes include latitude and longitude, temperature and humidity, or the price of a product, manufacturer, model number, release date, and specifications.
- the information system 1 allows a client to access a shopping mall on a website and input a plurality of conditions, for example, a price range, a manufacturer, a release date, and the like to search for a product. It can be applied to usage scenes such as searching for.
- the information system 1 accepts the request, the information system 1 can search and extract data having attributes that meet the conditions from the relational database, and perform a process of returning the data to the client.
- a data search can be performed based on a plurality of search conditions (multidimensional) and a range-specified condition.
- a search request from a client to a website is generated at tens of thousands / second.
- a computer corresponding to at least one-dimensional attribute value is determined, or a plurality of at least one-dimensional attribute spaces such as range search are determined.
- the destination can be determined as follows. That is, when a correspondence between at least a one-dimensional attribute space and a computer is generated in advance from destination server information and data distribution, and a decision is made with reference to this correspondence, the number of attributes increases (for example, Even when dealing with attributes having a long bit length (for example, the INT type (32-bit length or more)), it is possible to determine the destination with a low processing load.
- the information system 1 includes a plurality of data computers 208 connected to each other via a network 3 and mainly responsible for storing data (in FIG. 2, data computers F1 to F1). Fn.) And an access computer 202 (indicated as access computers E1 to En in FIG. 2) that mainly issues an operation request for data are connected via a switch 206. Good.
- a metadata computer 204 that holds information (schema) related to the data structure stored in the data computer 208 may be added.
- the access computer 202 includes the data operation client 104 of FIG. 1, and the data computer 208 includes the data storage server 106 of FIG.
- the operation request relay server 108 in FIG. 1 may be provided in one or both of the access computer 202 and the data computer 208 in FIG. 2, but may not be provided in either.
- the schema management server 102 in FIG. 1 may be provided in the access computer 202 or the data computer 208 in FIG. 2, or may be provided in the metadata computer 204 in FIG.
- At least one peer computer 210 connected via the network 3 (in FIG. 3, indicated as peer computers G1 to Gn). May be provided.
- the peer computer 210 may uniformly include the schema management server 102, the data operation client 104, the data storage server 106, and the operation request relay server 108.
- FIG. 4 is a functional block diagram showing the configuration of the information system 1 of the present embodiment.
- the information system 1 according to the present embodiment includes a schema management server 102, a preprocessing unit 120, a destination resolution unit 340, an operation request unit 360, a relay unit 380, and a data storage server 106.
- the schema management server 102 and the preprocessing unit 120 are not connected to the network 3, but may be configured to be connected to the network 3.
- the schema management server 102 generates distribution information indicating the data distribution of the data group.
- the data of the data group stored in the plurality of nodes includes a set of data having an attribute value in a predetermined condition range or a set of data having a predetermined similar distribution. Based on this data distribution, the range of the attribute value of the data handled by each data storage server 106 is determined.
- the data operation client 104 in FIG. 1 includes the preprocessing unit 120, the destination resolution unit 340, and the operation request unit 360 in FIG.
- the operation request relay server 108 of FIG. 1 includes a preprocessing unit 120, a destination resolution unit 340, and a relay unit 380.
- FIG. 5 is a functional block diagram showing a main configuration of the information system 1 according to the present embodiment.
- the information system 1 of this embodiment includes a plurality of nodes (data storage server 106) that manage data groups in a distributed manner.
- Each of the plurality of nodes (data storage server 106 (FIG. 1)) has a destination address that can be identified on the network.
- the information system 1 includes an identifier providing unit (ID providing unit 112), a range determining unit 114, and a destination determining unit (destination solving unit 340).
- the ID assigning unit 112 assigns a logical identifier to the plurality of nodes (data storage server 106) on the logical identifier space.
- the range determination unit 114 associates the logical identifier space with the distribution of data in the data group, and determines a range of data values corresponding to the logical identifier of each node (data storage server 106).
- the range determination unit 114 uses the distribution information 116 generated by the schema management server 102. The generation of the distribution information 116 will be described in detail in an embodiment described later.
- the ID assigning unit 112 assigns each node to have a value in a finite ID (Identifier) space as a logical identifier ID (destination, address, or identifier).
- the ID assigning unit 112 determines the range in the ID space of the data handled by the node according to the ID.
- the ID of the node in charge of data can be obtained by using the hash value of the key of the data to be registered or acquired in DHT.
- a hash value of a random identifier or a unique identifier for example, an IP address and a port assigned in advance to the node can be used. As a result, load distribution can be achieved.
- As the ID space there are a ring type method, a HyperCube method, and the like. For example, Chord and Koorde use a ring-type ID space.
- Consistent Hashing an arbitrary natural number is m, and the ID space takes a one-dimensional [0, 2 m ), and each node i takes a value xi in the ID space as an ID.
- i is a natural number up to the number N of nodes, and is identified in the order of xi.
- the symbol “[” or the symbol “]” represents a closed section, and the symbol “(” or the symbol “)” represents an open section.
- the node i manages data included in [xi, x (i + 1)).
- the destination resolution unit 340 searches for a destination of a node (data storage server 106) that stores data of an attribute value or attribute range, a range of data values of each node (data storage server 106) and a logical identifier And a logical identifier corresponding to a data range in which at least a part of the attribute value or attribute range matches, based on the correspondence relationship with the destination address. Then, the destination resolution unit 340 determines the destination address of the node (data storage server 106) corresponding to the obtained logical identifier as the destination.
- the set of logical identifiers (hash values) assigned to each node by the ID assigning unit 112 is associated with the destination address (server IP address) of the destination node, and the destination server information table in FIG. 330 is stored.
- the logical identifier assigned to each node by the ID assigning unit 112 described above is used to determine a data storage destination and a message transfer destination. As described above, it is given to each node probabilistically and uniformly in a finite logical identifier space.
- a plurality of correspondences between the set of logical identifiers and the destination addresses are stored in the destination server information table 330 of FIG.
- the logical identifier is a hash value and an IP address of a destination computer.
- the range determination unit 114 has the attribute value space on the horizontal axis and the logical value on the vertical axis.
- the range of the attribute value space corresponding to the logical identifier assigned to each node can be determined.
- the node having the logical identifier 413 stores data in the range of the attribute values a4 to a5.
- the other end point is the end point (a4) of the adjacent node (the node of the logical identifier 250).
- the correspondence between the ID and the attribute value range is determined and stored in the correspondence storage 118 as shown in FIG.
- the correspondence relationship in FIG. 7B has a data structure of a destination table that is referenced when determining the destinations of a plurality of nodes that manage data groups in a distributed manner. That is, the node IP address can be included as the node destination information.
- This destination table is a correspondence relationship between destinations of a plurality of nodes that manage data groups in a distributed manner, logical identifiers assigned to each node on a logical identifier space, and ranges of data values managed by the respective nodes. including.
- the data value range of each node is associated with the logical identifier space and the distribution of data in the data group, and the data value range corresponding to the logical identifier of each node is assigned to each node.
- the attribute value range is determined according to the logical identifier, and as a result, the attribute value is determined.
- a data group having a distribution based on is assigned to each node in a probabilistic and uniform manner.
- each node has a data amount of 1 / node number, but it is not necessary to guarantee that it has a data amount of 1 / node number strictly.
- the load of each node is allocated uniformly in terms of probability according to the data distribution.
- the management method of the information system 1 of this embodiment is demonstrated below. 8 and 9 are flowcharts showing the operation of the information system 1 of the present embodiment.
- the ID assigning unit 112 assigns a logical identifier to a plurality of nodes in the logical identifier space.
- the range determination unit 114 associates the logical identifier space with the distribution of data in the data group, and the range of the data value corresponding to the logical identifier of each node. (Step S13 in FIG.
- the destination resolution unit 340 (FIG. 5) Based on the correspondence between the range of data values of each node, the logical identifier, and the destination address, a logical identifier corresponding to the range of data that matches at least part of the attribute value or attribute range is obtained. Determining the destination address of the node corresponding to the identifier as a destination (step S23 in FIG. 9).
- the computer program according to the embodiment of the present invention provides a procedure for assigning a logical identifier in a logical identifier space to a plurality of nodes to a computer that implements the data operation client 104 or the operation request relay server 108 of FIG.
- searching for a destination a logical identifier corresponding to a range of data in which at least part of the attribute value or attribute range matches based on the correspondence between the range of the data value of each node, the logical identifier, and the destination address
- a procedure for determining a destination address of a node corresponding to the logical identifier as a destination There.
- the computer program of this embodiment may be recorded on a computer-readable recording medium.
- the recording medium is not particularly limited, and various forms can be considered.
- the program may be loaded from a recording medium into a computer memory, or downloaded to a computer through a network and loaded into the memory.
- the ID assigning unit 112 assigns a logical identifier to the plurality of nodes on the logical identifier space (step S11 in FIG. 8). Then, the range determination unit 114 associates the logical identifier space with the distribution of data in the data group, and determines the range of data values corresponding to the logical identifier of each node (step S13 in FIG. 8).
- the ID assigning unit 112 assigns a logical identifier to the new node on the logical identifier space (step S11 in FIG. 8), and the range determining unit 114 The range of the data value corresponding to the logical identifier of the node is changed between the node added to the node adjacent to the node (not shown).
- the range determination unit 114 performs a range of data values corresponding to the logical identifier of the node between the deleted node and another node adjacent to the deleted node (logical node adjacent to the logical identifier). (Not shown).
- the ID assigning unit 112 assigns to a new node, even if the existing node group is stochastically uniform, there are nodes having a wide logical identifier with adjacent nodes and narrow nodes. .
- a wide node has a lot of data, and a narrow node has a little data.
- the logical identifier given to a newly added node is highly likely to enter a space that is wide with an adjacent node, and is unlikely to enter a narrow space. Therefore, the range determined by the range determination unit 114 from this logical identifier and distribution information is the effect of receiving data from a node having more data than other nodes, that is, it is possible to reduce the load from a high load node and make it uniform Increases nature.
- the information system 1 of the present invention when nodes are added or deleted, it is not necessary to move the data of all the nodes, and only some of the nodes (nodes adjacent to the target node) move the data. In addition, the stochastic uniformity can be maintained. When one physical node has a plurality of logical identifiers, it is necessary to perform data movement with other nodes corresponding to the number of logical identifiers.
- the destination resolution unit 340 A logical identifier corresponding to a data range in which at least a part of the attribute value or attribute range matches is obtained based on the correspondence between the data value range of each node, the logical identifier, and the destination address.
- the destination address of the corresponding node is determined as the destination (step S23 in FIG. 9).
- scalable data storage destination management can be performed while keeping the load between nodes uniform according to the data distribution of the data group.
- the reason is that the range of data values managed by each node is not determined so that the number of records is uniform, but is determined according to the data distribution using a logical identifier obtained from a random or node identifier hash value. It is because it decides. For example, even if a node is added or deleted, it is not necessary to change the range of data handled by all the nodes. If the range of data values managed between adjacent nodes of the added or deleted node is changed, It will be good.
- the information system 1 of this embodiment differs from the above embodiment in that the same applies to multidimensional attribute data by performing space filling curve change processing on multidimensional attribute data to obtain data distribution information based on attribute values. Is different in that the destination can be determined.
- the preprocessing unit 120 FIGS. 4 and 5
- the preprocessing unit 320 FIG. 4 and 5
- FIG. 10 is a functional block diagram showing the configuration of the schema management server 102 of the information system 1 according to this embodiment.
- the data group can include data having multidimensional attributes.
- the information system 1 includes a space-filling curve one-dimensionalization unit 304 that performs a space-filling curve conversion process to convert a multidimensional attribute value included in data based on a predetermined attribute value from the data group into one-dimensional data, and a space A distribution calculation unit 308 that calculates a cumulative distribution of values one-dimensionalized by the filling curve one-dimensionalization unit 304. Then, the preprocessing unit 320 described later performs processing using the cumulative distribution calculated by the distribution calculation unit 308 as distribution information.
- FIG. 12 is a functional block diagram illustrating a configuration of the preprocessing unit 320 of the information system 1 according to the present embodiment.
- the information system 1 obtains a distribution function that represents the distribution of data in a data group, receives the logical identifier of each node, performs an inverse function of the distribution function, and outputs a one-dimensional value.
- a space-filling curve multidimensionalization unit space-filling curve server conversion unit 326) that converts a one-dimensional value into a multidimensional value by a space-filling curve conversion process.
- a set of one-dimensional values generated by applying an inverse function to the set of logical identifiers of the nodes by the inverse function unit 324 is converted into multidimensional values by the space filling curve server conversion unit 326, and the obtained multiple The dimension value, the logical identifier, and the destination address are associated with each other and held as a correspondence relationship.
- the schema management server 102 includes a sample data storage unit 302, a space filling curve one-dimensionalization unit 304, a sample data one-dimensional value storage unit 306, and a distribution calculation unit 308.
- a distribution storage unit 310 and generates distribution information obtained by making multi-dimensional attribute data one-dimensional.
- sample data storage unit 302 a part of multi-dimensional attribute data stored in the distributed system or a set of data having similar distribution information is given and stored in advance.
- the sample data one-dimensional value storage unit 306 stores a value obtained by converting sample multidimensional attribute data into a one-dimensional value.
- the distribution storage unit 310 stores one-dimensional cumulative distribution information having the same distribution information as a part of multidimensional attribute data stored in the distributed system or a set of data whose distribution information is similar to each other. Is done.
- the space filling curve one-dimensionalization unit 304 converts the value of the multi-dimensional attribute into a one-dimensional value according to a predetermined type of the space filling curve.
- the types of space filling curves include Hilbert space filling curves and Z curve space filling curves.
- the conversion includes a method using a conversion rule table.
- FIG. 11 shows a block diagram and a state transition diagram of a space filling curve conversion rule in the information system 1 of the present embodiment.
- the conversion rule of the Hilbert space filling curve is shown as a type of the space filling curve, another Z curve space filling curve or the like may be used. In this case, the conversion rule is different from that in FIG.
- the conversion rule in FIG. 11 shows a rule in the case of two dimensions. The upper part of the conversion rule indicates a multi-dimensional value of a specific bit, and the lower part indicates a corresponding one-dimensional value.
- the four conversion rules are referred to as a conversion rule table, and the conversion rule table is a conversion rule table state ( 0, 1, 2, 3).
- the conversion rule table is a conversion rule table state ( 0, 1, 2, 3).
- the multi-dimensional value of the next bit is given as an input, and the corresponding one-dimensional value is obtained.
- a value obtained by connecting the bits of the one-dimensional value obtained by repeating the state transition in order from the first bit is output from the space filling curve one-dimensionalization unit 304.
- the one-dimensional value output from the space filling curve one-dimensionalization unit 304 (FIG. 10) is stored in the sample data one-dimensional value storage unit 306 (FIG. 10).
- the distribution calculation unit 308 receives a set of one-dimensional values as input, and calculates density distribution information and cumulative distribution information of the data in a format such as a histogram or a cumulative histogram.
- a histogram representing density distribution information
- a one-dimensional value may be divided into a certain width, the data existing in that width may be counted, and the amount thereof may be used as a distribution amount.
- the width is not constant and differs for each partition, and the histogram may be expressed as a set of pairs of distribution width and distribution amount.
- the cumulative histogram is obtained by converting it into a cumulative histogram that takes a cumulative value in a direction in which the one-dimensional value monotonously increases.
- the one-dimensional cumulative distribution information calculated by the distribution calculation unit 308 is stored in the distribution storage unit 310.
- FIG. 12 is a functional block diagram illustrating a configuration of the preprocessing unit 320 of the information system 1 according to the present embodiment.
- a destination server storage unit destination server information storage unit 322 that stores a destination server table in which a set (range) of logical identifiers and corresponding destination addresses are associated with each other, and distribution information
- An inverse function unit 324 that performs an inverse function of a distribution function using, a space-filling curve multidimensionalization unit (space-filling curve server conversion unit 326) that converts a one-dimensional value into a multidimensional value by a space-filling curve conversion process,
- the inverse function unit 324 applies an inverse function to the set of logical identifiers (hash values) assigned to each computer (so that the distribution is statistically uniform) by referring to the destination server table.
- the set of one-dimensional values generated in this way is converted into multi-dimensional values by the space filling curve multi-dimensionalization unit (space filling curve server conversion unit 326) and associated with the destination address in advance.
- Information stored in the table space-filling curve server information table 332 of the space-filling curve server information storage unit 328 (FIG. 13)).
- the preprocessing unit 320 includes a destination server information storage unit 322, an inverse function unit 324, a space filling curve server conversion unit 326, and a space filling curve server information storage unit 328. And has a function of creating space filling curve server information.
- the destination server information storage unit 322 stores a plurality of correspondences between the set of logical identifiers for determining the data storage destination and the message transfer destination described above and the destination address of the node. For example, in the case of Consistent Hashing or a distributed hash table, the hash value and the IP address of the destination node are stored in the destination server information storage unit 322. The destination server information storage unit 322 can be provided for each node.
- the set of node logical identifiers is changed, and the corresponding relationship (destination server information table 330 in FIG. 6) is changed accordingly.
- an update unit (not shown) for updating the space filling curve server information table 332 in FIG. 13 to be described later.
- the space filling curve server information storage unit 328 stores a plurality of destination addresses of other computers for the partial space of the multidimensional attribute space.
- the subspace of the multidimensional attribute space may be expressed by enumerating one-dimensional values of the origin of the multidimensional attribute space and enumerating and expressing the union of attribute ranges for the number of dimensions.
- a union of conditions such as the value of which bit in which dimension may be enumerated and expressed.
- the space filling curve server information storage unit 328 is a value in which the origin of the range (attribute space) of the logical identifier (ID) corresponding to the destination address (IP) is expressed in one dimension. Is associated with the destination address and stored as a space filling curve server information table 332.
- the space-filling curve server information table 332 includes both the logical identifier (ID) and the destination address (IP). However, for example, the logical identifier (ID) may not be included.
- the space filling curve server information table 332 displays either the logical identifier (ID) or the destination address (IP). Include it.
- the space-filling curve server conversion unit 326 (FIG. 12) converts the one-dimensional value into a multidimensional value by the space-filling curve conversion processing, and the space-filling curve server information table as a multidimensional value instead of the one-dimensional value.
- the space-filling curve server information table 332 when stored as a one-dimensional value in the space-filling curve server information table 332, when referring to this, it is necessary to refer to the given multi-dimensional attribute value or multi-dimensional attribute range while performing processing using the space-filling curve. There is.
- the space filling curve server information table 332 when it is stored as a multidimensional value in the space filling curve server information table 332, when referring to this, processing by the space filling curve is not necessary.
- the multidimensional attribute range of each node as shown in the multidimensional attribute destination table 333 of FIG. 24 is converted into a table format and stored in the space filling curve server information storage unit 328 as the space filling curve server information table 332.
- a one-dimensional value is output with respect to the input value so as to correspond to the value obtained by applying ().
- the cumulative distribution ratio of this section i is r [i]
- the one-dimensional value is v [i].
- the space-filling curve server conversion unit 326 receives the one-dimensional value for each destination server calculated by the inverse function unit 324 as an input, and converts it into a multidimensional value by space-filling curve conversion processing. Further, the space filling curve server conversion unit 326 has a space in which a one-dimensional value for each server is determined in advance according to the above-described format of the space filling curve server information table 332 stored in the space filling curve server information storage unit 328. It converts into the format of the filling curve server information, creates the space filling curve server information table 332, and stores it in the space filling curve server information storage unit 328. The format conversion is not performed, and information including a pair of the address of each server and the one-dimensional value obtained by the inverse function unit 324 may be used.
- FIG. 14 is a functional block diagram showing a main configuration of the information system 1 according to the present embodiment.
- the information system 1 of the present embodiment further includes an operation request unit that receives an attribute value corresponding to data for which an operation request has been received, together with an operation request for data processing, for a data group that is distributed and stored in a plurality of computers 360 and a transfer unit (relay unit 380 or operation request unit 360) that transfers the received operation request to the destination address determined by the determination unit (space filling curve server determination unit 346).
- the curve server determination unit 346) determines a destination address based on the attribute value received by the operation request unit 360, and transfers it to the relay unit 380 (or the operation request unit 360).
- the destination resolution unit 340 includes a single destination resolution unit 342, a range destination resolution unit 344, and a space filling curve server determination unit 346.
- the destination resolution unit 340 is configured to include both the single destination resolution unit 342 and the range destination resolution unit 344, but is not particularly limited, and may be either one.
- the operation request unit 360 includes a data addition / deletion unit 362 and a data search unit 364.
- the data storage server 106 includes a data storage unit 390.
- the single destination resolution unit 342 receives the value of the multidimensional attribute of the given data as an input, and acquires the destination address of the destination computer to which an operation request regarding the data is to be transmitted.
- the range destination resolution unit 344 receives a given multi-dimensional attribute range as input, and acquires a plurality of destination addresses of destination computers to which operation requests relating to the data are to be transmitted.
- the space filling curve server determination unit 346 acquires the space filling curve server information stored in the space filling curve server information storage unit 328. Then, the space filling curve server determination unit 346 refers to the value of the multidimensional attribute value or the range of the multidimensional attribute notified from the single destination resolution unit 342 or the range destination resolution unit 344 while referring to the space filling curve server information. The corresponding one or more computer destinations are returned to the single destination resolution unit 342 or the range destination resolution unit 344, respectively.
- the data addition / deletion unit 362 (operation request unit 360 of the data operation client 104 in FIG. 1) provides the user with an additional data deletion / deletion operation service to an external application program or the like. Furthermore, when the application program is executed by the user and an operation for adding or deleting data is requested, the data addition / deletion unit 362 operates on a plurality of attributes determined to be indexed in advance with respect to data to be subjected to the operation request. Get the value specified in the request. Then, the data addition / deletion unit 362 acquires from the destination resolution unit 340 the address of the destination computer to which the operation request regarding the multidimensional attribute value is to be transmitted. Further, the data addition / deletion unit 362 transfers the operation to the acquired computer having the destination address.
- the application program is, for example, a web application and an application program for various shopping sites.
- the data search unit 364 (the operation request unit 360 of the data operation client 104 in FIG. 1) provides a data search service to an external application program or the like. When this data search process is executed, the data search unit 364 acquires a range of a plurality of attributes determined to be indexed in advance from the search expression specified in the search request. Then, the data search unit 364 acquires a plurality of addresses of destination computers to which an operation request relating to the multidimensional attribute range should be transmitted. Then, the data search unit 364 transfers the operation to the respective computers. When the data addition / deletion unit 362 of the computer (data storage server 106) to execute the operation receives the operation, the data search processing is performed on the corresponding data storage unit 390, and the data search result obtained as a result is displayed. Return it to the program that called the service.
- the operation request unit 360 includes both the data addition / deletion unit 362 and the data search unit 364, but is not particularly limited, and may be either one. Further, a data processing unit other than the data addition / deletion unit 362 or the data search unit 364 may be provided.
- the data processing unit may specify a condition and accept a request such as a search for a plurality of data sets or an update process for specifying a condition, and perform processing.
- the information system 1 of the present invention includes a space filling curve server information storage unit 328 that stores at least a space filling curve server information table 332, a space filling curve server determination unit 346, and data to be processed by a user.
- An operation request receiving unit (not shown) that receives an operation request including an attribute value (including an attribute space) may be provided.
- the relay unit 380 has a function of accepting an operation request transferred from the operation request unit 360 or the relay unit 380 of another computer and transferring the operation request to another computer.
- the transfer destination is determined by making an inquiry to the destination resolution unit 340 existing in the same computer as the relay unit 380 based on the attribute value included in the received operation request and the search condition for the attribute. .
- Data stored in the distributed system is stored in the data storage unit 390, and data is read and written according to external data write and read requests.
- the information system management method of the present embodiment is further configured so that, in the schema management server 102 (FIG. 10), the space filling curve one-dimensionalization unit 304 (FIG. 10) The multidimensional attribute value included in the data based on the determined attribute value is subjected to space filling curve conversion processing to be one-dimensional, and the distribution calculation unit 308 (FIG. 10) calculates the cumulative distribution of the one-dimensional value.
- the preprocessing unit 320 (FIG. 12) associates the cumulative distribution calculated by the distribution calculation unit 308 (FIG. 10) with the logical identifier space as the data distribution.
- the inverse function unit 324 obtains a distribution function representing the distribution information, and the logical identifier of each node Is input, an inverse function of the distribution function is performed, and a one-dimensional value is output, and the space-filling curve server conversion unit 326 (FIG. 12) converts the one-dimensional value into a multidimensional value by space-filling curve conversion processing.
- the multidimensional value, the logical identifier, and the destination address are associated with each other and stored as a correspondence (the space filling curve server information table 332 in FIG. 13).
- the result output by the inverse function unit 324 holds the logical identifier and the destination address in association with each other as the correspondence (the space filling curve server information table 332 in FIG. 13).
- the space-filling curve server conversion unit 326 (FIG. 12) converts the one-dimensional value into a multidimensional value by the space-filling curve conversion process, and the correspondence relationship (the space in FIG. You may store in the filling curve server information table 332).
- FIG. 15 is a flowchart illustrating an example of processing (step S101) for generating a one-dimensional multidimensional distribution in the schema management server 102 of the information system 1 according to the present embodiment.
- step S101 processing for generating a one-dimensional multidimensional distribution in the schema management server 102 of the information system 1 according to the present embodiment.
- the schema management server 102 repeatedly executes the following steps S103 to S107 for each of the multidimensional data stored in the sample data storage unit 302 (step S103). Then, the space filling curve one-dimensionalization unit 304 refers to the sample data storage unit 302 and performs one-dimensionalization of multidimensional data (step S105). The one-dimensional value obtained in step S105 is stored in the sample data one-dimensional value storage unit 306 (step S107).
- the distribution calculation unit 308 then derives cumulative distribution information from the data stored in the sample data one-dimensional value storage unit 306. And stored in the distribution storage unit 310 (step S109).
- FIG. 16 is a flowchart illustrating an example of a process (step S201) of generating space filling curve server information in the preprocessing unit 320 of the information system 1 of the present embodiment.
- step S201 a process of generating space filling curve server information in the preprocessing unit 320 of the information system 1 of the present embodiment.
- the pre-processing unit 320 (FIG. 12) repeatedly executes the following step S205 and step S207 for each destination server information stored in the destination server information storage unit 322 (FIG. 12) (step S203).
- the inverse function unit 324 (FIG. 12) normalizes the destination logical identifier and applies an inverse function thereto to obtain a one-dimensional value (step S205).
- the inverse function unit 324 stores this in the space filling curve server information storage unit 328 (FIG. 12) as the space filling curve server information table 332 of FIG. 13 (step S207).
- the space-filling curve server information obtained by the space-filling curve server conversion unit 326 (FIG. 12) using the one-dimensional value obtained in step S205 as a multi-dimensional attribute value and processing this for all server information. Is stored in the space filling curve server information storage unit 328 (FIG. 12) (step S207).
- FIGS. 17 and 18 are flowcharts illustrating examples of operations of destination determination processing (step S301) and a plurality of destination determination processing (step S401) of the destination resolution unit 340 in response to an operation request in the information system 1 of the present embodiment. is there.
- a data processing method is a data processing method for a client terminal (a terminal (not shown) receiving a service provided by an external application program) connected to a server that manages a plurality of nodes that distribute and manage data groups.
- the client terminal notifies the management device (the data operation client 104 or the operation request relay server 108 in FIG. 4) of an access request to the data having the attribute value or the attribute range, and a plurality of items are transmitted via the management device.
- the destination address of the node data storage server 106 in FIG.
- the single destination resolving unit 342 (FIG. 14) inputs a multi-dimensional attribute value from the data addition / deletion unit 362 (FIG. 14), and transfers it to the space filling curve server determination unit 346 (FIG. 14) (step S303). ).
- the space filling curve server determination unit 346 (FIG. 14) acquires the space filling curve server information table 332 (FIG. 13) stored in the space filling curve server information storage unit 328 (FIG. 14). Then, the space filling curve server determination unit 346 acquires the destination (IP address) of one computer (server) corresponding to the value of the multidimensional attribute value while referring to the space filling curve server information table 332, It returns to the destination resolution unit 342 (FIG. 14) (step S305).
- the single destination resolution unit 342 acquires the destination determined by the space filling curve server determination unit 346 (FIG. 14), and the relay unit 380 sends another destination address to another computer.
- the operation request is transferred via the network 3 (FIG. 14) (step S307).
- the data addition / deletion unit 362 performs data addition / deletion operations on the data storage unit 390 (FIG. 14) of the data storage server 106 (FIG. 14) according to the operation request (FIG. 14). Step S309).
- the data addition / deletion unit 362 sends the operation result to the program that called the service (for example, the data operation client 104 of FIG.
- the single destination resolution unit 342 (FIG. 14) of the destination resolution unit 340 (FIG. 14) determines the value of the multidimensional attribute included in the operation request. The destination is determined based on the above.
- the data search unit 364 (FIG. 14) indexes in advance from the search formula designated in the search request.
- the plurality of attribute ranges are acquired via the network 3 and notified to the range destination resolution unit 344 (FIG. 14).
- the range destination resolving unit 344 inputs a multidimensional attribute range from the data search unit 364 (FIG. 14), and transfers it to the space filling curve server determination unit 346 (FIG. 14) (step S403).
- the space filling curve server determination unit 346 acquires the space filling curve server information table 332 (FIG. 13) stored in the space filling curve server information storage unit 328 (FIG. 14).
- the space filling curve server determination unit 346 acquires destinations (IP addresses) of a plurality of computers (servers) corresponding to the range of the multidimensional attribute value while referring to the space filling curve server information table 332, and the range destination It returns to the solution part 344 (FIG. 14) (step S405).
- the range destination resolution unit 344 acquires a plurality of destinations determined by the space filling curve server determination unit 346 (FIG. 14), and relays them to other computers of the plurality of destination addresses.
- the unit 380 transfers the operation request via the network 3 (FIG. 14) (step S407).
- the data search unit 364 searches the data storage unit 390 (FIG. 14) of the data storage server 106 (FIG. 14) according to the operation request (step S409). Then, the data search unit 364 (FIG. 14) returns the search result to the program that called the service (for example, the data operation client 104 executing the program) via the network 3 (FIG. 14) ( Step S411).
- the range destination resolution unit 344 (FIG. 14) of the destination resolution unit 340 (FIG. 14) determines the range of the multidimensional attribute included in the operation request. First, the destination (IP address) of the transfer destination is determined.
- CREATE INDEX geo_idx ON user longitude, latitude
- CREATE ⁇ ⁇ ⁇ TABLE user char name, number age, number longitude, 10.1
- the command will index the two-dimensional attributes longitude and latitude, and there will be a registration request INSERT INTO user (name, age, longitude, ...) VALUES (hoge, 20,35.3 ..., 7)
- values related to user.name can be obtained from the range of latitude and longitude during SELECT name FROM user WHERE user.age> 20 and user.longitude ....
- the data search unit 364 (FIG. 14) issues a registration request of INSERT INTO user (name, age, longitude, ...) VALUES (hoge, 20,35.3 ..., ).
- the range destination resolution unit 344 receives the value related to user.name from the range of SELECT name FROM user WHERE user.age> 20 and user.longitude ... latitude and longitude.
- the distribution information is generated for the data of the multidimensional attribute value, and the data of the multidimensional attribute value is statistically and uniformly based on the distribution information.
- the destination information of the computer in charge of the data for the attribute value or the attribute subspace is executed in the following procedure before the operations such as data registration, deletion, and search are executed.
- a dimension value is calculated, and the given one-dimensional value is input, and a multi-dimensional value is output by the space-filling curve server conversion unit 326 (FIG. 12), and a space-filling curve is obtained from the pair of the multi-dimensional value and the destination server.
- the server information storage unit 328 (FIG. 12) can store the attribute information or the destination information for the attribute subspace.
- the attribute information or the destination information for the attribute subspace is acquired from the space filling curve server information storage unit 328 (FIG. 12), and the given attribute value or attribute is obtained.
- Corresponding destination information can be acquired from the conditions.
- the information system 1 of the present embodiment even when operations such as registration, deletion, and search are performed on data, even if the number of attributes (dimensions) that have been composite indexed is large. Thus, it is possible to speed up the process of determining the destination to which the operation request information is transferred from the attribute value of the data or the condition for the attribute value. The reason is that when registering, deleting, or retrieving data, it is not necessary to perform processing for converting multi-dimensional attribute values or attribute conditions into one-dimensional values or ranges.
- the data indexed in the composite index is used when determining the destination to which the operation request information is transferred from the data attribute value or attribute condition.
- the bit length of the data becomes longer, the calculation time required for the determination increases, and the performance such as the response time of the operation deteriorates.
- the reason is that, in the process of converting the composite indexed attribute value into a one-dimensional value by the space filling curve processing means, the longer the bit length, the longer the time required for conversion.
- the time required for conversion increases.
- a destination to which request information of the operation is transferred is determined from the attribute value of the data or the condition for the attribute value.
- the number of attributes (dimensions) indexed in a compound index increases, there is a problem that the calculation time required for the determination increases and the performance such as the response time of the operation decreases.
- the data attributes there is an effect that it is possible to speed up the process of determining the destination to which the operation request information is transferred from the condition for the value or the attribute value.
- the reason is that when registering, deleting, or retrieving data, it is not necessary to perform processing for converting multi-dimensional attribute values or attribute conditions into one-dimensional values or ranges.
- FIG. 2 an example in which data stored in a plurality of data computers 208 is operated from the access computer 202 is shown.
- 2 is the data operation client 104 of FIG. 1
- the metadata computer 204 of FIG. 2 is the schema management server 102 of FIG. 1
- the data computer 208 of FIG. Data storage server 106 exists.
- the data distribution 1001 of FIG. 19 is stored in the sample data storage unit 302 of the schema management server 102 of FIG. 10 in the metadata computer 204 of FIG.
- the space filling curve one-dimensionalization unit 304 in FIG. 10 performs each process shown in the data distribution 1001 in FIG. One-dimensionalization is performed from the multidimensional attribute value of the data, and each is stored in the sample data one-dimensional value storage unit 306 in FIG.
- the distribution calculation unit 308 in FIG. 10 calculates the cumulative distribution information from the stored one-dimensional value in a format such as a cumulative histogram, and stores it in the distribution storage unit 310 in FIG.
- a histogram is obtained as the density distribution information 1003 shown in FIG.
- a table 1005 having the distribution width and the distribution amount shown in FIG.
- a cumulative distribution ratio obtained by converting the density distribution into a cumulative distribution and dividing the distribution amount of each section by the sum of the distribution amounts is shown in a table 1015 of FIG. 21B, which is the cumulative distribution of FIG. Corresponds to information (cumulative histogram) 1013.
- the distribution amount slope shown as “section slope” in the figure
- the distribution amount slope shown as “section slope” in the figure
- the slope of the distribution amount in Table 1025 (v [i]-v [i-1]) / (r [i]-r [i-1] in (Equation 1) described in the above embodiment. ) Need not be calculated every time.
- a value obtained by inputting this server IP address into a hash function such as SHA (Secure Hash Algorithm) 1 or MD5 (Message Digest Algorithm 5) is calculated as a logical identifier of the server by the ID assigning unit 112, and is shown in FIG. It is stored in the same destination server information storage unit 322.
- the logical identifiers are distributed in a range of [0, 2 b ), where the logical identifier space size determined by the hash function is 2 b .
- symbol [” or “symbol” ” represents a closed section
- “ symbol (”or“ symbol ”) represents an open section
- the logical identifier space 1100 is shown in a ring shape, and the logical identifier 1102 arranged on this circle represents a computer.
- a value obtained by dividing the logical identifier by the logical identifier space size is referred to as a normalized logical identifier. This is distributed in the range [0, 1).
- Each computer is assigned to the logical identifier space 1100 in a stochastic and uniform manner independent of the distribution of attribute values.
- the normalized logical identifier of each server stored in the destination server information table 330 of FIG. Is converted into a one-dimensional value by the inverse function unit 324 (FIG. 12).
- the inverse function unit 324 refers to the cumulative distribution information of the distribution storage unit 310 (FIG. 10) in the schema management server 102 (FIG. 10).
- the cumulative histogram table 1015 FIG. 21B as a procedure for calculating the inverse function shown here, when 0.35 is given as an input normalization logical identifier, 0.13 is returned. It is.
- the space filling curve server conversion unit 326 uses the binary representation of the one-dimensional value and the IP address information of each server as a space filling curve server information table 332 as shown in FIG. It is stored in the filling curve server information storage unit 328 (FIG. 12).
- the space filling curve server conversion unit 326 performs only formal conversion. In the example of FIG. 25, the one-dimensional value is held not at the starting point of the range but at the end of the range.
- the data addition / deletion unit 362 receives the data registration request, and the single destination resolution unit 342 (FIG. 14) corresponds to the multidimensional attribute value indexed from this data. Determine the destination.
- a two-dimensional attribute value is taken as an example, and this value is (3, 4), that is, (011, 100) in binary notation.
- the space filling curve server determination unit 346 (FIG. 14) extracts the first bit of each dimension and obtains the first multidimensional bit (01). Assume that the initial conversion rule table state is 0. From the state 0 conversion rule, the first one-dimensional bit (01) is output as an output.
- the space filling curve server information is referred to, and the pointer is moved to the value range end point 010111 (27) in which the bit pattern of the value range end point starts from the one-dimensional bit 01.
- the conversion rule table state is 0, so the same table is used without making a transition to another table.
- the second multidimensional bit (10) is obtained as the next bit.
- a second one-dimensional bit (11) is output as an output from the conversion rule, and this is added to the previous bit string to obtain a one-dimensional bit (0111).
- the pointer is moved to the obtained range end point 011101 (29) starting from 0111. Since the conversion rule state of the transition destination corresponding to the second multidimensional bit (10) is 2, this conversion rule table is acquired.
- the third multidimensional bit (11) is taken out as the next bit, and the third one-dimensional (00) is output in the conversion rule table in the state 2, and this is also added to the previous bit string and the one-dimensional bit (011100). 28 is obtained as a decimal number.
- a node that manages this as a value range has a logical identifier 551, and a node having an IP of 10.1.1.5 is selected from the space filling curve server information table 332 shown in FIG. In this way, the destination can be determined.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
メッセ―ジ転送処理手段は、このようにして得られた複数の論理識別子範囲を宛先として、検索要求を送信し、その宛先と対応する複数のピアに格納されたデータを取得する。
データ群を分散して管理する複数のノードを備え、
複数の前記ノードは、それぞれネットワーク上で識別可能な宛先アドレスを有し、
複数の前記ノードに対し、論理識別子空間上で論理識別子を付与する識別子付与手段と、
前記論理識別子空間と、前記データ群におけるデータの分布と、を対応付け、各前記ノードの前記論理識別子に対応する前記データの値の範囲を決定する範囲決定手段と、
ある属性値または属性範囲のデータの格納先の前記ノードの宛先を探索するとき、各前記ノードの前記データの値の前記範囲と、前記論理識別子と、前記宛先アドレスとの対応関係に基づき、前記属性値または前記属性範囲の少なくとも一部が一致する前記データの範囲に対応する前記論理識別子を求め、当該論理識別子に対応する前記ノードの宛先アドレスを前記宛先として決定する宛先決定手段と、を備える。
データ群を分散して管理する複数のノードを管理する情報システムの管理方法であって、
複数の前記ノードは、それぞれネットワーク上で識別可能な宛先アドレスを有し、
前記情報システムは、管理装置と、記憶装置と、を有し、
前記管理装置が、
複数の前記ノードに対し、論理識別子空間上で論理識別子を付与し、
前記論理識別子空間と、前記データ群におけるデータの分布と、を対応付け、各前記ノードの前記論理識別子に対応する前記データの値の範囲を決定し、
ある属性値または属性範囲のデータの格納先の前記ノードの宛先を探索するとき、各前記ノードの前記データの値の前記範囲と、前記論理識別子と、前記宛先アドレスとの対応関係に基づき、前記属性値または前記属性範囲の少なくとも一部が一致する前記データの範囲に対応する前記論理識別子を求め、当該論理識別子に対応する前記ノードの宛先アドレスを前記宛先として決定する。
データ群を分散して管理する複数のノードを管理する管理装置を実現するコンピュータのプログラムであって、
複数の前記ノードは、それぞれネットワーク上で識別可能な宛先アドレスを有し、
前記管理装置は、記憶装置を有し、
前記管理装置を実現するコンピュータに、
複数の前記ノードに対し、論理識別子空間上で論理識別子を付与する手順、
前記論理識別子空間と、前記データ群におけるデータの分布と、を対応付け、各前記ノードの前記論理識別子に対応する前記データの値の範囲を決定する手順、
ある属性値または属性範囲のデータの格納先の前記ノードの宛先を探索するとき、各前記ノードの前記データの値の範囲と、前記論理識別子と、前記宛先アドレスとの対応関係に基づき、前記属性値または前記属性範囲の少なくとも一部が一致する前記データの範囲に対応する前記論理識別子を求め、当該論理識別子に対応する前記ノードの宛先アドレスを前記宛先として決定する手順を実行させるためのものである。
上記情報システムの管理方法の管理装置に接続され、前記管理装置を介して前記データにアクセスする端末装置のデータ処理方法であって、
前記端末装置が、
属性値または属性範囲を有するデータへのアクセス要求を前記管理装置に通知し、
前記管理装置を介して、複数の前記ノードの宛先アドレスと、各ノードに割り当てられた論理識別子と、各ノードが管理している前記データの値の範囲との対応関係に基づいて、前記アクセス要求された前記属性値または前記属性範囲の少なくとも一部が一致する範囲の前記データを管理する前記ノードの宛先にアクセスして前記データを操作する。
データ群を分散して管理する複数のノードを管理するサーバに接続されたクライアント端末を実現するコンピュータのプログラムであって、
複数の前記ノードは、それぞれネットワーク上で識別可能な宛先アドレスを有し、
前記クライアント端末を実現するコンピュータに、
属性値または属性範囲を有するデータへのアクセス要求を受け付ける手順、
受け付けた前記アクセス要求を前記サーバに通知する手順、
複数の前記ノードの宛先アドレスと、各ノードに割り当てられた論理識別子と、各ノードが管理している前記データの値の範囲との対応関係に基づいて、前記アクセス要求された前記属性値または前記属性範囲の少なくとも一部が一致する前記データの範囲に対応する前記論理識別子を求め、前記宛先として決定された前記論理識別子に対応する前記ノードの宛先アドレスを前記サーバから受信する手順、
前記サーバから受信した前記宛先アドレスの前記ノードにアクセスし、前記属性値または前記属性範囲の前記データを操作する手順を実行させるためのものである。
データ群を分散して管理する複数のノードの宛先を決定する際に参照する宛先テーブルのデータ構造であって、
複数の前記ノードは、それぞれネットワーク上で識別可能な宛先アドレスを有し、
前記宛先テーブルは、前記データ群を分散して管理する複数のノードの宛先アドレスと、各ノードに論理識別子空間上で付与された論理識別子と、各前記ノードが管理するデータの値の範囲との対応関係を含み、
各前記ノードのデータの値の範囲は、前記論理識別子空間と、前記データ群におけるデータの分布と、を対応付け、各前記ノードの前記論理識別子に対応する前記データの値の範囲が各ノードに割り振られる。
以下に、発明を実施するための最良の形態について図面を参照して詳細に説明する。
図1は、本発明の実施の形態に係る情報システム1の構成を示す機能ブロック図である。
本発明の実施の形態の情報システム1は、互いにネットワーク3を介して接続される複数のコンピュータ、たとえば、複数のスキーマ管理サーバ102(図1では、スキーマ管理サーバA1~Anと示す。以下、nは自然数であり、それぞれ異なる値をとってもよい。)と、複数のデータ操作クライアント104(図1では、データ操作クライアントB1~Bnと示す。)と、複数のデータ格納サーバ106(図1では、データ格納サーバC1~Cnと示す。)と、複数の操作要求中継サーバ108(図1では、操作要求中継サーバD1~Dnと示す。)と、を備える。
また、分散したコンピュータに送信されたメッセージやイベントに対して、多次元属性の範囲に関する条件を指定することで、データの発生の検知や通知を設定するPublish/Subscribeといったメッセージ送受信形態の用途にも適用可能である。
この構成において、アクセスコンピュータ202は、図1のデータ操作クライアント104を備え、データコンピュータ208は、図1のデータ格納サーバ106を備える。
図4に示すように、本実施形態の情報システム1は、スキーマ管理サーバ102と、事前処理部120と、宛先解決部340と、操作要求部360と、中継部380と、データ格納サーバ106と、を備える。なお、図4では、スキーマ管理サーバ102および事前処理部120は、ネットワーク3に接続されていないが、ネットワーク3に接続された構成としてもよい。
複数のノード(データ格納サーバ106)に格納されるデータ群のデータは、予め定められた条件範囲の属性値を有するデータの集合、または予め定められた類似の分布を有するデータの集合を含む。このデータの分布に基づいて、各データ格納サーバ106が担当するデータの属性値の範囲を決めることになる。
本実施形態の情報システム1は、データ群を分散して管理する複数のノード(データ格納サーバ106)を備える。
複数のノード(データ格納サーバ106(図1))は、それぞれネットワーク上で識別可能な宛先アドレスを有する。
ID付与部112は、複数のノード(データ格納サーバ106)に対し、論理識別子空間上で論理識別子を付与する。
範囲決定部114は、論理識別子空間と、データ群におけるデータの分布と、を対応付け、各ノード(データ格納サーバ106)の論理識別子に対応するデータの値の範囲を決定する。なお、範囲決定部114は、スキーマ管理サーバ102が生成した分布情報116を使用する。分布情報116の生成については、後述する実施形態で詳細に説明する。
たとえば、Consistent Hashingや分散ハッシュテーブルの場合は、論理識別子は、ハッシュ値と宛先コンピュータのIPアドレスなどである。
本実施形態において、データ群におけるある属性値に基づく分布情報116が図7(a)に示すような累積分布で示される場合、範囲決定部114は、横軸に属性値空間、縦軸に論理識別子(ID)空間を対応させることで、各ノードにそれぞれ付与された論理識別子に対応する属性値空間の範囲を決定することができる。たとえば、論理識別子413のノードは、属性値a4~a5の範囲のデータを格納することとなる。あるいは、属性値の一方の端点(a5)だけを管理してもよい。この場合、他方の端点は隣接ノード(論理識別子250のノード)の端点(a4)とする。このようにしてIDと属性値の範囲の対応関係が決定され、図7(b)に示すように、対応関係記憶部118に記憶される。
図8および図9は、本実施形態の情報システム1の動作を示すフローチャートである。
以下、図5、図8、および図9を用いて説明する。
本発明の実施の形態に係る情報システム1の管理方法は、事前処理部120(図5)において、ID付与部112(図5)が、複数のノードに対し、論理識別子空間上で論理識別子を付与し(図8のステップS11)、範囲決定部114(図5)が、論理識別子空間と、データ群におけるデータの分布と、を対応付け、各ノードの論理識別子に対応するデータの値の範囲を決定し(図8のステップS13)、ある属性値または属性範囲のデータの格納先のノードの宛先を探索するとき(図9のステップS21のYES)、宛先解決部340(図5)が、各ノードのデータの値の範囲と、論理識別子と、宛先アドレスとの対応関係に基づき、属性値または属性範囲の少なくとも一部が一致するデータの範囲に対応する論理識別子を求め、当該論理識別子に対応するノードの宛先アドレスを宛先として決定する(図9のステップS23)。
事前処理部120において、ID付与部112が、複数のノードに対し、論理識別子空間上で論理識別子を付与する(図8のステップS11)。そして、範囲決定部114が、論理識別子空間と、データ群におけるデータの分布と、を対応付け、各ノードの論理識別子に対応するデータの値の範囲を決定する(図8のステップS13)。
本実施形態の情報システム1は、上記実施形態とは、多次元属性データに対し、空間充填曲線変化処理を施して属性値に基づくデータの分布情報を得ることで、多次元属性データについても同様に宛先を決定できる点で相違する。本実施形態において、上記実施形態で説明した情報システム1の事前処理部120(図4、図5)が事前処理部320に変更になる。
以下、本実施形態の情報システム1について、説明する。
本実施形態の情報システム1において、データ群は、多次元の属性を有するデータを含むことができる。さらに、情報システム1は、データ群から予め定められた属性値に基づくデータに含まれる多次元属性値を、空間充填曲線変換処理を行い1次元化する空間充填曲線1次元化部304と、空間充填曲線1次元化部304により1次元化された値の累積分布を算出する分布算出部308と、を備える。
そして、後述する事前処理部320は、分布算出部308が算出した累積分布を分布情報として処理を行う。
本実施形態の情報システム1は、データ群のデータの分布を表す分布関数を求め、各ノードの論理識別子を入力として、当該分布関数の逆関数を施し、1次元値を出力する逆関数部324と、1次元値を、空間充填曲線変換処理により多次元値に変換する空間充填曲線多次元化部(空間充填曲線サーバ変換部326)と、をさらに備える。
そして、ノードの論理識別子の集合に対し、逆関数部324により逆関数を施して生成された1次元値の集合を、空間充填曲線サーバ変換部326により多次元値に変換し、得られた多次元値と、論理識別子と、宛先アドレスとを対応付けて対応関係として保持する。
サンプルデータ1次元値格納部306には、サンプルの多次元属性データを1次元値に変換した値が格納される。
分布格納部310には、当該分散システムに格納される多次元属性データの一部、あるいは、その分布情報が互いに類似するデータの集合と同一の分布情報を有する、1次元の累積分布情報が格納される。
ある変換規則表の状態にて、入力として特定ビット目の多次元値が与えられると、その変換規則表の状態の変換規則表の内、当該多次元値を上段に持つ変換規則が得られ、対応する下段の1次元値が得られるとともに、その多次元値に対応する次の変換規則表状態に遷移する。
本実施形態の情報システム1において、論理識別子の集合(範囲)と、対応する宛先アドレスと、を対応付けた宛先サーバテーブルを記憶する宛先サーバ記憶部(宛先サーバ情報格納部322)と、分布情報を用いた分布関数の逆関数を施す逆関数部324と、1次元値を、空間充填曲線変換処理により多次元値に変換する空間充填曲線多次元化部(空間充填曲線サーバ変換部326)と、をさらに備え、宛先サーバテーブルを参照し、各コンピュータに(分布が統計的に均一になるように)割り当てられた論理識別子(ハッシュ値)の集合に対し、逆関数部324により逆関数を施して生成される1次元値の集合を、空間充填曲線多次元化部(空間充填曲線サーバ変換部326)により多次元値に変換し、宛先アドレスと対応付けて予め対応情報テーブル(空間充填曲線サーバ情報格納部328の空間充填曲線サーバ情報テーブル332(図13))に記憶する。
本実施形態の情報システム1は、さらに、複数のコンピュータに分散して格納されるデータ群に対し、データの処理の操作要求とともに、操作要求を受け付けたデータに対応する属性値を受け付ける操作要求部360と、決定部(空間充填曲線サーバ決定部346)が決定した宛先アドレスに、受け付けた操作要求を転送する転送部(中継部380または操作要求部360)と、を備え、決定部(空間充填曲線サーバ決定部346)は、操作要求部360が受け付けた属性値に基づいて、宛先アドレスを決定し、中継部380(または操作要求部360)に受け渡す。
また、操作要求部360は、データ追加削除部362と、データ検索部364と、を有する。
さらに、データ格納サーバ106は、データ格納部390を備えている。
範囲宛先解決部344は、与えられた多次元属性の範囲を入力として、そのデータに関する操作要求を送信すべき宛先のコンピュータの宛先アドレスを複数取得する。
本実施形態の情報システムの管理方法は、上記実施形態の管理方法に加え、さらに、スキーマ管理サーバ102(図10)において、空間充填曲線1次元化部304(図10)が、データ群から予め定められた属性値に基づくデータに含まれる多次元属性値を、空間充填曲線変換処理を行い1次元化し、分布算出部308(図10)が、1次元化された値の累積分布を算出し、事前処理部320(図12)が、分布算出部308(図10)が算出した累積分布をデータの分布として、論理識別子空間との対応付けを行う。
まず、本実施形態の情報システム1における1次元化された多次元分布を生成するスキーマ管理サーバ102の動作について説明する。
本実施の形態のスキーマ管理サーバ102の動作について詳細に説明する。この動作は、本実施形態の情報システム1の起動時、定期的、または手動要求時などのタイミングにより実行される。図15は、本実施形態の情報システム1のスキーマ管理サーバ102における一次元化された多次元分布の生成を行う処理(ステップS101)の一例を示すフローチャートである。以下、図10と図15を用いて説明する。
図17および図18は、本実施形態の情報システム1における操作要求に呼応した宛先解決部340の宛先決定処理(ステップS301)および複数の宛先決定処理(ステップS401)の動作の例それぞれ示すフローチャートである。
なお、転送先のコンピュータにおいて、さらに、操作要求の転送が必要な場合、宛先解決部340(図14)の単一宛先解決部342(図14)が、操作要求に含まれる多次元属性の値をもとに、宛先を決定する。
なお、転送先のコンピュータにおいて、さらに、操作要求の転送が必要な場合、宛先解決部340(図14)の範囲宛先解決部344(図14)が、操作要求に含まれる多次元属性の範囲をもとに、転送先の宛先(IPアドレス)を決定する。
そして、本実施形態の情報システム1によれば、データの登録、削除、検索等の操作の実行以前に、属性値または属性部分空間に対するデータを担当しているコンピュータの宛先情報を下記の手順で準備しておくことができる。
すなわち、宛先サーバ情報格納部322(図12)に格納される宛先サーバ情報テーブル330(図6)の情報とデータ分布の情報から逆関数部324(図12)を用いて、宛先サーバ毎の1次元値を算出し、与えられた1次元値を入力として、空間充填曲線サーバ変換部326(図12)によって多次元値を出力し、この多次元値と宛先サーバとの対から、空間充填曲線サーバ情報格納部328(図12)に、属性値または属性部分空間に対する宛先情報を格納することができる。
その理由は、データの登録や削除、検索を行う際には、多次元の属性値や属性条件を1次元の値や範囲に変換する処理を行う必要がないからである。
その理由は、データの登録や削除、検索を行う際には、多次元の属性値や属性条件を1次元の値や範囲に変換する処理を行う必要がないからである。
本実施例では、図2に示すように、アクセスコンピュータ202から、複数のデータコンピュータ208に格納されたデータを操作する例を示す。図2のアクセスコンピュータ202には図1のデータ操作クライアント104が存在し、図2のメタデータコンピュータ204には図1のスキーマ管理サーバ102が存在し、図2のデータコンピュータ208には、図1のデータ格納サーバ106が存在するとする。
スキーマ管理サーバ102(図10)における、図16の空間充填曲線サーバ情報の生成処理においては、まず、図10の空間充填曲線1次元化部304は、図19のデータ分布1001に表された各データの多次元属性値から、1次元化を行い、それぞれを図10のサンプルデータ1次元値格納部306に格納する。次に、図10の分布算出部308は、格納された1次元値からその累積分布情報を累積ヒストグラムなどの形式で算出し、図10の分布格納部310に格納する。
空間充填曲線サーバ決定部346(図14)は、各次元の先頭ビットを取り出し第1多次元ビット(01)を得る。初期の変換規則表状態が0であるとする。
状態0の変換規則から、出力として第1の一次元ビット(01)を出力する。ここで空間充填曲線サーバ情報を参照し、その値域端点のビットパターンが一次元ビット01から始まる値域端点011011(27)にポインタを移動する。
変換規則にて、入力の多次元ビット列が01の時の変換規則表状態は0であるので、別の表には遷移せずに同じ表を用いる。
次のビットとして第3多次元ビット(11)を取り出し、状態2の変換規則表にて、第3の1次元(00)が出力され、これも先のビット列に追加され1次元ビット(011100)、10進数としては28を得る。
これを値域として管理するノードは、論理識別子が551であり、図25に示す空間充填曲線サーバ情報テーブル332から、IPが10.1.1.5であるノードが選択される。このようにして、宛先を決定することができる。
Claims (13)
- データ群を分散して管理する複数のノードを備え、
複数の前記ノードは、それぞれネットワーク上で識別可能な宛先アドレスを有し、
複数の前記ノードに対し、論理識別子空間上で論理識別子を付与する識別子付与手段と、
前記論理識別子空間と、前記データ群におけるデータの分布と、を対応付け、各前記ノードの前記論理識別子に対応する前記データの値の範囲を決定する範囲決定手段と、
ある属性値または属性範囲のデータの格納先の前記ノードの宛先を探索するとき、各前記ノードの前記データの値の前記範囲と、前記論理識別子と、前記宛先アドレスとの対応関係に基づき、前記属性値または前記属性範囲の少なくとも一部が一致する前記データの範囲に対応する前記論理識別子を求め、当該論理識別子に対応する前記ノードの宛先アドレスを前記宛先として決定する宛先決定手段と、を備える情報システム。 - 請求項1に記載の情報システムにおいて、
前記データ群は、多次元の属性を有するデータを含み、
前記データ群から予め定められた属性値に基づくデータに含まれる多次元属性値を、空間充填曲線変換処理を行い1次元化する空間充填曲線1次元化手段と、
前記空間充填曲線1次元化手段により1次元化された値の累積分布を算出する分布算出手段と、をさらに備え、
前記範囲決定手段は、前記分布算出手段が算出した前記累積分布を前記データの分布として、前記論理識別子空間との対応付けを行う情報システム。 - 請求項2に記載の情報システムにおいて、
前記データの分布を表す分布関数を求め、各前記ノードの前記論理識別子を入力として、当該分布関数の逆関数を施し、1次元値を出力する逆関数手段と、
前記1次元値を、空間充填曲線変換処理により多次元値に変換する空間充填曲線多次元化手段と、をさらに備え、
前記ノードの前記論理識別子の集合に対し、前記多次元値と、前記論理識別子と、前記宛先アドレスとを対応付けて前記対応関係として保持する情報システム。 - 請求項1乃至3いずれかに記載の情報システムにおいて、
複数の前記ノードが分散して管理する前記データ群の前記データは、予め定められた条件範囲の属性値を有するデータの集合、または予め定められた類似の分布を有するデータの集合を含む情報システム。 - 請求項1乃至4いずれかに記載の情報システムにおいて、
複数の前記ノードに分散して格納される前記データ群に対し、データの処理の操作要求とともに、前記操作要求を受け付けた前記データに対応する属性値を受け付ける操作要求受付手段と、
前記宛先決定手段が決定した前記宛先アドレスに、受け付けた前記操作要求を転送する転送手段と、をさらに備え、
前記宛先決定手段は、前記操作要求受付手段が受け付けた前記属性値に基づいて、前記宛先アドレスを決定し、前記転送手段に受け渡す情報システム。 - 請求項5に記載の情報システムにおいて、
前記操作要求受付手段が受け付ける前記操作要求は、前記データの登録、削除、または検索を行う情報システム。 - 請求項1乃至6いずれかに記載の情報システムにおいて、
前記ノード毎に前記対応関係を記憶する記憶手段をさらに備える情報システム。 - 請求項1乃至7いずれかに記載の情報システムにおいて、
前記ネットワーク上の前記ノードが追加または削除されたとき、
前記ノードの前記論理識別子の集合を変更し、その変更に伴い、前記対応関係を更新する更新手段をさらに備える情報システム。 - データ群を分散して管理する複数のノードを管理する情報システムの管理方法であって、
複数の前記ノードは、それぞれネットワーク上で識別可能な宛先アドレスを有し、
前記情報システムは、管理装置と、記憶装置と、を有し、
前記管理装置が、
複数の前記ノードに対し、論理識別子空間上で論理識別子を付与し、
前記論理識別子空間と、前記データ群におけるデータの分布と、を対応付け、各前記ノードの前記論理識別子に対応する前記データの値の範囲を決定し、
ある属性値または属性範囲のデータの格納先の前記ノードの宛先を探索するとき、各前記ノードの前記データの値の前記範囲と、前記論理識別子と、前記宛先アドレスとの対応関係に基づき、前記属性値または前記属性範囲の少なくとも一部が一致する前記データの範囲に対応する前記論理識別子を求め、当該論理識別子に対応する前記ノードの宛先アドレスを前記宛先として決定する情報システムの管理方法。 - データ群を分散して管理する複数のノードを管理する管理装置を実現するコンピュータのプログラムであって、
複数の前記ノードは、それぞれネットワーク上で識別可能な宛先アドレスを有し、
前記管理装置は、記憶装置を有し、
前記管理装置を実現するコンピュータに、
複数の前記ノードに対し、論理識別子空間上で論理識別子を付与する手順、
前記論理識別子空間と、前記データ群におけるデータの分布と、を対応付け、各前記ノードの前記論理識別子に対応する前記データの値の範囲を決定する手順、
ある属性値または属性範囲のデータの格納先の前記ノードの宛先を探索するとき、各前記ノードの前記データの値の範囲と、前記論理識別子と、前記宛先アドレスとの対応関係に基づき、前記属性値または前記属性範囲の少なくとも一部が一致する前記データの範囲に対応する前記論理識別子を求め、当該論理識別子に対応する前記ノードの宛先アドレスを前記宛先として決定する手順を実行させるためのプログラム。 - 請求項9に記載の情報システムの管理方法の管理装置に接続され、前記管理装置を介して前記データにアクセスする端末装置のデータ処理方法であって、
前記端末装置が、
属性値または属性範囲を有するデータへのアクセス要求を前記管理装置に通知し、
前記管理装置を介して、複数の前記ノードの宛先アドレスと、各ノードに割り当てられた論理識別子と、各ノードが管理している前記データの値の範囲との対応関係に基づいて、前記アクセス要求された前記属性値または前記属性範囲の少なくとも一部が一致する範囲の前記データを管理する前記ノードの宛先にアクセスして前記データを操作する端末装置のデータ処理方法。 - データ群を分散して管理する複数のノードを管理するサーバに接続されたクライアント端末を実現するコンピュータのプログラムであって、
複数の前記ノードは、それぞれネットワーク上で識別可能な宛先アドレスを有し、
前記クライアント端末を実現するコンピュータに、
属性値または属性範囲を有するデータへのアクセス要求を受け付ける手順、
受け付けた前記アクセス要求を前記サーバに通知する手順、
複数の前記ノードの宛先アドレスと、各ノードに割り当てられた論理識別子と、各ノードが管理している前記データの値の範囲との対応関係に基づいて、前記アクセス要求された前記属性値または前記属性範囲の少なくとも一部が一致する前記データの範囲に対応する前記論理識別子を求め、前記宛先として決定された前記論理識別子に対応する前記ノードの宛先アドレスを前記サーバから受信する手順、
前記サーバから受信した前記宛先アドレスの前記ノードにアクセスし、前記属性値または前記属性範囲の前記データを操作する手順を実行させるためのプログラム。 - データ群を分散して管理する複数のノードの宛先を決定する際に参照する宛先テーブルのデータ構造であって、
複数の前記ノードは、それぞれネットワーク上で識別可能な宛先アドレスを有し、
前記宛先テーブルは、前記データ群を分散して管理する複数のノードの宛先アドレスと、各ノードに論理識別子空間上で付与された論理識別子と、各前記ノードが管理するデータの値の範囲との対応関係を含み、
各前記ノードのデータの値の範囲は、前記論理識別子空間と、前記データ群におけるデータの分布と、を対応付け、各前記ノードの前記論理識別子に対応する前記データの値の範囲が各ノードに割り振られるデータ構造。
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/348,041 US20140244794A1 (en) | 2011-09-27 | 2012-09-26 | Information System, Method and Program for Managing the Same, Method and Program for Processing Data, and Data Structure |
JP2013535916A JP6135509B2 (ja) | 2011-09-27 | 2012-09-26 | 情報システム、その管理方法およびプログラム、データ処理方法およびプログラム、ならびに、データ構造 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011211157 | 2011-09-27 | ||
JP2011-211157 | 2011-09-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013046667A1 true WO2013046667A1 (ja) | 2013-04-04 |
Family
ID=47994747
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2012/006152 WO2013046667A1 (ja) | 2011-09-27 | 2012-09-26 | 情報システム、その管理方法およびプログラム、データ処理方法およびプログラム、ならびに、データ構造 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20140244794A1 (ja) |
JP (1) | JP6135509B2 (ja) |
WO (1) | WO2013046667A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018225314A1 (ja) * | 2017-06-05 | 2018-12-13 | 株式会社東芝 | データベース管理システムおよびデータベース管理方法 |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9681003B1 (en) * | 2013-03-14 | 2017-06-13 | Aeris Communications, Inc. | Method and system for managing device status and activity history using big data storage |
JP2015075830A (ja) * | 2013-10-07 | 2015-04-20 | 富士通株式会社 | 並列処理管理プログラム、並列処理管理方法、及び、並列処理管理装置 |
CN106527990B (zh) * | 2016-11-09 | 2019-08-30 | 浪潮天元通信信息系统有限公司 | 一种网管信息处理服务器、方法和系统 |
US10812526B2 (en) * | 2017-04-24 | 2020-10-20 | Caligo Systems Ltd. | Moving target defense for securing internet of things (IoT) |
EP3678086A4 (en) * | 2017-12-04 | 2020-07-08 | Sony Corporation | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND PROGRAM |
CN110225144B (zh) * | 2018-03-02 | 2021-03-23 | 华为技术有限公司 | 获取及提供服务的方法、用户设备和管理服务器 |
US11921767B1 (en) * | 2018-09-14 | 2024-03-05 | Palantir Technologies Inc. | Efficient access marking approach for efficient retrieval of document access data |
JP7414617B2 (ja) * | 2020-03-31 | 2024-01-16 | キヤノン株式会社 | システム、サーバー装置、および方法 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008234563A (ja) * | 2007-03-23 | 2008-10-02 | Nec Corp | オーバレイ管理装置、オーバレイ管理システム、オーバレイ管理方法およびオーバレイ管理用プログラム |
JP2009522660A (ja) * | 2005-12-29 | 2009-06-11 | アマゾン・テクノロジーズ・インコーポレーテッド | 検索可能なデータサービスのための方法及び装置 |
JP2010509692A (ja) * | 2006-11-14 | 2010-03-25 | シーメンス アクチエンゲゼルシヤフト | ピアツーピア・オーバーレイ・ネットワークにおける負荷分散のための方法 |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7773088B2 (en) * | 2000-06-19 | 2010-08-10 | Mental Images Gmbh | Simultaneous simulation of markov chains using quasi-monte carlo techniques |
US7167856B2 (en) * | 2001-05-15 | 2007-01-23 | Jonathan Keir Lawder | Method of storing and retrieving multi-dimensional data using the hilbert curve |
US7788400B2 (en) * | 2003-09-19 | 2010-08-31 | Hewlett-Packard Development Company, L.P. | Utilizing proximity information in an overlay network |
US7483391B2 (en) * | 2003-09-19 | 2009-01-27 | Hewlett-Packard Development Company, L.P. | Providing a notification including location information for nodes in an overlay network |
US20050108203A1 (en) * | 2003-11-13 | 2005-05-19 | Chunqiang Tang | Sample-directed searching in a peer-to-peer system |
US7313565B2 (en) * | 2004-02-19 | 2007-12-25 | Microsoft Corporation | Data overlay, self-organized metadata overlay, and associated methods |
US7418454B2 (en) * | 2004-04-16 | 2008-08-26 | Microsoft Corporation | Data overlay, self-organized metadata overlay, and application level multicasting |
JP2006024168A (ja) * | 2004-07-06 | 2006-01-26 | Fujitsu Ltd | サーバシステム,ユーザ端末並びに同サーバシステムおよび同ユーザ端末を用いたサービス提供方法 |
US7529196B2 (en) * | 2004-12-07 | 2009-05-05 | Hewlett-Packard Development Company, L.P. | Routing a service query in an overlay network |
US8208477B1 (en) * | 2005-08-24 | 2012-06-26 | Hewlett-Packard Development Company, L.P. | Data-dependent overlay network |
US20070079004A1 (en) * | 2005-09-30 | 2007-04-05 | Junichi Tatemura | Method and apparatus for distributed indexing |
US20070150498A1 (en) * | 2005-12-23 | 2007-06-28 | Xerox Corporation | Social network for distributed content management |
US8693392B2 (en) * | 2007-02-21 | 2014-04-08 | Avaya Canada Corp. | Peer-to-peer communication system and method |
US8028019B2 (en) * | 2007-02-28 | 2011-09-27 | Solid State Networks, Inc. | Methods and apparatus for data transfer in networks using distributed file location indices |
US20090132716A1 (en) * | 2007-11-15 | 2009-05-21 | Junqueira Flavio P | Fault-tolerant distributed services methods and systems |
US8385267B2 (en) * | 2010-02-19 | 2013-02-26 | Research In Motion Limited | Client routing in a peer-to-peer overlay network |
US8892569B2 (en) * | 2010-12-23 | 2014-11-18 | Ianywhere Solutions, Inc. | Indexing spatial data with a quadtree index having cost-based query decomposition |
-
2012
- 2012-09-26 WO PCT/JP2012/006152 patent/WO2013046667A1/ja active Application Filing
- 2012-09-26 US US14/348,041 patent/US20140244794A1/en not_active Abandoned
- 2012-09-26 JP JP2013535916A patent/JP6135509B2/ja active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009522660A (ja) * | 2005-12-29 | 2009-06-11 | アマゾン・テクノロジーズ・インコーポレーテッド | 検索可能なデータサービスのための方法及び装置 |
JP2010509692A (ja) * | 2006-11-14 | 2010-03-25 | シーメンス アクチエンゲゼルシヤフト | ピアツーピア・オーバーレイ・ネットワークにおける負荷分散のための方法 |
JP2008234563A (ja) * | 2007-03-23 | 2008-10-02 | Nec Corp | オーバレイ管理装置、オーバレイ管理システム、オーバレイ管理方法およびオーバレイ管理用プログラム |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018225314A1 (ja) * | 2017-06-05 | 2018-12-13 | 株式会社東芝 | データベース管理システムおよびデータベース管理方法 |
Also Published As
Publication number | Publication date |
---|---|
JP6135509B2 (ja) | 2017-05-31 |
US20140244794A1 (en) | 2014-08-28 |
JPWO2013046667A1 (ja) | 2015-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6135509B2 (ja) | 情報システム、その管理方法およびプログラム、データ処理方法およびプログラム、ならびに、データ構造 | |
JP6094487B2 (ja) | 情報システム、管理装置、データ処理方法、データ構造、プログラム、および記録媒体 | |
JP5759915B2 (ja) | ファイルリスト生成方法及びシステム並びにプログラム、ファイルリスト生成装置 | |
US8713182B2 (en) | Selection of a suitable node to host a virtual machine in an environment containing a large number of nodes | |
US20150215405A1 (en) | Methods of managing and storing distributed files based on information-centric network | |
CN103455531B (zh) | 一种支持高维数据实时有偏查询的并行索引方法 | |
CN106030573A (zh) | 半结构化数据作为第一等级数据库元素的实现 | |
US9875270B1 (en) | Locking item ranges for creating a secondary index from an online table | |
JP2009295127A (ja) | アクセス方法、アクセス装置及び分散データ管理システム | |
US20130198198A1 (en) | Generating method, generating system, and recording medium | |
CN111767287A (zh) | 数据导入方法、装置、设备及计算机存储介质 | |
US11055262B1 (en) | Extensible streams on data sources | |
CN114077680A (zh) | 一种图数据的存储方法、系统及装置 | |
Malensek et al. | Expressive query support for multidimensional data in distributed hash tables | |
Kumar et al. | M-Grid: a distributed framework for multidimensional indexing and querying of location based data | |
Goswami et al. | Graphmap: Scalable iterative graph processing using nosql | |
Ahad et al. | Comparing and analyzing the characteristics of hadoop, cassandra and quantcast file systems for handling big data | |
CN113095778B (zh) | 通过多个邮箱在通信应用中进行海量数据管理的架构 | |
Cheng et al. | A Multi-dimensional Index Structure Based on Improved VA-file and CAN in the Cloud | |
CN114153987A (zh) | 分布式知识图谱查询方法、装置及存储介质 | |
WO2015049734A1 (ja) | 検索システム及び検索方法 | |
Papapetrou et al. | PCIR: Combining DHTs and peer clusters for efficient full-text P2P indexing | |
CN117171161A (zh) | 数据查询方法及装置 | |
US12253974B2 (en) | Metadata processing method and apparatus, and a computer-readable storage medium | |
Fujita | Similarity search in interplanetary file system with the aid of locality sensitive hash |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 12836583 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2013535916 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14348041 Country of ref document: US |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 12836583 Country of ref document: EP Kind code of ref document: A1 |