WO2007038981A1 - Serveur de bases de données parallèles et procédé de fonctionnement d’un serveur de bases de données - Google Patents
Serveur de bases de données parallèles et procédé de fonctionnement d’un serveur de bases de données Download PDFInfo
- Publication number
- WO2007038981A1 WO2007038981A1 PCT/EP2005/054761 EP2005054761W WO2007038981A1 WO 2007038981 A1 WO2007038981 A1 WO 2007038981A1 EP 2005054761 W EP2005054761 W EP 2005054761W WO 2007038981 A1 WO2007038981 A1 WO 2007038981A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- database server
- processing
- processing nodes
- dataset
- nodes
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims description 15
- 230000015654 memory Effects 0.000 claims abstract description 34
- 238000005192 partition Methods 0.000 claims description 12
- 208000010201 Exanthema Diseases 0.000 claims description 11
- 201000005884 exanthem Diseases 0.000 claims description 11
- 206010037844 rash Diseases 0.000 claims description 11
- 238000000638 solvent extraction Methods 0.000 claims description 4
- 239000004065 semiconductor Substances 0.000 claims 1
- 238000010586 diagram Methods 0.000 description 6
- 238000007726 management method Methods 0.000 description 6
- 230000008520 organization Effects 0.000 description 4
- 238000013500 data storage Methods 0.000 description 3
- RYGMFSIKBFXOCR-UHFFFAOYSA-N Copper Chemical compound [Cu] RYGMFSIKBFXOCR-UHFFFAOYSA-N 0.000 description 2
- 238000003491 array Methods 0.000 description 2
- 238000004891 communication Methods 0.000 description 1
- 230000009977 dual effect Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000014759 maintenance of location Effects 0.000 description 1
- 238000005065 mining Methods 0.000 description 1
- 230000002093 peripheral effect Effects 0.000 description 1
- 230000010076 replication Effects 0.000 description 1
- 230000004044 response Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
Definitions
- the present invention concerns a database server, in particular a high performance, massively parallel database server, and a method for operating such a database server.
- US-B1 -6'463'433 describes a database system comprising one or more front end computers and one or more computer nodes interconnected by a network.
- a query from a user istransmitted from one of the front end computersto one of the computer nodes, called the home node, which extracts features from the query and sends it to other computer nodes
- Each participating computer node then performs a search on its respective partition of the database.
- the structure of the database is thus hierarchical; the home nodes decide which computer node should participate in a query. Performance thusdependson a good repartition of the overall processing power between the home nodes and the other nodes.
- each node is made up of a complete computer with its 2 KNOFES 1 -PCT
- U9-B1 -5'857'180 describes a method and apparatusfor parallel processing in a database environment.
- the method providesthe ability to dynamically partition row sources and for directing row sourcesto one or more query slaves
- the organization isthus hierarchical; a query coordinator controlsthe processing of a query.
- the organization described in thisdocument is a purely software-based, and does not rely on any physical partitioning of the query slaves or memory partitions. It does not necessarily make an efficient use of the underlying hardware organization. If the method is used in a software-implemented shared disk system, performance strongly depends on the bandwidth of the communication busto the disk.
- USFE-38'410 describes an apparatusfor parallel data storage and processing.
- the apparatus is optimized for fast processing of pixmap images; applicationsto databases are not suggested. It relies on an array of processing nodes, each node comprising one processor and at least one disk. Performance is thus limited by the access time of the disks and by the speed of the general purpose processors in the nodes.
- U9-A1 -2003/0018630 describes a data storage and retrieval device using a systolic array of FPGA for performing searches over data in a shared disk. Although use of dedicated FPGA strongly improvesthe processing power of each node, the throughput still relies on the accesstime and bandwidth of the shared disk.
- these aims are achieved by means of a non-hierarchical structure of interconnected nodes.
- a non-hierarchical structure hasthe advantage, among others, of being less dependant on the processing power of the higher level nodes, and of being more f lexible and better adapted to various tasks and database organizations.
- each node can act as a master for some tasks, a higher redundancy is achieved, thus making the whole system less sensitive to malfunctioning of one single node.
- the non-hierarchical structure also allowsfor a better scalability; processing power and memory space can be increased very easily by adding processing nodes. Different nodes in input/output capacity will be accommodated by varying the ratio of input/output nodes (I/O nodes) in the structure. Thus, fewer I/O nodes will be required for applicationssuch asdata mining than for media streaming, video on demand or OLTP applications, for example.
- the nodes are diskless.
- each node has its own local Rash memory for non-volatile or permanent storage of a partition of the entire dataset.
- the use of f lash memory, instead of hard disks, f urther reducesthe volume, footprint and power consumption of the server.
- Even the non-volatile storage of the database content isthus decentralized.
- a local RAM is also associated with each processing node, for short-term storage of the partition of the memory on which the node performs computations.
- the data resides totally in the decentralized RAM during operation, and in the decentralized Rash memory for non-volatile data storage.
- Each processing node thus has a very fast, not shared accessto its local permanent and temporary memory space.
- the f lash memory is always in write mode, and used as a destination for the write through processduring data updates and to ensure compliance with data retention tests in case of system failure. 4 KNOFES 1 -PCT
- the access latency can be reduced from several milliseconds in conventional systemsto a few hundreds nanoseconds, or even to a few tensof nanosecondsif the read access are entirely RAM based, the f lash memories being only used asa backup.
- the processing elements in each node are made up of FPGA components programmed and optimized for executing SQL commands on the local dataset.
- FPGA components programmed and optimized for executing SQL commands on the local dataset.
- dedicated components for processing 9QL commands instead of generic processors, a further increase in computational power can be achieved.
- a dedicated ASIC is used instead of FPGA components.
- each computing node has a very short memory horizon, for example 128 Mbytes only, and can perform extremely fast operationson the partition of the dataset in thissmall memory space. For example, search queries, such as " select from Where" SQL queries, can be performed extremely fast, without any need to index the tables, thus reducing the load time.
- Processing nodes are preferably mutually connected through a bi-dimensional, three-dimensional or n-dimensional mesh of high speed links, preferably serial links.
- the processing nodes are all of equal importance asthere is no hierarchy in the network structure; each node is part of the query and result-routing infrastructure.
- the processing nodes preferably each comprise at least one routing engine to forward data and commandsto the relevant nodes within the structure.
- this routing engine is also made up of FPGA components, or even of the same FPGA chip asthe processing element in the node.
- the processing nodesfurther comprise 5 KNOFES 1 -PCT
- processing means for joining processing results computed by different processing nodes are joined by the routing engines having dedicated computing capabilities. For example, during the routing of the result f ragments of a sorted list, the routing engine will join the pre-sorted list f ragments retrieved f rom the individual nodes into a f ully sorted list, and deliversthe consolidated, sorted result to the input/output node.
- Each processing node may further comprise a storage management circuit for managing storage and retrieval of data f rom and into said local RAM and local Rash memory.
- the storage management circuit is also made up of FPGA components, preferably sharing a FPGA chip with the other FPGA components in the node.
- Rg. 1 is a block diagram illustrating an example of database server comprising 5 electronic cards.
- Rg. 2 is a block diagram illustrating an example of electronic card comprising one input/output node and 5 clusters
- Rg. 3 is a block diagram illustrating an example of cluster comprising 10 processing nodes
- Rg. 4 is a block diagram illustrating an example of processing node. 6 KNOFES 1 -PCT
- Rg. 1 is a block diagram illustrating an example of database server 1 comprising five electronic cards 3 in a cage.
- the electronic cards 3 may comprise connectors on one, two, three or all four sidesfor interconnectionsof the cards
- the cards are mounted on a backplane having a connecting busfor mounting and interconnecting any number of cardsfrom 1 to n.
- the cards are mounted in a bi-dimensional array, each card being connected over serial linkswith all itsdirect neighbors.
- the five cards are connected in a single row, and thuseach card is connected to a maximum of two other cards. It would be possible however to interconnect the cards in several rows and columns, thus having cards connected to three or four other cards All cards preferably feature interconnection meansfor at least four different cards.
- Accessto the database server may be achieved through a user computer 2, for example a personal computer or a mainframe, running a database software such as, for example, Oracle, SQLServer, DB2, MySQL, etc.
- the user computer 2 is used for entering data, storing queries, sending queriesto the database server 1 , retrieving and displaying the results over any suitable front-end, and various other database management tasks .
- Exchange of queries, commands, resultsand data between the user computer 2 and the database server 1 may be performed over an ODBC link or any other suitable high level database interface.
- the physical architecture of the database server 1 isthustransparent for the database server.
- Rg. 2 is a block diagram illustrating an example of electronic card 3 as used in the cage of Rg. 1.
- the electronic card of thisembodiment comprises one input/output node (I/O node) 4, for example a FPGA or a dedicated component, for interconnecting the electronic card with its neighborsover a link 40, for example a serial and/or Ethernet link, for exchanging ODBC commands, for example.
- the card further comprises a plurality of clusters ⁇ of processing nodes Each cluster isconnected over 7 KNOFES 1 -PCT
- the card comprises five clusters ⁇ of nodes arranged in a single row, each cluster being connected to a maximum of two neighbor clusters
- Other arrangements of clusters for example in a bi- dimensional array, may be considered.
- the clusters ⁇ may be made up of various electronic components, such as individual integrated circuits, on a circuit board.
- the circuit board is organized like a DIM M (dual in-line memory module) removably held in a DIM M socket on the card 3.
- DIM M dual in-line memory module
- Other kinds of circuit boards can be used within the f rame of the invention.
- FIG. 3 illustrates an example of cluster 5 comprising in this embodiment ten processing nodes 7 in two lines of five. Again, each processing node is interconnected over preferably serial linksto itsdirect neighbors in the North, South, West and East directions. Interconnection with other clusters can be made through connecting pins of the circuit board on which the cluster is mounted and through conducting paths of the electronic cards3.
- FIG. 4 illustrates an example of processing node 7.
- the processing node is implemented with two FPGA or ASIC chips 13, a RAM memory 1 1 and a non-volatile memory, for example Rash memory, 12.
- the processing node is implemented in Xilinx (Trademark) Virtex 4 components.
- Each FPGA component 13 is designed and programmed so asto logically correspond to a routing engine 8, several (four in this embodiment) SQL processing engines 9, and several (two in the example) storage management circuits 10.
- the routing engines ⁇ forward incoming data and commands or queries, aswell as processed results, to the relevant 8 KNOFES 1 -PCT
- the routing engine also features computing capabilities for processing the queries and the results routed to other nodes. For example, if a table is partitioned into several partsstored and processed by different processing nodes, each processing node may sort its partition and the routing engine may retrieve the pre-sorted f ragmentsf rom the different nodes, join them, and deliver an entirely sorted list to another node. Very fast algorithms for joining pre-sorted lists may be used if the routing engines use a content addressable memory (CAM) and a helper storage memory; after the initial data read, only data pointers are passed within the engine, which keeps it compact and fast.
- CAM content addressable memory
- helper storage memory after the initial data read, only data pointers are passed within the engine, which keeps it compact and fast.
- the 9QL processing engines 9 are designed and programmed so asto be able to answer 9QL queries, and possibly other database management commands, on the partitions of the dataset stored in their local memory space 11 , 12.
- 9QL processing engines 9 in order to keep the 9QL processing engines simple, fast and cost effective, only a subset of the 9QL commands and queries can be interpreted by the engines 9; other less f requent or I ess time-critical queries may be executed by other parts of the database server 1 , in the user computer 2 or by the user f ront-end application.
- the 9QL processing engines preferably reads data only f rom their local RAM 1 1. Extremely short read-access time isthus ensured. Updated tablesare written both in the RAM and in the local Rash memory, to ensure that no data is lost even in case of system failure.
- the Rash memory isthus accessed only in write mode during operation; read access is only required for restoring data after a failure or reboot.
- the memory horizon of each processing node is kept low, for example at 128 M bytes This ensures very fast select or sort operations on the small local partition data.
- the storage management circuits 10 are used for controlling accessto the Rash and, possibly, RAM memory.
- RAM 11 is preferably based on DDRII components, Rash on NAND Rash chips
- the two FPGAs 13 are interconnected mutually and with other processing nodesover North, South, East and West linkswhich, when coupled, build an almost uniform non-hierarchical network mesh, for example in a bi-dimensional array, in a cube, in an n-cube or any other suitable organization. Additional channels may be added in one or several directionsto increase throughput in preferred directions; moreover, links between nodes may be broken to insert user access nodes or other data peripheral devices.
- interconnect schemes including circular schemes, diagonal links, etc. may be used within the f rame of the invention.
- the global size of the Rash and RAM memory is250 GBytes; scalability to I OTBytesor more may be achieved with current components.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
La présente invention concerne un serveur de bases de données parallèles (1) comprenant une structure non hiérarchique de nœuds de traitement sans disque dur interconnectés (7), où au moins plusieurs desdits nœuds comprennent : un composant élément de traitement SQL (9) destiné à l’exécution de requêtes SQL, de la mémoire vive locale (11) associée auxdits éléments de traitement SQL, de la mémoire inscriptible non volatile (12) associée audit élément de traitement SQL pou le stockage non volatile et décentralisé de l’ensemble de données, où la totalité de l’ensemble de données est partitionné et stocké dans lesdits nœuds de traitement.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2005/054761 WO2007038981A1 (fr) | 2005-09-22 | 2005-09-22 | Serveur de bases de données parallèles et procédé de fonctionnement d’un serveur de bases de données |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2005/054761 WO2007038981A1 (fr) | 2005-09-22 | 2005-09-22 | Serveur de bases de données parallèles et procédé de fonctionnement d’un serveur de bases de données |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2007038981A1 true WO2007038981A1 (fr) | 2007-04-12 |
Family
ID=36061475
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2005/054761 WO2007038981A1 (fr) | 2005-09-22 | 2005-09-22 | Serveur de bases de données parallèles et procédé de fonctionnement d’un serveur de bases de données |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2007038981A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012058265A1 (fr) * | 2010-10-26 | 2012-05-03 | ParElastic Corporation | Appareil de traitement élastique de base de données avec données hétérogènes |
-
2005
- 2005-09-22 WO PCT/EP2005/054761 patent/WO2007038981A1/fr active Application Filing
Non-Patent Citations (3)
Title |
---|
BARU C K ET AL: "DB2 PARALLEL EDITION", IBM SYSTEMS JOURNAL, IBM CORP. ARMONK, NEW YORK, US, vol. 34, no. 2, 21 March 1995 (1995-03-21), pages 292 - 322, XP000526619, ISSN: 0018-8670 * |
DEWITT D ET AL: "PARALLEL DATABASE SYSTEMS: THE FUTURE OF HIGH PERFORMANCE DATABASE SYSTEMS", COMMUNICATIONS OF THE ASSOCIATION FOR COMPUTING MACHINERY, ACM, NEW YORK, NY, US, vol. 35, no. 6, 1 June 1992 (1992-06-01), pages 85 - 98, XP000331759, ISSN: 0001-0782 * |
QIANG LI ET AL: "A TRANSPUTER T9000 FAMILY BASED ARCHITECTURE FOR PARALLEL DATABASE MACHINES", COMPUTER ARCHITECTURE NEWS, ACM, NEW YORK, NY, US, vol. 21, no. 5, 1 December 1993 (1993-12-01), pages 55 - 62, XP000432538, ISSN: 0163-5964 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2012058265A1 (fr) * | 2010-10-26 | 2012-05-03 | ParElastic Corporation | Appareil de traitement élastique de base de données avec données hétérogènes |
US8214356B1 (en) | 2010-10-26 | 2012-07-03 | ParElastic Corporation | Apparatus for elastic database processing with heterogeneous data |
US8386532B2 (en) | 2010-10-26 | 2013-02-26 | ParElastic Corporation | Mechanism for co-located data placement in a parallel elastic database management system |
US8386473B2 (en) | 2010-10-26 | 2013-02-26 | ParElastic Corporation | Process architecture for elastic stateful shared nothing system |
US8478790B2 (en) | 2010-10-26 | 2013-07-02 | ParElastic Corporation | Mechanism for co-located data placement in a parallel elastic database management system |
US8943103B2 (en) | 2010-10-26 | 2015-01-27 | Tesora, Inc. | Improvements to query execution in a parallel elastic database management system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bitton et al. | A taxonomy of parallel sorting | |
US20140280375A1 (en) | Systems and methods for implementing distributed databases using many-core processors | |
CN107710193B (zh) | 分布式计算环境的数据放置控制 | |
Băzăr et al. | The Transition from RDBMS to NoSQL. A Comparative Analysis of Three Popular Non-Relational Solutions: Cassandra, MongoDB and Couchbase. | |
US7899851B2 (en) | Indexing method of database management system | |
US8892558B2 (en) | Inserting data into an in-memory distributed nodal database | |
US20090113172A1 (en) | Network topology for a scalable multiprocessor system | |
US8812645B2 (en) | Query optimization in a parallel computer system with multiple networks | |
JP2004538548A (ja) | 新規の大量並列スーパーコンピュータ | |
JP2002544598A (ja) | 二次元線形スケーラブル・パラレル・アーキテクチャを有する検索エンジン | |
US10810206B2 (en) | Efficient multi-dimensional partitioning and sorting in large-scale distributed data processing systems | |
Phan et al. | Toward intersection filter-based optimization for joins in mapreduce | |
CN113672583A (zh) | 基于存储与计算分离的大数据多数据源分析方法及系统 | |
JP4511469B2 (ja) | 情報処理方法及び情報処理システム | |
WO2007038981A1 (fr) | Serveur de bases de données parallèles et procédé de fonctionnement d’un serveur de bases de données | |
US20090043728A1 (en) | Query Optimization in a Parallel Computer System to Reduce Network Traffic | |
JP4620593B2 (ja) | 情報処理システムおよび情報処理方法 | |
Vilaça et al. | On the expressiveness and trade-offs of large scale tuple stores | |
US8037184B2 (en) | Query governor with network monitoring in a parallel computer system | |
CN111190991A (zh) | 一种非结构化数据传输系统及交互方法 | |
US11853298B2 (en) | Data storage and data retrieval methods and devices | |
JP4559971B2 (ja) | 分散メモリ型情報処理システム | |
CN109376117B (zh) | 计算芯片及其操作方法 | |
Hua et al. | Semantic-aware data cube for cloud networks | |
Upchurch | A Migratory Near Memory Processing Architecture Applied to Big Data Problems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 05786970 Country of ref document: EP Kind code of ref document: A1 |