
WO2007038981A1 - Parallel database server and method for operating a database server - Google Patents

Parallel database server and method for operating a database server

Info

Publication number
WO2007038981A1
Authority
WO
WIPO (PCT)
Prior art keywords
database server
processing
processing nodes
dataset
nodes
Prior art date
Application number
PCT/EP2005/054761
Other languages
English (en)
Inventor
Mike Stengle
Original Assignee
Knowledge Resources (Switzerland) GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Knowledge Resources (Switzerland) GmbH
Priority to PCT/EP2005/054761 priority Critical patent/WO2007038981A1/fr
Publication of WO2007038981A1 publication Critical patent/WO2007038981A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24532Query optimisation of parallel queries

Definitions

  • the present invention concerns a database server, in particular a high performance, massively parallel database server, and a method for operating such a database server.
  • US-B1-6'463'433 describes a database system comprising one or more front end computers and one or more computer nodes interconnected by a network.
  • a query from a user is transmitted from one of the front end computers to one of the computer nodes, called the home node, which extracts features from the query and sends it to other computer nodes.
  • Each participating computer node then performs a search on its respective partition of the database.
  • the structure of the database is thus hierarchical; the home nodes decide which computer nodes should participate in a query. Performance thus depends on a good repartition of the overall processing power between the home nodes and the other nodes.
  • each node is made up of a complete computer with its
  • US-B1-5'857'180 describes a method and apparatus for parallel processing in a database environment.
  • the method provides the ability to dynamically partition row sources and to direct row sources to one or more query slaves.
  • the organization is thus hierarchical; a query coordinator controls the processing of a query.
  • the organization described in this document is purely software-based and does not rely on any physical partitioning of the query slaves or memory partitions. It does not necessarily make efficient use of the underlying hardware organization. If the method is used in a software-implemented shared-disk system, performance strongly depends on the bandwidth of the communication bus to the disk.
  • US-RE-38'410 describes an apparatus for parallel data storage and processing.
  • the apparatus is optimized for fast processing of pixmap images; applications to databases are not suggested. It relies on an array of processing nodes, each node comprising one processor and at least one disk. Performance is thus limited by the access time of the disks and by the speed of the general-purpose processors in the nodes.
  • US-A1-2003/0018630 describes a data storage and retrieval device using a systolic array of FPGAs for performing searches over data on a shared disk. Although the use of dedicated FPGAs strongly improves the processing power of each node, the throughput still relies on the access time and bandwidth of the shared disk.
  • these aims are achieved by means of a non-hierarchical structure of interconnected nodes.
  • a non-hierarchical structure has the advantage, among others, of being less dependent on the processing power of higher-level nodes, and of being more flexible and better adapted to various tasks and database organizations.
  • since each node can act as a master for some tasks, higher redundancy is achieved, thus making the whole system less sensitive to the malfunctioning of a single node.
  • the non-hierarchical structure also allows for better scalability; processing power and memory space can be increased very easily by adding processing nodes. Differing needs in input/output capacity will be accommodated by varying the ratio of input/output nodes (I/O nodes) in the structure. Thus, fewer I/O nodes will be required for applications such as data mining than for media streaming, video on demand or OLTP applications, for example.
  • the nodes are diskless.
  • each node has its own local flash memory for non-volatile or permanent storage of a partition of the entire dataset.
  • the use of flash memory, instead of hard disks, further reduces the volume, footprint and power consumption of the server.
  • Even the non-volatile storage of the database content is thus decentralized.
  • a local RAM is also associated with each processing node, for short-term storage of the partition of the dataset on which the node performs computations.
  • the data resides entirely in the decentralized RAM during operation, and in the decentralized flash memory for non-volatile data storage.
  • Each processing node thus has very fast, unshared access to its local permanent and temporary memory space.
  • the flash memory is always in write mode, used as a destination for the write-through process during data updates and to ensure compliance with data retention tests in case of system failure.
  • the access latency can be reduced from several milliseconds in conventional systems to a few hundred nanoseconds, or even to a few tens of nanoseconds if the read accesses are entirely RAM-based, the flash memories being used only as a backup.
  • the processing elements in each node are made up of FPGA components programmed and optimized for executing SQL commands on the local dataset.
  • by using dedicated components for processing SQL commands instead of generic processors, a further increase in computational power can be achieved.
  • a dedicated ASIC is used instead of FPGA components.
  • each computing node has a very short memory horizon, for example only 128 MBytes, and can perform extremely fast operations on the partition of the dataset in this small memory space. For example, search queries, such as "SELECT ... FROM ... WHERE" SQL queries, can be performed extremely fast, without any need to index the tables, thus reducing the load time.
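The unindexed per-partition query described above can be sketched as follows. This is an illustrative sketch only, not code from the patent: the row layout and function names are hypothetical, and the point is simply that a full scan of a partition small enough to sit in local RAM needs no index.

```python
# Illustrative sketch (not from the patent): a brute-force "SELECT ... WHERE"
# scan over one node's small in-memory partition, with no index. When the
# partition fits in fast local RAM (e.g. 128 MBytes per node), a full scan is
# cheap and index build/maintenance cost is avoided.

def select_where(partition, predicate, columns=None):
    """Full scan of one node's partition; each row is a dict."""
    for row in partition:
        if predicate(row):
            yield {c: row[c] for c in columns} if columns else row

# Each node would run the same scan on its own partition in parallel.
partition = [
    {"id": 1, "city": "Geneva", "amount": 120},
    {"id": 2, "city": "Zurich", "amount": 80},
    {"id": 3, "city": "Geneva", "amount": 45},
]
result = list(select_where(partition, lambda r: r["city"] == "Geneva",
                           ["id", "amount"]))
# result == [{"id": 1, "amount": 120}, {"id": 3, "amount": 45}]
```

In the server itself this scan would run in FPGA logic on every node at once, with only the per-node results travelling over the mesh.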
  • Processing nodes are preferably mutually connected through a bi-dimensional, three-dimensional or n-dimensional mesh of high speed links, preferably serial links.
  • the processing nodes are all of equal importance as there is no hierarchy in the network structure; each node is part of the query- and result-routing infrastructure.
  • the processing nodes preferably each comprise at least one routing engine to forward data and commands to the relevant nodes within the structure.
  • this routing engine is also made up of FPGA components, or even of the same FPGA chip as the processing element in the node.
  • the processing nodes further comprise processing means for joining processing results computed by different processing nodes; this joining is performed by the routing engines, which have dedicated computing capabilities. For example, during the routing of the result fragments of a sorted list, the routing engine will join the pre-sorted list fragments retrieved from the individual nodes into a fully sorted list, and deliver the consolidated, sorted result to the input/output node.
  • Each processing node may further comprise a storage management circuit for managing storage and retrieval of data from and into said local RAM and local flash memory.
  • the storage management circuit is also made up of FPGA components, preferably sharing an FPGA chip with the other FPGA components in the node.
  • Fig. 1 is a block diagram illustrating an example of a database server comprising 5 electronic cards.
  • Fig. 2 is a block diagram illustrating an example of an electronic card comprising one input/output node and 5 clusters.
  • Fig. 3 is a block diagram illustrating an example of a cluster comprising 10 processing nodes.
  • Fig. 4 is a block diagram illustrating an example of a processing node.
  • Fig. 1 is a block diagram illustrating an example of a database server 1 comprising five electronic cards 3 in a cage.
  • the electronic cards 3 may comprise connectors on one, two, three or all four sides for interconnection of the cards.
  • the cards are mounted on a backplane having a connecting bus for mounting and interconnecting any number of cards from 1 to n.
  • the cards are mounted in a bi-dimensional array, each card being connected over serial links with all its direct neighbors.
  • the five cards are connected in a single row, and thus each card is connected to a maximum of two other cards. It would be possible however to interconnect the cards in several rows and columns, thus having cards connected to three or four other cards. All cards preferably feature interconnection means for at least four different cards.
  • Access to the database server may be achieved through a user computer 2, for example a personal computer or a mainframe, running database software such as, for example, Oracle, SQLServer, DB2, MySQL, etc.
  • the user computer 2 is used for entering data, storing queries, sending queries to the database server 1, retrieving and displaying the results over any suitable front-end, and various other database management tasks.
  • Exchange of queries, commands, results and data between the user computer 2 and the database server 1 may be performed over an ODBC link or any other suitable high-level database interface.
  • the physical architecture of the database server 1 is thus transparent to the database software.
  • Fig. 2 is a block diagram illustrating an example of an electronic card 3 as used in the cage of Fig. 1.
  • the electronic card of this embodiment comprises one input/output node (I/O node) 4, for example a FPGA or a dedicated component, for interconnecting the electronic card with its neighbors over a link 40, for example a serial and/or Ethernet link, for exchanging ODBC commands, for example.
  • the card further comprises a plurality of clusters 5 of processing nodes. Each cluster is connected over
  • the card comprises five clusters 5 of nodes arranged in a single row, each cluster being connected to a maximum of two neighbor clusters.
  • Other arrangements of clusters, for example in a bi-dimensional array, may be considered.
  • the clusters 5 may be made up of various electronic components, such as individual integrated circuits, on a circuit board.
  • the circuit board is organized like a DIMM (dual in-line memory module) removably held in a DIMM socket on the card 3.
  • Other kinds of circuit boards can be used within the frame of the invention.
  • FIG. 3 illustrates an example of a cluster 5 comprising, in this embodiment, ten processing nodes 7 in two lines of five. Again, each processing node is interconnected over preferably serial links to its direct neighbors in the North, South, West and East directions. Interconnection with other clusters can be made through connecting pins of the circuit board on which the cluster is mounted and through conducting paths of the electronic cards 3.
  • FIG. 4 illustrates an example of processing node 7.
  • the processing node is implemented with two FPGA or ASIC chips 13, a RAM memory 11 and a non-volatile memory 12, for example flash memory.
  • the processing node is implemented in Xilinx (Trademark) Virtex 4 components.
  • Each FPGA component 13 is designed and programmed so as to logically correspond to a routing engine 8, several (four in this embodiment) SQL processing engines 9, and several (two in the example) storage management circuits 10.
  • the routing engines 8 forward incoming data and commands or queries, as well as processed results, to the relevant nodes.
  • the routing engine also features computing capabilities for processing the queries and the results routed to other nodes. For example, if a table is partitioned into several parts stored and processed by different processing nodes, each processing node may sort its partition and the routing engine may retrieve the pre-sorted fragments from the different nodes, join them, and deliver an entirely sorted list to another node. Very fast algorithms for joining pre-sorted lists may be used if the routing engines use a content addressable memory (CAM) and a helper storage memory; after the initial data read, only data pointers are passed within the engine, which keeps it compact and fast.
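The join of pre-sorted fragments described above is, in software terms, a k-way merge. The following is a minimal sketch of that operation, not the patent's CAM-based hardware: it uses Python's standard `heapq.merge` to combine locally sorted node results in O(n log k).

```python
# Illustrative sketch (not the patent's hardware): merging pre-sorted result
# fragments from several processing nodes into one fully sorted list, as a
# routing engine with joining capability would do on the result path.
import heapq

def merge_sorted_fragments(fragments):
    """k-way merge of already-sorted fragments; O(n log k) comparisons."""
    return list(heapq.merge(*fragments))

# Three nodes each return their locally sorted partition of the result:
node_results = [[1, 4, 9], [2, 3, 10], [5, 6, 7]]
merged = merge_sorted_fragments(node_results)
# merged == [1, 2, 3, 4, 5, 6, 7, 9, 10]
```

A heap-based merge touches each element once and keeps only k candidates live at a time, which mirrors the pointer-passing, low-footprint behavior the text attributes to the routing engine.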
  • the SQL processing engines 9 are designed and programmed so as to be able to answer SQL queries, and possibly other database management commands, on the partitions of the dataset stored in their local memory space 11, 12.
  • in order to keep the SQL processing engines 9 simple, fast and cost effective, only a subset of the SQL commands and queries can be interpreted by the engines 9; other less frequent or less time-critical queries may be executed by other parts of the database server 1, in the user computer 2 or by the user front-end application.
  • the SQL processing engines preferably read data only from their local RAM 11. Extremely short read-access time is thus ensured. Updated tables are written both in the RAM and in the local flash memory, to ensure that no data is lost even in case of system failure.
  • the flash memory is thus accessed only in write mode during operation; read access is only required for restoring data after a failure or reboot.
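The write-through policy just described can be made concrete with a short sketch. The class and method names below are hypothetical (the patent specifies hardware behavior, not an API): every update lands in both RAM and flash, reads are served from RAM only, and flash is read back only on recovery.

```python
# Illustrative sketch (hypothetical names): write-through from RAM to flash.
# Reads never touch the flash during normal operation; flash is only read
# back to repopulate RAM after a failure or reboot.

class NodeStore:
    def __init__(self):
        self.ram = {}    # fast volatile store, serves all reads
        self.flash = {}  # non-volatile store, written through on every update

    def write(self, key, value):
        self.ram[key] = value
        self.flash[key] = value      # write-through: survives power loss

    def read(self, key):
        return self.ram[key]         # reads are RAM-only during operation

    def recover(self):
        self.ram = dict(self.flash)  # reload RAM from flash after a failure

store = NodeStore()
store.write("row:1", ("Geneva", 120))
store.ram.clear()                    # simulate power loss wiping volatile RAM
store.recover()
assert store.read("row:1") == ("Geneva", 120)
```

The design choice this illustrates: by paying the flash write on every update, read latency stays at RAM speed while durability is preserved.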
  • the memory horizon of each processing node is kept low, for example at 128 MBytes. This ensures very fast select or sort operations on the small local data partition.
  • the storage management circuits 10 are used for controlling access to the flash and, possibly, RAM memory.
  • RAM 11 is preferably based on DDR-II components, flash on NAND flash chips.
  • the two FPGAs 13 are interconnected mutually and with other processing nodes over North, South, East and West links which, when coupled, build an almost uniform non-hierarchical network mesh, for example in a bi-dimensional array, in a cube, in an n-cube or any other suitable organization. Additional channels may be added in one or several directions to increase throughput in preferred directions; moreover, links between nodes may be broken to insert user access nodes or other data peripheral devices.
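The North/South/East/West mesh above can be sketched as follows. This is an illustrative sketch, not from the patent: the patent does not specify a routing algorithm, so the example assumes simple X-then-Y (dimension-order) forwarding in a bi-dimensional mesh.

```python
# Illustrative sketch (assumed routing policy): neighbor addressing and
# X-then-Y hop-by-hop forwarding in a bi-dimensional mesh of nodes, of the
# non-hierarchical North/South/East/West kind described above.

def neighbors(x, y, width, height):
    """North/South/East/West neighbors of node (x, y) in a width x height mesh."""
    candidates = [(x, y - 1), (x, y + 1), (x + 1, y), (x - 1, y)]
    return [(nx, ny) for nx, ny in candidates
            if 0 <= nx < width and 0 <= ny < height]

def next_hop(src, dst):
    """Forward along X first, then Y, one hop at a time."""
    (sx, sy), (dx, dy) = src, dst
    if sx != dx:
        return (sx + (1 if dx > sx else -1), sy)
    if sy != dy:
        return (sx, sy + (1 if dy > sy else -1))
    return src  # already at destination

# Route a query from node (0, 0) to node (2, 1), one hop per routing engine:
hop1 = next_hop((0, 0), (2, 1))   # (1, 0)
hop2 = next_hop(hop1, (2, 1))     # (2, 0)
hop3 = next_hop(hop2, (2, 1))     # (2, 1)
```

Since every node runs the same forwarding logic, there is no distinguished root: any node can originate, relay, or terminate a query, which is what makes the structure non-hierarchical.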
  • Other interconnect schemes, including circular schemes, diagonal links, etc., may be used within the frame of the invention.
  • the global size of the flash and RAM memory is 250 GBytes; scalability to 10 TBytes or more may be achieved with current components.
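A quick back-of-the-envelope check ties the figures above together. The node count is not stated in the patent excerpt; it simply follows from dividing total capacity by the 128-MByte per-node memory horizon, assuming binary units.

```python
# Back-of-the-envelope check (derived, not stated in the text): number of
# 128-MByte processing nodes implied by the quoted total memory sizes.
MB = 1024 ** 2
GB = 1024 ** 3
TB = 1024 ** 4

horizon = 128 * MB                    # per-node memory horizon

nodes_250gb = (250 * GB) // horizon   # nodes for the 250-GByte configuration
nodes_10tb = (10 * TB) // horizon     # nodes for a 10-TByte configuration
# nodes_250gb == 2000, nodes_10tb == 81920
```

The scan and sort times per query stay roughly constant as capacity grows, because each added node brings its own processing element along with its 128 MBytes.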

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention concerns a parallel database server (1) comprising a non-hierarchical structure of interconnected diskless processing nodes (7), wherein at least several of said nodes comprise: an SQL processing element component (9) for executing SQL queries, local RAM (11) associated with said SQL processing elements, and writable non-volatile memory (12) associated with said SQL processing element for decentralized non-volatile storage of the dataset, wherein the entire dataset is partitioned and stored in said processing nodes.
PCT/EP2005/054761 2005-09-22 2005-09-22 Parallel database server and method for operating a database server WO2007038981A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2005/054761 WO2007038981A1 (fr) 2005-09-22 2005-09-22 Parallel database server and method for operating a database server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2005/054761 WO2007038981A1 (fr) 2005-09-22 2005-09-22 Parallel database server and method for operating a database server

Publications (1)

Publication Number Publication Date
WO2007038981A1 (fr) 2007-04-12

Family

ID=36061475

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2005/054761 WO2007038981A1 (fr) 2005-09-22 2005-09-22 Parallel database server and method for operating a database server

Country Status (1)

Country Link
WO (1) WO2007038981A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012058265A1 (fr) * 2010-10-26 2012-05-03 ParElastic Corporation Apparatus for elastic database processing with heterogeneous data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BARU C K ET AL: "DB2 PARALLEL EDITION", IBM SYSTEMS JOURNAL, IBM CORP. ARMONK, NEW YORK, US, vol. 34, no. 2, 21 March 1995 (1995-03-21), pages 292 - 322, XP000526619, ISSN: 0018-8670 *
DEWITT D ET AL: "PARALLEL DATABASE SYSTEMS: THE FUTURE OF HIGH PERFORMANCE DATABASE SYSTEMS", COMMUNICATIONS OF THE ASSOCIATION FOR COMPUTING MACHINERY, ACM, NEW YORK, NY, US, vol. 35, no. 6, 1 June 1992 (1992-06-01), pages 85 - 98, XP000331759, ISSN: 0001-0782 *
QIANG LI ET AL: "A TRANSPUTER T9000 FAMILY BASED ARCHITECTURE FOR PARALLEL DATABASE MACHINES", COMPUTER ARCHITECTURE NEWS, ACM, NEW YORK, NY, US, vol. 21, no. 5, 1 December 1993 (1993-12-01), pages 55 - 62, XP000432538, ISSN: 0163-5964 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012058265A1 (fr) * 2010-10-26 2012-05-03 ParElastic Corporation Apparatus for elastic database processing with heterogeneous data
US8214356B1 (en) 2010-10-26 2012-07-03 ParElastic Corporation Apparatus for elastic database processing with heterogeneous data
US8386532B2 (en) 2010-10-26 2013-02-26 ParElastic Corporation Mechanism for co-located data placement in a parallel elastic database management system
US8386473B2 (en) 2010-10-26 2013-02-26 ParElastic Corporation Process architecture for elastic stateful shared nothing system
US8478790B2 (en) 2010-10-26 2013-07-02 ParElastic Corporation Mechanism for co-located data placement in a parallel elastic database management system
US8943103B2 (en) 2010-10-26 2015-01-27 Tesora, Inc. Improvements to query execution in a parallel elastic database management system

Similar Documents

Publication Publication Date Title
Bitton et al. A taxonomy of parallel sorting
US20140280375A1 (en) Systems and methods for implementing distributed databases using many-core processors
CN107710193B (zh) Data placement control for a distributed computing environment
Băzăr et al. The Transition from RDBMS to NoSQL. A Comparative Analysis of Three Popular Non-Relational Solutions: Cassandra, MongoDB and Couchbase.
US7899851B2 (en) Indexing method of database management system
US8892558B2 (en) Inserting data into an in-memory distributed nodal database
US20090113172A1 (en) Network topology for a scalable multiprocessor system
US8812645B2 (en) Query optimization in a parallel computer system with multiple networks
JP2004538548A (ja) A novel massively parallel supercomputer
JP2002544598A (ja) Search engine with a two-dimensional linearly scalable parallel architecture
US10810206B2 (en) Efficient multi-dimensional partitioning and sorting in large-scale distributed data processing systems
Phan et al. Toward intersection filter-based optimization for joins in mapreduce
CN113672583A (zh) Big-data multi-data-source analysis method and system based on separation of storage and computation
JP4511469B2 (ja) Information processing method and information processing system
WO2007038981A1 (fr) Parallel database server and method for operating a database server
US20090043728A1 (en) Query Optimization in a Parallel Computer System to Reduce Network Traffic
JP4620593B2 (ja) Information processing system and information processing method
Vilaça et al. On the expressiveness and trade-offs of large scale tuple stores
US8037184B2 (en) Query governor with network monitoring in a parallel computer system
CN111190991A (zh) 一种非结构化数据传输系统及交互方法
US11853298B2 (en) Data storage and data retrieval methods and devices
JP4559971B2 (ja) Distributed-memory information processing system
CN109376117B (zh) Computing chip and operating method thereof
Hua et al. Semantic-aware data cube for cloud networks
Upchurch A Migratory Near Memory Processing Architecture Applied to Big Data Problems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 05786970

Country of ref document: EP

Kind code of ref document: A1
