CN107329824A

CN107329824A - A Model Method for Map-Reduce Distributed Computing Based on the .NET Platform

Info

Publication number: CN107329824A
Application number: CN201710423527.9A
Authority: CN
Inventors: 杨忠明; 徐红波; 熊君丽
Original assignee: Guangdong Institute of Science and Technology
Current assignee: Guangdong Institute of Science and Technology
Priority date: 2017-06-07
Filing date: 2017-06-07
Publication date: 2017-11-07

Abstract

The NETRMR prototype system provided by the present invention builds on the strength of the Map-Reduce model in reducing the complexity of parallel programming for distributed computation. For the distributed computing architecture of the .NET platform, it designs an algorithm, NETRMR, that uses the distributed computing technology of the .NET Remoting framework. Specifically, the method comprises the following steps: input data through the Map-Reduce interface; submit the data to ManagerConsole; SolutionArray dispatches the data to RegLoader, RegLoader starts Workers to perform the computation, and the Reducer collects the results; the RegLoaders that obtain data actively register to form a MachineList. The invention encapsulates module support for system fault tolerance and load balancing and reduces the complexity of parallel programming in a distributed computing environment, giving the model wide applicability.

Description

A Model Method for Map-Reduce Distributed Computing Based on the .NET Platform

Technical Field

The present invention relates generally to the field of computer-based distributed computing, and in particular to a model method for Map-Reduce distributed computing based on the .NET platform.

Background

Over the past half century, the explosive growth of information has pushed scientific computing into the era of multi-core parallelism. Among parallel computing models, Valiant proposed the BSP (Bulk Synchronous Parallel) model, which divides a computation into a sequence of supersteps and coordinates it at a coarse-grained level through hardware-supported barrier synchronization; however, the superstep length must accommodate an arbitrary h-relation, and global barrier synchronization requires special hardware support. Leslie G. Valiant later extended BSP into the Multi-BSP model, which assumes that the processing units in multiple BSP models can communicate point to point. This eases the porting and analysis of BSP programs, but its abstraction of processing cores and caches is limited to the processing units of the BSP model. L. Arge et al. proposed the PEM (Parallel External Memory) model to analyze the I/O complexity of several general-purpose basic parallel algorithms, but it does not account for the cache hierarchy. Moreover, single-machine multi-core parallel computing models do not scale well as the amount of computation grows, whereas distributed computing is better suited to ultra-large-scale scientific computing.
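To make the superstep and barrier vocabulary concrete, here is a minimal sketch of a two-superstep BSP program. It is a toy simulation on the threads of one machine, not Valiant's formal model or any BSP library; the function name `bsp_sum` and the choice of summation as the workload are illustrative assumptions. Python's `threading.Barrier` stands in for the global barrier synchronization that BSP requires.

```python
import threading


def bsp_sum(values, num_procs):
    """Sum `values` with a toy two-superstep BSP program on `num_procs` threads.

    Superstep 1: each processor sums its own strided slice (local computation)
    and sends the partial sum to processor 0 (communication).
    Barrier: no processor continues until every superstep-1 message is sent.
    Superstep 2: processor 0 adds up the partial sums it received.
    """
    inbox = [[] for _ in range(num_procs)]   # one message queue per processor
    inbox_lock = threading.Lock()            # guards concurrent sends
    barrier = threading.Barrier(num_procs)   # global barrier synchronization
    result = {}

    def processor(pid):
        # Superstep 1: local work, then one message to processor 0.
        partial = sum(values[pid::num_procs])
        with inbox_lock:
            inbox[0].append(partial)
        # Processor 0 receives num_procs messages: an h-relation with h = num_procs.
        barrier.wait()                       # end of superstep 1
        # Superstep 2: only processor 0 has work to do.
        if pid == 0:
            result["total"] = sum(inbox[0])

    threads = [threading.Thread(target=processor, args=(p,)) for p in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


print(bsp_sum(list(range(1, 101)), 4))  # 5050
```

In the BSP cost model a superstep costs roughly w + g·h + l (local work, plus h times the per-message gap g, plus the barrier latency l), so the larger h produced by many processors reporting to one processor directly lengthens the superstep.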

In distributed computing, Google proposed the MapReduce model, which abstracts a computing problem into two phases of operations, Map and Reduce. Because the model remained dependent on the underlying hardware and software environment, many researchers have built optimizations and innovations on top of it. For example, the Hadoop parallel computing framework, created by Douglas Cutting et al. on the open-source search engine system Nutch, relies on HDFS and decomposes a task into mapping and merging: mapping simplifies the input to produce partial intermediate results, and results that share the same key are then merged. Others have designed further parallel computing frameworks; a representative example is Spark, implemented in Scala, which avoids reading and writing HDFS between processing stages and performs better on certain workloads. In addition, the Lamport algorithm proposed by Leslie Lamport and the Paxos algorithm [8] have also contributed to distributed computing.
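The Map and Reduce abstraction described above can be illustrated with the classic word-count example. The following is a minimal, single-process sketch under stated assumptions: the names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical rather than Hadoop's API, there is no HDFS or cluster, and the shuffle that a real framework performs across machines is simulated with an in-memory dictionary.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit an intermediate (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all intermediate values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge all values that share a key into one result."""
    return key, sum(values)


def word_count(documents):
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["the cat sat", "The dog sat"]))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

A real framework would run many `map_phase` calls in parallel on different input splits and partition the shuffle by key across several reducers; the sequential version only shows the data flow.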

Summary of the Invention

To address the current lack of good distributed computing models, the present invention designs NETRMR, a model method for Map-Reduce distributed computing for the distributed computing architecture of the .NET platform. NETRMR encapsulates module support for system fault tolerance and load balancing, which reduces the complexity of parallel programming in a distributed computing environment.

The specific technical scheme of the model method for Map-Reduce distributed computing based on the .NET platform is as follows:

(a) Input data through the Map-Reduce interface;

(b) Submit the data to ManagerConsole;

(c) SolutionArray dispatches the data to RegLoader; RegLoader starts Workers to perform the computation, and the Reducer is responsible for collecting the results;

(d) The RegLoaders that obtain data actively register to form a MachineList.
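Steps (a) to (d) describe the flow between the NETRMR components without giving their interfaces. The sketch below is one possible in-process reading of that flow, not the patented implementation: the component names come from the text, but every method, signature, and the round-robin dispatch policy are assumptions, and the .NET Remoting calls that would connect separate machines are replaced by ordinary method calls. Here a RegLoader registers itself in the MachineList when it receives a non-empty chunk, which is one way to interpret step (d).

```python
from collections import Counter


class MachineList:
    """Hypothetical registry of RegLoaders that have obtained data (step d)."""

    def __init__(self):
        self.names = []

    def register(self, loader_name):
        if loader_name not in self.names:   # register each loader once
            self.names.append(loader_name)


class RegLoader:
    """Hypothetical node agent: holds one data chunk and runs a Worker on it."""

    def __init__(self, name, machine_list):
        self.name = name
        self.machine_list = machine_list
        self.chunk = []

    def receive(self, chunk):
        self.chunk = chunk
        if chunk:                           # step (d): a loader that obtains data registers
            self.machine_list.register(self.name)

    def run_worker(self, map_fn):
        # A single in-process "Worker" applies the user's map function to the chunk.
        return map_fn(self.chunk)


class ManagerConsole:
    """Hypothetical entry point that accepts a job (steps a and b)."""

    def __init__(self, loader_names):
        self.machine_list = MachineList()
        self.loaders = [RegLoader(n, self.machine_list) for n in loader_names]

    def _dispatch(self, data):
        # Plays the role of SolutionArray: round-robin split across RegLoaders (step c).
        n = len(self.loaders)
        for i, loader in enumerate(self.loaders):
            loader.receive(data[i::n])

    def submit(self, data, map_fn, reduce_fn):
        self._dispatch(data)
        partials = [ld.run_worker(map_fn) for ld in self.loaders if ld.chunk]
        # Reducer role: merge the Workers' partial results into one answer.
        return reduce_fn(partials), list(self.machine_list.names)


def count_words(lines):
    return Counter(word for line in lines for word in line.split())


def merge_counts(partials):
    return sum(partials, Counter())


console = ManagerConsole(["node-1", "node-2", "node-3"])
result, machines = console.submit(["a b", "b c"], count_words, merge_counts)
print(dict(result))  # {'a': 1, 'b': 2, 'c': 1}
print(machines)      # ['node-1', 'node-2']
```

With only two input items, `node-3` receives no data and therefore never appears in the MachineList.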

In the above scheme, step (c) comprises the following steps:

(c-1) SolutionArray dispatches the data to the individual RegLoaders, and each RegLoader allocates and processes its data;

(c-2) Each RegLoader starts several Workers to perform the computation;

(c-3) The Reducer sorts, summarizes, and collects the results computed by the Workers.
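Sub-steps (c-1) to (c-3) add a second level of fan-out: each RegLoader runs several Workers. Below is a minimal sketch of that two-level split, assuming word counting as the job and threads from `concurrent.futures.ThreadPoolExecutor` as a stand-in for Workers. The function names are hypothetical, and in NETRMR the RegLoaders would sit on different machines rather than run one after another in a loop.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def solution_array_dispatch(data, num_loaders):
    """(c-1) Split the input into one share per RegLoader, round-robin."""
    return [data[i::num_loaders] for i in range(num_loaders)]


def worker(lines):
    """One Worker: count the words in its slice of the input."""
    return Counter(word for line in lines for word in line.split())


def reg_loader(share, workers_per_loader):
    """(c-1), (c-2) Re-split a RegLoader's share and run several Workers on it."""
    slices = [share[i::workers_per_loader] for i in range(workers_per_loader)]
    with ThreadPoolExecutor(max_workers=workers_per_loader) as pool:
        return list(pool.map(worker, slices))


def reducer(partials):
    """(c-3) Collect and merge the partial results of all Workers."""
    total = Counter()
    for part in partials:
        total.update(part)   # update() adds counts rather than replacing them
    return total


def run_job(data, num_loaders=2, workers_per_loader=3):
    partials = []
    for share in solution_array_dispatch(data, num_loaders):
        partials.extend(reg_loader(share, workers_per_loader))
    return reducer(partials)


docs = ["to be or not to be", "that is the question", "to be is to do"]
print(run_job(docs).most_common(2))  # [('to', 4), ('be', 3)]
```

With the defaults, the input is split across 2 RegLoaders and each share across 3 Workers, so the Reducer merges 6 partial `Counter` objects. Because of CPython's global interpreter lock, the threads illustrate the structure rather than a CPU speedup; a process pool or separate machines would be needed for that.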

Compared with the prior art, the present invention has the following advantages and technical effects:

The invention encapsulates module support for system fault tolerance and load balancing, which reduces the complexity of parallel programming in a distributed computing environment and gives the model wide applicability.
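The patent states that NETRMR encapsulates fault tolerance and load balancing but does not say how. As a hedged illustration of what such a module might do, the sketch below uses a simple, commonly used policy that is not taken from the source: send each task to the least-loaded live machine, and when a machine fails, drop it from the live set and retry the task elsewhere. `run_with_balancing`, `MachineDown`, and the machine callables are all hypothetical.

```python
class MachineDown(Exception):
    """Raised by a (simulated) machine that cannot run a task."""


def run_with_balancing(tasks, machines, max_attempts=3):
    """Run each task on the least-loaded live machine, retrying on failure.

    `machines` maps a machine name to a callable that executes one task.
    Returns (results by task index, number of completed tasks per machine).
    """
    load = {name: 0 for name in machines}   # completed tasks per machine
    alive = set(machines)                   # machines not yet marked as failed
    results = {}
    for task_id, task in enumerate(tasks):
        for _ in range(max_attempts):
            if not alive:
                raise RuntimeError("no live machines left")
            # Load balancing: least-loaded live machine, ties broken by name.
            name = min(sorted(alive), key=lambda m: load[m])
            try:
                results[task_id] = machines[name](task)
                load[name] += 1
                break
            except MachineDown:
                alive.discard(name)         # fault tolerance: drop node, retry task
        else:
            raise RuntimeError(f"task {task_id} failed after {max_attempts} attempts")
    return results, load


def healthy(task):
    return task * task


def broken(task):
    raise MachineDown("node offline")


results, load = run_with_balancing(
    [1, 2, 3, 4], {"node-a": healthy, "node-b": broken, "node-c": healthy}
)
print(results)  # {0: 1, 1: 4, 2: 9, 3: 16}
print(load)     # {'node-a': 2, 'node-b': 0, 'node-c': 2}
```

In this run, `node-b` fails on its first task, is removed from the live set, and that task is retried on `node-c`; the remaining tasks alternate between the two healthy nodes. A production scheduler would also need timeouts, heartbeats, and a way for recovered machines to rejoin.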

Description of the Drawings

Fig. 1 is a flow chart of the model method for Map-Reduce distributed computing based on the .NET platform according to an embodiment.

Detailed Description

The embodiments of the present invention are further described below with reference to the accompanying drawings, but implementations of the present invention are not limited thereto.

As shown in Fig. 1, the model method for Map-Reduce distributed computing based on the .NET platform mainly comprises the following steps:

(a) Input data through the Map-Reduce interface;

(b) Submit the data to ManagerConsole;

(c) SolutionArray dispatches the data to RegLoader; RegLoader starts Workers to perform the computation, and the Reducer is responsible for collecting the results. The specific steps are:

(c-1) SolutionArray dispatches the data to the individual RegLoaders, and each RegLoader allocates and processes its data;

(c-2) Each RegLoader starts several Workers to perform the computation;

(c-3) The Reducer sorts, summarizes, and collects the results computed by the Workers;

(d) The RegLoaders that obtain data actively register to form a MachineList.

Claims

1. A model method for Map-Reduce distributed computing based on the .NET platform, characterized in that it comprises the following steps:

(a) Input data through the Map-Reduce interface;

(b) Submit data to ManagerConsole;

(c) SolutionArray dispatches the data to RegLoader; RegLoader starts Workers to perform the computation, and the Reducer is responsible for collecting the results;

(d) The RegLoader that obtains the data actively registers to form a MachineList.

2. The model method for Map-Reduce distributed computing based on the .NET platform according to claim 1, wherein step (c) comprises the following steps:

(c-1) SolutionArray dispatches the data to the individual RegLoaders, and each RegLoader allocates and processes its data;

(c-2) Each RegLoader starts several Workers to perform the computation;

(c-3) The Reducer sorts, summarizes, and collects the results computed by the Workers.