
CN107329824A - A Model Method for Map-Reduce Distributed Computing Based on the .NET Platform - Google Patents


Info

Publication number
CN107329824A
CN107329824A CN201710423527.9A CN201710423527A CN107329824A CN 107329824 A CN107329824 A CN 107329824A CN 201710423527 A CN201710423527 A CN 201710423527A CN 107329824 A CN107329824 A CN 107329824A
Authority
CN
China
Prior art keywords
regloader
data
distributed computing
map
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710423527.9A
Other languages
Chinese (zh)
Inventor
杨忠明
徐红波
熊君丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Institute of Science and Technology
Original Assignee
Guangdong Institute of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-06-07
Filing date
2017-06-07
Publication date
2017-11-07
Application filed by Guangdong Institute of Science and Technology
Priority to CN201710423527.9A
Publication of CN107329824A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005: Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Multi Processors (AREA)

Abstract

The NETRMR prototype system provided by the present invention builds on the way the Map-Reduce model reduces the complexity of parallel programming in distributed computing. Using the distributed computing technology of the .NET Remoting framework, it provides a method, NETRMR, for distributed computing architectures on the .NET platform. Specifically, the method comprises the following steps: data is input through the Map-Reduce interface; the data is submitted to the ManagerConsole; the SolutionArray dispatches the data to the RegLoaders, each RegLoader starts Workers to perform the computation, and the Reducer collects the results; the RegLoaders that obtained data actively register to form the MachineList. The invention encapsulates module support for system fault tolerance and load balancing and reduces the complexity of parallel programming in a distributed computing environment, which gives the model wide applicability.

Description

A Model Method for Map-Reduce Distributed Computing Based on the .NET Platform

Technical Field

The present invention relates generally to the field of computer-based distributed computing, and in particular to a model method for Map-Reduce distributed computing based on the .NET platform.

Background

Over the past half century, the explosive growth of information has pushed scientific computing into the era of multi-core parallelism. Among parallel computing models, Valiant proposed the BSP model, which divides a computation into a sequence of supersteps and controls coarse-grained execution through hardware-implemented barrier synchronization; however, the superstep length must accommodate an arbitrary h-relation, and global barrier synchronization requires special hardware support. Valiant later proposed the Multi-BSP model on the basis of BSP. It assumes that the processing units of multiple BSP models can communicate point to point, which helps with porting and analyzing BSP models, but its abstraction of processing cores and caches is limited to the processing units of the BSP model. L. Arge proposed the PEM model and used it to analyze the I/O complexity of several general-purpose basic parallel algorithms, but did not consider the cache hierarchy. Single-machine multi-core parallel models, however, do not scale well as the amount of computation grows; distributed computing is better suited to very large-scale scientific computing.

In distributed computing, Google proposed the MapReduce model, which abstracts a computing problem into two phases, Map and Reduce. Because the model remains dependent on the system's software and hardware environment, many researchers have optimized and extended it. One example is the Hadoop parallel computing framework, which Doug Cutting and others created from the open-source search engine Nutch. Relying on HDFS, it decomposes a task into mapping and merging: the mapping phase simplifies the input and produces partial results, and results of the same kind are then merged. Others have designed further parallel computing frameworks; a representative one is Spark, implemented in Scala, which no longer needs to read and write HDFS and performs better on certain workloads. In addition, the Lamport algorithm proposed by Leslie Lamport and the Paxos algorithm [8] have also contributed to distributed computing.

Summary of the Invention

Given the current shortage of good distributed computing models, the present invention provides a model method for Map-Reduce distributed computing based on the .NET platform, called NETRMR, for distributed computing architectures on the .NET platform. It encapsulates module support for system fault tolerance and load balancing, which reduces the complexity of parallel programming in a distributed computing environment.

The specific technical scheme of the method is as follows:

(a) Input data through the Map-Reduce interface;

(b) Submit the data to the ManagerConsole;

(c) The SolutionArray dispatches the data to the RegLoaders, each RegLoader starts Workers to perform the computation, and the Reducer collects the results;

(d) The RegLoaders that obtained data actively register to form the MachineList.
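
Steps (a) to (d) can be sketched in a single process as follows. This is only an illustrative Python sketch, not the patented .NET Remoting implementation: the class and method names (`ManagerConsole.submit`, `RegLoader.run_workers`), the round-robin split, and the word-count job are all assumptions introduced here, and each "node" is an ordinary in-memory object.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RegLoader:
    """Hypothetical stand-in for one RegLoader node."""
    name: str

    def run_workers(self, chunk, map_fn):
        # Each item in the chunk is handled by one "Worker" call.
        return [map_fn(item) for item in chunk]


@dataclass
class ManagerConsole:
    """Hypothetical stand-in for the ManagerConsole."""
    loaders: list
    machine_list: list = field(default_factory=list)

    def submit(self, data, map_fn, reduce_fn):
        # (c) SolutionArray: one chunk per RegLoader (round-robin, an assumption).
        n = len(self.loaders)
        solution_array = [data[i::n] for i in range(n)]
        partials = []
        for loader, chunk in zip(self.loaders, solution_array):
            if not chunk:
                continue
            partials.extend(loader.run_workers(chunk, map_fn))
            # (d) A RegLoader that obtained data is added to the MachineList.
            self.machine_list.append(loader.name)
        # Reducer: combine all Worker results into one value.
        return reduce_fn(partials)


def count_words(line):
    # Map function supplied through the Map-Reduce interface, step (a).
    return Counter(line.split())


def merge_counts(partials):
    # Reduce function supplied through the Map-Reduce interface, step (a).
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


console = ManagerConsole([RegLoader("node-1"), RegLoader("node-2"), RegLoader("node-3")])
lines = ["map reduce", "reduce on dotnet"]
# (b) Submit the data and the two functions to the ManagerConsole.
result = console.submit(lines, count_words, merge_counts)
```

With two input lines and three loaders, only the first two loaders receive data, so only they appear in `console.machine_list`.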

In the above scheme, step (c) comprises the following steps:

(c-1) The SolutionArray dispatches the data to each RegLoader, and each RegLoader allocates the data it receives;

(c-2) Each RegLoader starts several Workers to perform the computation;

(c-3) The Reducer sorts, summarizes, and collects the results produced by the Workers.
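
The three sub-steps of step (c) can be illustrated with a thread pool standing in for the Workers. This is a hedged sketch under stated assumptions: the function names, the even contiguous split, the default of two Workers per RegLoader, and the sum-of-squares job are all invented for illustration, and threads replace the remote .NET processes the patent describes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def worker(share):
    # One Worker computes a partial result for its share of the data.
    return sum(x * x for x in share)


def reg_loader(chunk, worker_count=2):
    # (c-1) The RegLoader allocates its chunk among its Workers.
    shares = split_evenly(chunk, worker_count)
    # (c-2) The RegLoader starts several Workers (threads in this sketch).
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        return list(pool.map(worker, shares))


def reducer(partials_per_loader):
    # (c-3) The Reducer collects and combines every Worker result.
    return sum(sum(partials) for partials in partials_per_loader)


def solution_array_dispatch(data, loader_count=3):
    # The SolutionArray hands one chunk to each RegLoader.
    chunks = split_evenly(data, loader_count)
    return reducer([reg_loader(chunk) for chunk in chunks])


total = solution_array_dispatch(list(range(10)))
```

Because addition is associative, the result does not depend on how the data is split between RegLoaders and Workers; a reduce function without that property would need the partial results in a defined order.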

Compared with the prior art, the present invention has the following advantages and technical effects:

The invention encapsulates module support for system fault tolerance and load balancing and reduces the complexity of parallel programming in a distributed computing environment, which gives the model wide applicability.
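
The text does not describe how fault tolerance and load balancing are implemented. One common way to combine them, shown here purely as an assumption and not as the patent's mechanism, is to send each task to the least-loaded node and retry on a different node when a call fails. The function `dispatch_with_retry`, the `load` dictionary, and the node names are hypothetical.

```python
def dispatch_with_retry(task, nodes, load, max_attempts=3):
    """Run task on the least-loaded node; on failure, retry on another node."""
    failed = set()
    for _ in range(max_attempts):
        candidates = [name for name in nodes if name not in failed]
        if not candidates:
            break
        # Load balancing: pick the node with the fewest tasks in flight.
        name = min(candidates, key=lambda n: load[n])
        load[name] += 1
        try:
            return name, nodes[name](task)
        except Exception:
            # Fault tolerance: remember the failed node and try elsewhere.
            failed.add(name)
        finally:
            # The task is no longer in flight on this node.
            load[name] -= 1
    raise RuntimeError("task failed on every attempted node")


def healthy(task):
    return task * 2


def broken(task):
    raise ConnectionError("node unreachable")


# Each node is modeled as a plain callable; a real node would be a remote call.
nodes = {"node-a": broken, "node-b": healthy}
load = {"node-a": 0, "node-b": 1}
used_node, value = dispatch_with_retry(21, nodes, load)
```

Hiding this loop behind the dispatch call is what lets user code supply only the map and reduce functions, which is the kind of encapsulation the paragraph above claims.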

Brief Description of the Drawings

Fig. 1 is a flow chart of a model method for Map-Reduce distributed computing based on the .NET platform according to an embodiment.

Detailed Description

Embodiments of the present invention are further described below with reference to the accompanying drawing, but the implementation of the invention is not limited to them.

A model method for Map-Reduce distributed computing based on the .NET platform, as shown in Fig. 1, mainly comprises the following steps:

(a) Input data through the Map-Reduce interface;

(b) Submit the data to the ManagerConsole;

(c) The SolutionArray dispatches the data to the RegLoaders, each RegLoader starts Workers to perform the computation, and the Reducer collects the results. The specific steps are:

(c-1) The SolutionArray dispatches the data to each RegLoader, and each RegLoader allocates the data it receives;

(c-2) Each RegLoader starts several Workers to perform the computation;

(c-3) The Reducer sorts, summarizes, and collects the results produced by the Workers;

(d) The RegLoaders that obtained data actively register to form the MachineList.
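
Step (d) says that registration is initiated by the RegLoader itself. A minimal Python sketch of that idea follows; the `MachineList` class, its `register` and `snapshot` methods, and the addresses are assumptions made for illustration, and an in-memory dictionary stands in for whatever remote registry the .NET implementation uses.

```python
import threading


class MachineList:
    """Hypothetical registry that RegLoaders add themselves to."""

    def __init__(self):
        self._lock = threading.Lock()
        self._machines = {}

    def register(self, name, address):
        # Called by the RegLoader; registering twice just overwrites the entry.
        with self._lock:
            self._machines[name] = address

    def snapshot(self):
        # Return a copy so callers cannot mutate the registry.
        with self._lock:
            return dict(self._machines)


class RegLoader:
    def __init__(self, name, address, machine_list):
        self.name = name
        self.address = address
        self.machine_list = machine_list

    def receive(self, data):
        # Step (d): a RegLoader that obtained data registers itself.
        if data:
            self.machine_list.register(self.name, self.address)
        return len(data)


machines = MachineList()
loaders = [RegLoader(f"loader-{i}", f"tcp://10.0.0.{i}:8080", machines) for i in (1, 2, 3)]
payloads = [[1, 2], [], [3]]
received = [loader.receive(p) for loader, p in zip(loaders, payloads)]
```

Here the second loader receives no data and therefore never appears in the list, so the list reflects only machines that actually hold work.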

Claims (2)

1. A model method for Map-Reduce distributed computing based on the .NET platform, characterized by comprising the following steps:
(a) inputting data through the Map-Reduce interface;
(b) submitting the data to the ManagerConsole;
(c) the SolutionArray dispatching the data to the RegLoaders, each RegLoader starting Workers to perform the computation, and the Reducer collecting the results;
(d) the RegLoaders that obtained data actively registering to form the MachineList.

2. The model method for Map-Reduce distributed computing based on the .NET platform according to claim 1, characterized in that step (c) comprises the following steps:
(c-1) the SolutionArray dispatching the data to each RegLoader, and each RegLoader allocating the data it receives;
(c-2) each RegLoader starting several Workers to perform the computation;
(c-3) the Reducer sorting, summarizing, and collecting the results produced by the Workers.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201710423527.9A (published as CN107329824A) | 2017-06-07 | 2017-06-07 | A Model Method for Map-Reduce Distributed Computing Based on the .NET Platform

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201710423527.9A (published as CN107329824A) | 2017-06-07 | 2017-06-07 | A Model Method for Map-Reduce Distributed Computing Based on the .NET Platform

Publications (1)

Publication Number | Publication Date
CN107329824A (en) | 2017-11-07

Family

ID=60194546

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201710423527.9A | A Model Method for Map-Reduce Distributed Computing Based on the .NET Platform (Pending, CN107329824A (en)) | 2017-06-07 | 2017-06-07

Country Status (1)

Country | Link
CN (1) | CN107329824A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US9471390B2 * | 2013-01-16 | 2016-10-18 | International Business Machines Corporation | Scheduling mapreduce jobs in a cluster of dynamically available servers
CN106778079A * | 2016-11-22 | 2017-05-31 | Chongqing University of Posts and Telecommunications (重庆邮电大学) | A MapReduce-based method for DNA sequence k-mer frequency statistics
CN106648891A * | 2016-12-09 | 2017-05-10 | China United Network Communications Group Co., Ltd. (中国联合网络通信集团有限公司) | MapReduce model-based task execution method and apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
程国建: "《迁移到云端在云计算的新世界开发应用》", 30 June 2015, 国防工业出版社 *

Legal Events

Code | Title | Description
PB01 | Publication | Publication
SE01 | Entry into force of request for substantive examination | Entry into force of request for substantive examination
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20171107