FusionInsight HD 6.5.0 Product Description 02

MapReduce

Basic Concept

Overview

MapReduce is a programming model that simplifies parallel computing. It is built on two key operations: Map and Reduce. Map divides one job into multiple tasks that can be processed independently, and Reduce summarizes the processing results of these tasks to produce the final analysis result.
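The Map and Reduce operations described above can be illustrated with a word count, the usual introductory example. The following is a minimal single-process sketch in plain Python, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the grouping step that a real framework performs between Map and Reduce is simulated with a dictionary.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key (simulates the step between Map and Reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: summarize all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the toy pipeline: map every line, group by word, reduce each group."""
    intermediate = []
    for line in lines:  # each line stands in for the input of one map task
        intermediate.extend(map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count(["a b a", "b c"])` counts each word across both lines. In a real cluster the map calls would run as separate tasks on different nodes; here they run in a loop purely for illustration.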

Architecture

In the YARN architecture, MapReduce integrates with YARN through YARN's Client and ApplicationMaster interfaces and uses YARN to apply for computing resources, as shown in Figure 2-61.

Figure 2-61 Apache YARN & MapReduce architecture

Relationship with Other Components

Relationship Between MapReduce and HDFS
  • The Hadoop Distributed File System (HDFS) is highly fault-tolerant and can be deployed on inexpensive hardware. It provides high-throughput data access for applications with very large data sets.
  • MapReduce is a programming model for parallel computation over large data sets (larger than 1 TB). The input data can come from multiple sources, such as the local file system, HDFS, and databases, but most of it comes from HDFS, whose high throughput makes it suitable for reading massive amounts of data. After computation, the results can be stored back in HDFS.
Relationship Between MapReduce and YARN

MapReduce is a batch processing computing framework that runs on YARN.

MRv1 is the MapReduce implementation in Hadoop 1.0. It consists of a programming model (new and old programming interfaces), a running environment (JobTracker and TaskTracker), and a data processing engine (MapTask and ReduceTask). This framework is weak in scalability, fault tolerance (the JobTracker is a single point of failure), and support for multiple computing frameworks (only the MapReduce computing framework is supported).

MRv2 is the MapReduce implementation in Hadoop 2.0. Its source code reuses the MRv1 programming model and data processing engine, while the running environment consists of ResourceManager and ApplicationMaster. ResourceManager is a new resource management system. ApplicationMaster is responsible for splitting MapReduce job data, assigning tasks, applying for resources, scheduling tasks, and handling task failures.
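The idea of splitting job data into per-task pieces can be sketched as follows. This is a toy model under stated assumptions, not Hadoop's actual split logic: the function name `plan_map_tasks` and the fixed split size are hypothetical, and real input splitting also takes file formats, block locations, and record boundaries into account.

```python
def plan_map_tasks(total_bytes, split_bytes):
    """Return (offset, length) ranges, one per hypothetical map task.

    Toy model only: the input is cut into fixed-size pieces, and the
    last piece holds whatever remains.
    """
    if split_bytes <= 0:
        raise ValueError("split_bytes must be positive")
    splits = []
    offset = 0
    while offset < total_bytes:
        length = min(split_bytes, total_bytes - offset)
        splits.append((offset, length))
        offset += length
    return splits
```

For example, an input of 300 bytes with a 128-byte split size yields three ranges, so three map tasks would be planned. Each range can then be assigned to a task and rescheduled independently if that task fails.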

Updated: 2019-05-17

Document ID: EDOC1100074548
