Map Reduce Tutorial Gives a Brief Overview of the Application

MapReduce is an integral part of the Hadoop programming environment. It is a software framework for easily writing applications that process big data in parallel on large clusters of commodity hardware, in a reliable and fault-tolerant manner. A MapReduce tutorial gives a brief overview of the framework, the tasks it performs, and the processes behind them.

A MapReduce job normally splits the input data set into chunks that are independent of each other. The map tasks process these chunks in parallel. The framework then sorts the outputs of the maps and feeds them as input to the reduce tasks.
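The split, map, sort, and reduce steps described above can be sketched in plain Python. This is only an in-memory illustration using word counting as the example job; the function names (`split_input`, `map_task`, `reduce_task`, `run_word_count`) and the chunk size are assumptions made for this sketch and are not part of Hadoop.

```python
from itertools import groupby
from operator import itemgetter


def split_input(lines, chunk_size):
    """Split the input into independent chunks (a stand-in for input splits)."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_task(chunk):
    """Map: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def reduce_task(key, values):
    """Reduce: sum all counts that share the same key."""
    return key, sum(values)


def run_word_count(lines, chunk_size=2):
    chunks = split_input(lines, chunk_size)
    # Each chunk is mapped independently; a real cluster would run these in parallel.
    mapped = [pair for chunk in chunks for pair in map_task(chunk)]
    # Sort the map output by key so that equal keys end up next to each other.
    mapped.sort(key=itemgetter(0))
    # Hand each key, with all of its values, to a reduce call.
    return dict(
        reduce_task(key, (count for _, count in group))
        for key, group in groupby(mapped, key=itemgetter(0))
    )
```

Calling `run_word_count(["a b", "b c", "a a"])` runs two map chunks and returns one count per distinct word. On a real cluster the chunks would live on different machines, but the order of the steps is the same.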

Both the input and the output of a job are typically stored in a file system. The framework takes care of scheduling the tasks, monitoring them, and re-executing any that fail.
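Re-execution of failed tasks can be pictured as a simple retry loop. The sketch below is a toy model, not Hadoop's scheduler: `run_with_retries`, `make_flaky_task`, and the limit of four attempts are all assumptions chosen for illustration.

```python
def run_with_retries(task, max_attempts=4):
    """Run task(); on failure, re-execute it until max_attempts is reached."""
    errors = []
    for attempt in range(1, max_attempts + 1):
        try:
            return task(), attempt, errors
        except Exception as exc:
            # Record the failure as diagnostic information and try again.
            errors.append(f"attempt {attempt}: {exc}")
    raise RuntimeError(f"task failed after {max_attempts} attempts: {errors}")


def make_flaky_task(failures_before_success, result):
    """Build a hypothetical task that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise IOError("simulated node failure")
        return result

    return task
```

A task built with `make_flaky_task(2, "done")` fails twice and succeeds on the third attempt, and the caller still receives the two recorded errors for later inspection.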

A MapReduce tutorial also teaches the learner about the nodes in the cluster. Nodes can be compute nodes or storage nodes, and typically they are the same machines: the MapReduce framework and the Hadoop Distributed File System (HDFS) run on the same set of nodes. This lets the framework schedule tasks on the nodes where the data already resides, which results in high aggregate bandwidth across the cluster.

The MapReduce framework consists of a single master ResourceManager, one slave NodeManager per cluster node, and one MRAppMaster per application. At a minimum, an application specifies the input and output locations and supplies the map and reduce functions by implementing the appropriate interfaces or abstract classes.

These, together with other job parameters, make up the job configuration in MapReduce. Once the configuration is complete, the Hadoop job client submits the job and its configuration to the ResourceManager. The ResourceManager then distributes the software and the configuration to the slaves, schedules and monitors the tasks, and reports status and diagnostic information back to the job client.
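To make the idea of a job configuration concrete, here is a minimal sketch that bundles the input and output locations with the map and reduce functions, runs the job locally, and returns status information. `JobConfig`, `submit_job`, the field names, and the counter names are hypothetical and only loosely modelled on the description above; they are not Hadoop classes or real counters.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class JobConfig:
    """Hypothetical stand-in for a job configuration; not a Hadoop class."""
    job_name: str
    input_path: str
    output_path: str
    mapper: Callable[[str], Iterable[Tuple[str, int]]]
    reducer: Callable[[str, List[int]], int]
    extra: Dict[str, str] = field(default_factory=dict)  # other job parameters


def submit_job(config, records):
    """Run the configured job in memory and return (output, status)."""
    grouped = {}
    map_output_records = 0
    for record in records:
        for key, value in config.mapper(record):
            grouped.setdefault(key, []).append(value)
            map_output_records += 1
    output = {key: config.reducer(key, values)
              for key, values in sorted(grouped.items())}
    # Status and diagnostic information handed back to the caller.
    status = {
        "job": config.job_name,
        "state": "SUCCEEDED",
        "map_input_records": len(records),
        "map_output_records": map_output_records,
        "reduce_output_records": len(output),
    }
    return output, status
```

In this sketch the paths are carried along but never read; the records are passed in directly so that the example stays self-contained.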

An interesting fact about MapReduce applications is that they need not be written in Java, even though Hadoop itself is implemented in Java. For instance, Hadoop Streaming is a utility for creating and running jobs with any executables as the mapper and reducer, and Hadoop Pipes is a C++ API for implementing MapReduce applications.
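As an illustration of a non-Java job, the following sketch shows a word-count mapper and reducer in the style commonly used with Hadoop Streaming, where each stage reads lines and writes tab-separated key and value lines. The function names are assumptions for this sketch, and the functions take any iterable of lines rather than reading standard input directly.

```python
def mapper(lines):
    """Streaming-style mapper: emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Streaming-style reducer: expects input sorted by key, so equal keys are adjacent."""
    current, total = None, 0
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current:
            if current is not None:
                # The key changed, so the previous key's total is complete.
                yield f"{current}\t{total}"
            current, total = key, 0
        total += int(value)
    if current is not None:
        yield f"{current}\t{total}"
```

Locally, `sorted(mapper(lines))` stands in for the sort the framework performs between the two stages. To use the functions as Streaming executables, each would need a thin wrapper script that passes `sys.stdin` in and prints the results; such scripts are then named with the `-mapper` and `-reducer` options of the streaming jar, whose location depends on the installation.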

The MapReduce framework operates exclusively on key-value pairs. It views the input to a job as a set of key-value pairs and produces another set of key-value pairs as the output. A MapReduce tutorial usually devotes some length to explaining the concept of the key-value pair, which is vital to how the framework functions.
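The way key-value pairs change shape through a job can be sketched with a small example that finds the highest temperature per year. The input format ("year,temperature" lines), the function names, and the use of the line index as the input key are assumptions made for this illustration.

```python
from typing import Iterator, List, Tuple


def map_fn(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Input pair (line index, line text) -> intermediate pair (year, temperature)."""
    year, temperature = line.split(",")
    yield year, int(temperature)


def reduce_fn(year: str, temperatures: List[int]) -> Tuple[str, int]:
    """Intermediate pairs grouped by year -> output pair (year, max temperature)."""
    return year, max(temperatures)


def run(lines: List[str]) -> List[Tuple[str, int]]:
    grouped = {}
    for offset, line in enumerate(lines):
        for key, value in map_fn(offset, line):
            # Collect every value emitted for the same key.
            grouped.setdefault(key, []).append(value)
    return [reduce_fn(key, grouped[key]) for key in sorted(grouped)]
```

Note that the input key (the line index) is ignored by the map function here; only the value carries the data. The intermediate and output pairs are both keyed by year, and the reduce step collapses many values per key into one.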

The MapReduce framework has two central interfaces: Mapper and Reducer. Applications implement them to provide the map and reduce methods. Other core interfaces include Job, Partitioner, InputFormat, and OutputFormat.
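Of the interfaces listed above, the Partitioner is perhaps the easiest to illustrate in isolation: it decides which reduce task receives each intermediate key. The sketch below uses a hash of the key modulo the number of reducers, which is the general principle behind hash partitioning; the function names and the choice of CRC32 as the hash are assumptions for this example and do not reproduce Hadoop's own hashing.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer index for a key using a stable hash modulo the reducer count."""
    # zlib.crc32 is used because Python's built-in hash() of strings varies between runs.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_pairs(pairs, num_reducers):
    """Distribute (key, value) pairs into one bucket per reducer."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets
```

Because the partition depends only on the key, every pair with the same key lands in the same bucket, which is what allows a single reduce task to see all values for that key.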

There are also other important elements in the MapReduce framework, such as DistributedCache and IsolationRunner. However, it is difficult to explain all of them in detail in a short tutorial. Interested learners may visit the Apache Hadoop website to learn more about them.

Article by Vaishnavi Agrawal