Tuesday, April 19, 2011

Install Yahoo Oozie 3.0.0 on Apache Hadoop 0.20.2

Oozie is a workflow-based service for managing data processing on Hadoop and on other projects in the Hadoop ecosystem such as Pig, Hive, and Sqoop.

Oozie workflows are actions arranged in a control dependency DAG (Directed Acyclic Graph). An Oozie workflow may contain the following types of action nodes: map-reduce, map-reduce streaming, map-reduce pipes, pig, file-system, sub-workflows, etc.

Typical Oozie Design:



This post is about installing Yahoo Oozie 3.0.0 on Apache Hadoop 0.20.2. A little configuration trick is required to get Oozie working with Apache Hadoop.
Cloudera and Yahoo each maintain their own version of Oozie, so it is pretty natural that those versions work with their own Hadoop distributions without any pain.

Before diving into the Oozie installation, here are a few prerequisites:

Warm up:
  • Apache Hadoop 0.20.2 up and running
  • Download and untar the Oozie 3.0.0 distro
  • Download the ExtJS-2.2 library to enable the Oozie web console
  • Create a user and a group named ‘oozie’ on your system
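
A quick way to see which items of the warm-up list are still missing is a small pre-flight script. This is only a sketch: the helper names (have_cmd, have_user, have_group, report) are made up for this post, and it assumes the hadoop launcher is on your PATH and that groups are listed in the local /etc/group file.

```shell
#!/bin/sh
# Pre-flight check for the warm-up list.
# ASSUMPTIONS: `hadoop` is on PATH when Hadoop is installed, and groups are
# defined in the local /etc/group file (not in LDAP/NIS).

# Succeeds if the named command is on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# Succeeds if the named user account exists.
have_user() { id -u "$1" >/dev/null 2>&1; }

# Succeeds if the named group is listed in /etc/group.
have_group() { grep -q "^$1:" /etc/group 2>/dev/null; }

# Print one "OK:" or "MISSING:" line for a prerequisite.
# $1 = label, remaining arguments = the check to run.
report() {
    label=$1
    shift
    if "$@"; then
        echo "OK: $label"
    else
        echo "MISSING: $label"
    fi
}

report "hadoop command on PATH" have_cmd hadoop
report "user 'oozie'" have_user oozie
report "group 'oozie'" have_group oozie
```

If the user or group shows up as MISSING, on most Linux systems they can be created with groupadd and useradd (run as root).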


Workflow for setting up Oozie:

Follow this diagram: execute the commands and put the snippets in the corresponding XML files, and you are done.




Is everything Ok?


That’s it, the installation is over. To check that everything is configured correctly,
open your browser and go to http://localhost:11000 (by default, the Tomcat bundled with Oozie 3.0.0 listens on port 11000).
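
If you prefer the terminal to the browser, the Oozie command-line client can ask the server for its status. A minimal sketch, assuming the oozie client is on your PATH and the server URL is the same http://localhost:11000/oozie used in the examples command in this post; the function name oozie_status is just for illustration.

```shell
#!/bin/sh
# ASSUMPTION: the server answers at the URL used in the examples command.
OOZIE_URL=${OOZIE_URL:-http://localhost:11000/oozie}

# Ask the server for its status through the Oozie command-line client.
oozie_status() {
    oozie admin -oozie "$OOZIE_URL" -status
}

# Only call the client if it is actually installed on this machine.
if command -v oozie >/dev/null 2>&1; then
    oozie_status || echo "Oozie did not answer at $OOZIE_URL" >&2
else
    echo "oozie client not found on PATH; skipping status check" >&2
fi
```

An error here usually means the server did not start, so the Tomcat and Oozie logs are the next place to look.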

Run the examples bundled with Oozie and keep your fingers crossed to get a long job id in response.

shell>hadoop fs -put examples examples
shell>oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run
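
Once the job id comes back, you can feed it straight into a status query. The sketch below wraps the same submit command and then asks for the job details. It assumes the client prints the id on a line of the form "job: <id>"; the function names extract_job_id and run_example are hypothetical.

```shell
#!/bin/sh
# ASSUMPTION: on success the client prints a line of the form "job: <id>".
OOZIE_URL=${OOZIE_URL:-http://localhost:11000/oozie}

# Read client output on stdin and print only the job id.
extract_job_id() {
    sed -n 's/^job:[[:space:]]*//p'
}

# Submit the bundled map-reduce example and show the new job's details.
run_example() {
    job_id=$(oozie job -oozie "$OOZIE_URL" \
        -config examples/apps/map-reduce/job.properties -run | extract_job_id)
    if [ -z "$job_id" ]; then
        echo "no job id returned; check the Oozie logs" >&2
        return 1
    fi
    echo "submitted $job_id"
    oozie job -oozie "$OOZIE_URL" -info "$job_id"
}

# Only run the example if the client is installed on this machine.
if command -v oozie >/dev/null 2>&1; then
    run_example
fi
```

Run it from the directory that contains the examples folder, after the hadoop fs -put step.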

That's it guys.

8 comments:

  1. Everything is not okay. KerberosHadoopAccessorService calls UserGroupInformation.setConfiguration, a method that is not available until Hadoop v0.20.104.1. Hence the error below.


    SEVERE: Exception sending context initialized event to listener instance of class org.apache.oozie.servlet.ServicesLoader
    java.lang.NoSuchMethodError: org.apache.hadoop.security.UserGroupInformation.setConfiguration(Lorg/apache/hadoop/conf/Configuration;)V
    at org.apache.oozie.service.KerberosHadoopAccessorService.init(KerberosHadoopAccessorService.java:92)
    at org.apache.oozie.service.HadoopAccessorService.init(HadoopAccessorService.java:72)
    at org.apache.oozie.service.Services.setServiceInternal(Services.java:307)
    ...

  2. It can be fixed by replacing the KerberosHadoopAccessorService in oozie-site.xml

    https://github.com/yahoo/oozie/issues/710
