hadoop Cookbook

Sets up and configures Hadoop on Ubuntu images that already have Hadoop installed from the official .deb package.

Requirements

This cookbook is part of HadoopStack. To avoid spending time installing Hadoop on every instance, we use an image with Hadoop pre-installed. The cookbook currently supports Ubuntu with Hadoop pre-installed from the official .deb package:

http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/

Attributes

hadoop::default

Key	Type	Description	Default
`['hadoop']['mapred_user']`	String	User on behalf of whom job/tasktracker daemons will run	`mapred`
`['hadoop']['hdfs_user']`	String	User on behalf of whom name/datanodes daemons will run	`hdfs`
`['hadoop']['group']`	String	A common system group for hadoop daemons	`hadoop`
`['hadoop']['jobtracker']`	String	IP of jobtracker
`['hadoop']['namenode']`	String	IP of namenode
`['hdfs_replication']`	Integer	Replication Factor	`2`
`['hadoop']['dfs_dir']`	String	Parent directory of Namenode/Datanode dir	`/mnt/dfs`
`['hadoop']['namenode_dir']`	String	Namenode Directory	`/mnt/dfs/nn`
`['hadoop']['datanode_dir']`	String	Datanode Directory	`/mnt/dfs/dn`
`['hadoop']['mapred_local_dir']`	String	Mapred local directory	`/mnt/mapred/local`
`['hadoop']['mapred_system_dir']`	String	Mapred system directory	`/mnt/mapred/system`
`['hadoop']['log_dir']`	String	Log directory for Hadoop daemons	`/mnt/log/hadoop`
`['hadoop']['pid_dir']`	String	PID directory for Hadoop Daemons	`/var/run/hadoop`
`['hadoop']['role']`	String	Hadoop Role for the Instance
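
The defaults above can be overridden from a role. As a rough illustration of that layering, the plain-Ruby sketch below models a few of the defaults as a Hash and merges role-level overrides on top. The `deep_merge` helper, the variable names and the IP addresses are hypothetical and not part of the cookbook, and Chef's real attribute precedence is richer than this model.

```ruby
# Illustrative model only: a few defaults from the table above as a plain Hash.
defaults = {
  'hadoop' => {
    'mapred_user'  => 'mapred',
    'hdfs_user'    => 'hdfs',
    'group'        => 'hadoop',
    'dfs_dir'      => '/mnt/dfs',
    'namenode_dir' => '/mnt/dfs/nn',
    'datanode_dir' => '/mnt/dfs/dn'
  }
}

# Hypothetical helper: recursively merge overrides into base; overrides win.
def deep_merge(base, overrides)
  base.merge(overrides) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

# Example role-level values (assumed addresses and path).
role_overrides = {
  'hadoop' => {
    'namenode'   => '10.0.0.10',
    'jobtracker' => '10.0.0.11',
    'dfs_dir'    => '/data/dfs'
  }
}

effective = deep_merge(defaults, role_overrides)
puts effective['hadoop']['dfs_dir']       # value taken from the role
puts effective['hadoop']['namenode_dir']  # untouched default
```

In this simple model, overriding `dfs_dir` does not move `namenode_dir` or `datanode_dir`, so it is safest to set those explicitly when relocating the parent directory.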

Usage

Create roles for the appropriate services: jobtracker, tasktracker, namenode and datanode. Update the run_list and set at least two attributes, ['hadoop']['namenode'] and ['hadoop']['jobtracker'].

If you are using traditional HDFS:

name "jobtracker"
description "Role to initiate jobtracker"
run_list [
    "recipe[hadoop::default]"
]
# Replace the <...> placeholders with the real IP addresses.
default_attributes("hadoop" => {
    "jobtracker" => "<jobtracker_ip>",
    "namenode" => "<namenode_ip>",
    "role" => "jobtracker"
})

If you are using S3 as the storage backend:

name "tasktracker"
description "Role to initiate tasktracker"
run_list [
    "recipe[hadoop::default]"
]
# Replace the <...> placeholders with the real IP addresses and bucket name.
default_attributes("hadoop" => {
    "jobtracker" => "<jobtracker_ip>",
    "namenode" => "<namenode_ip>",
    "role" => "tasktracker",
    "dfs" => {
        "uri" => "s3://"
    },
    "s3" => {
        "bucket" => "<bucket_name>"
    }
})
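
Since a role must carry at least the namenode and jobtracker addresses, a quick check before uploading it can catch omissions. The following is a minimal sketch in plain Ruby; `missing_hadoop_attributes` is a hypothetical helper that is not part of the cookbook, and the rule that an `s3://` URI also needs a bucket is an assumption drawn from the S3 example above.

```ruby
# Hypothetical pre-flight check; not part of the cookbook.
# Returns the names of attributes that are absent or empty.
def missing_hadoop_attributes(attrs)
  hadoop = attrs.fetch('hadoop', {})

  # Keep only the required keys that are not non-empty strings.
  missing = %w[namenode jobtracker].reject do |key|
    hadoop[key].is_a?(String) && !hadoop[key].empty?
  end

  # Assumption: when the storage URI points at S3, a bucket name is needed too.
  uri = hadoop.dig('dfs', 'uri').to_s
  if uri.start_with?('s3://')
    bucket = hadoop.dig('s3', 'bucket').to_s
    missing << 's3.bucket' if bucket.empty?
  end

  missing
end

# Example attributes with the namenode and the bucket left out (assumed values).
role_attrs = {
  'hadoop' => {
    'jobtracker' => '10.0.0.11',
    'dfs'        => { 'uri' => 's3://' }
  }
}

p missing_hadoop_attributes(role_attrs)
```

An empty array means every attribute this sketch knows about is present.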

hadoop::default

The default recipe creates the configuration files

core-site.xml
mapred-site.xml
hdfs-site.xml

in the /etc/hadoop directory, using the ERB templates available in templates/.
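
To show how an ERB template turns attributes into a configuration file, here is a small stand-alone sketch using Ruby's standard `erb` library. The template body is hypothetical and not copied from the cookbook's templates/ directory; the IP address and the port 8020 are assumed values. `fs.default.name` is the Hadoop 1.x property that names the default filesystem.

```ruby
require 'erb'

# Illustrative attribute values; the IP and port are assumptions for this sketch.
hadoop = { 'namenode' => '10.0.0.10' }
namenode_port = 8020

# Hypothetical template body, written for illustration only.
template = <<~XML
  <configuration>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://<%= hadoop['namenode'] %>:<%= namenode_port %></value>
    </property>
  </configuration>
XML

# Render the template against the local variables defined above.
rendered = ERB.new(template).result(binding)
puts rendered
```

Inside a Chef run, the `template` resource does this rendering and writes the result to disk; the sketch only prints it.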

hadoop::prepare

This recipe is included in the default recipe and creates the Hadoop directories with the appropriate permissions.
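
The directory preparation can be pictured with the plain-Ruby sketch below, which creates a few of the default directories under a temporary root and sets their mode. `prepare_dirs` is a hypothetical helper and the mode 0755 is an assumption. The sketch leaves out ownership changes for the hdfs and mapred users, because those need root privileges.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: create each directory under root and set its mode.
def prepare_dirs(root, dirs, mode = 0o755)
  dirs.map do |dir|
    path = File.join(root, dir)
    FileUtils.mkdir_p(path)   # create parents as needed
    File.chmod(mode, path)    # set the mode explicitly, independent of umask
    path
  end
end

# A subset of the default directories from the attributes table.
dirs = %w[mnt/dfs/nn mnt/dfs/dn mnt/mapred/local mnt/log/hadoop]

# Work in a temporary root so the real filesystem is untouched.
results = Dir.mktmpdir('hadoop-prepare') do |root|
  prepare_dirs(root, dirs).map do |path|
    [path.sub(root, ''), File.stat(path).mode & 0o777]
  end
end

results.each { |path, mode| puts format('%o %s', mode, path) }
```

In a recipe, the same effect would normally come from Chef's `directory` resource with `owner`, `group`, `mode` and `recursive` set.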

hadoop::jobtracker

This recipe enables and starts the jobtracker service.

hadoop::tasktracker

This recipe enables and starts the tasktracker service.

Contributing

Fork the repository on GitHub
Create a named feature branch (like add_component_x)
Write your change
Test it thoroughly
Submit a Pull Request using GitHub

License and Authors

Authors: Shashank Sahni
