The original GitHub repository seems to be abandoned, probably because a lot has happened since: Hortonworks no longer exists; the big data fad has gradually died down; k8s is on its way to replacing Docker; and so on. Nonetheless, it is still important to learn these things, so I picked the project up and will try to publish working Docker images to Docker Hub.
Some of the changes to revive this project will be:
- update base images to centos-7 at least,
- update Ambari/HDP to the last 2.x version that still included the Hive View,
- allow both hostname and fqdn,
- fix postgres auth issues,
- fix some Java 8 TLS quirks,
- maybe remove the extra SSH setup?
- upload working images to docker hub.
Let me know if you have any other suggestions.
Built and tested with the latest version of Docker for Mac and CentOS. Older versions of Docker provided by docker-machine and/or Docker Toolbox will not work.
Trying this on Windows? Please let me know how it works out.
##Project Goals:
- Provide a reusable base with which to experiment with various Hadoop versions, its ecosystem, and its configs w/o VMs
- Provide a pseudo-distributed Hadoop environment, because single node setups make bad assumptions about how software works in multi-node clusters.
- Provide an excuse to learn & play with Docker
These containers are not pushed to DockerHub, thus you'll need to build them locally:
docker-compose -f examples/compose/single-container.yml build
A successful build looks like:
docker-hdp randy> docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hdp/node latest cacb20b1b0d3 15 seconds ago 7.682 GB
hdp/ambari-server latest b0fad41dd49c 15 minutes ago 2.492 GB
hdp/postgres latest ad42250d5c8b 23 minutes ago 320.2 MB
centos 6 cf2c3ece5e41 3 weeks ago 194.6 MB
postgres latest 7ee9d2061970 6 weeks ago 275.3 MB
##Running HDP 2.5:
To run 3 containers (postgres, ambari-server, and a "single container HDP cluster"):
docker-compose -f examples/compose/single-container.yml up
After a minute or so, you can access Ambari's Web UI at localhost:8080. Default User/PW is admin/admin.
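Since "a minute or so" varies by machine, a small polling loop can report when the UI is reachable. This is a minimal sketch and not part of the repository: the function name `wait_for_ambari`, the default attempt count, and the delay are assumptions, and it treats any HTTP response from the URL as "up".

```shell
# Poll a URL until an HTTP response arrives, or give up.
# Usage: wait_for_ambari [url] [attempts] [delay_seconds]
# (hypothetical helper; defaults are illustrative)
wait_for_ambari() {
  local url="${1:-http://localhost:8080}"
  local attempts="${2:-60}"
  local delay="${3:-5}"
  local i=1
  local code
  while [ "$i" -le "$attempts" ]; do
    # -s: quiet, -o /dev/null: discard the body, -w: print only the status code.
    # curl prints 000 when it could not connect at all.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    if [ -n "$code" ] && [ "$code" != "000" ]; then
      echo "Ambari answered with HTTP $code after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "No response from $url after $attempts attempts" >&2
  return 1
}
```

Calling `wait_for_ambari` with no arguments checks localhost:8080 every 5 seconds for up to 5 minutes.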
##Using Ambari Blueprints:
To snapshot your cluster's configuration into a blueprint:
# You can extract a blueprint as soon as you click Deploy. No need to wait for install to complete.
curl --user admin:admin -H 'X-Requested-By:admin' 'localhost:8080/api/v1/clusters/dev?format=blueprint' > examples/blueprints/single-container.json
Note: I give Docker 7 cores and 14GB of RAM. If you're running with less, you should generate your own Ambari Blueprints with the recommendations Ambari provides (it should auto-detect your environment's available resources).
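If the export request fails (wrong cluster name, Ambari not ready yet), curl still writes Ambari's error response into the target file. The sketch below is one way to catch that before reusing the file. It is not part of the repository: `check_blueprint` is a hypothetical helper, and it only checks for the `Blueprints` and `host_groups` keys that an exported blueprint is expected to contain. It does not validate the JSON.

```shell
# Sanity-check an exported blueprint file before reusing it.
# Usage: check_blueprint path/to/blueprint.json
# (hypothetical helper; a plain text search, not a JSON validator)
check_blueprint() {
  local file="$1"
  local key
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  # An exported blueprint is expected to carry both of these keys.
  for key in '"Blueprints"' '"host_groups"'; do
    if ! grep -q "$key" "$file"; then
      echo "no $key key found in $file" >&2
      return 1
    fi
  done
  echo "ok: $file"
}
```

For example, `check_blueprint examples/blueprints/single-container.json` prints `ok: ...` or explains what is missing.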
To submit your blueprint to Ambari and have it install your cluster:
# Can swap "single-container" for multi-container, or any type saved in examples/blueprints and examples/hostgroups
sh submit-blueprint.sh single-container examples/blueprints/single-container.json
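For orientation, deploying a blueprint through Ambari's REST API generally takes two POST requests: one to register the blueprint and one to create a cluster from it. The sketch below illustrates that flow. It is not a copy of submit-blueprint.sh, whose contents may differ: the function name, the `AMBARI_API` variable, and the argument order are assumptions, while the `dev` cluster name and the admin/admin credentials are taken from the commands above.

```shell
# Register a blueprint, then create a cluster from it.
# Usage: submit_blueprint_sketch <name> <blueprint.json> <hostgroups.json> [cluster]
# (hypothetical helper; AMBARI_API is an assumed override, not a repository setting)
submit_blueprint_sketch() {
  local name="$1"
  local blueprint_file="$2"
  local hostgroups_file="$3"
  local cluster="${4:-dev}"
  local api="${AMBARI_API:-http://localhost:8080/api/v1}"

  # Step 1: upload the blueprint under the given name.
  curl --user admin:admin -H 'X-Requested-By:admin' -X POST \
    -d "@$blueprint_file" "$api/blueprints/$name" || return 1

  # Step 2: create the cluster. The hostgroups file maps hosts to host groups
  # and must reference the blueprint name used in step 1.
  curl --user admin:admin -H 'X-Requested-By:admin' -X POST \
    -d "@$hostgroups_file" "$api/clusters/$cluster"
}
```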
There are additional blueprints for common test-beds in examples/blueprints, including Hive-LLAP and HBase-Phoenix.
##Notes:
- Ambari, Hive, and Ranger dbs have been pre-created in the postgres database running at postgres.dev. To configure them in Ambari, set Postgres as the DB type, change the Database URL to point at postgres.dev, and leave everything else at the default options. The password for every db is "dev".
- The "node" container can be used for master, worker, or both types of services. The ambari-agent is configured to register with ambari-server.dev automatically, thus no SSH key setup is necessary. Use dn0.dev (and master0.dev if using multi-container).
- Yum packages for all HDP services have been pre-installed in the "node" container. This lets cluster install take place much faster at the expense of a spurious warning from Ambari during Host-Checks.
- All Ambari and HDP repositories are downloaded at build time. The versions and URLs are specified in .env in the project's root.
- Docker for Linux is more restrictive about "su" use, which Ambari relies on heavily, thus examples/compose/single-container.yml and multi-container.yml images are marked "privileged:true". Read up on the implications.
##Helpful Hints:
If HDFS is having issues starting up or not leaving SafeMode, it's probably because docker-compose is re-using containers from a previous run.
To start with fresh containers, before each run do:
docker-compose -f examples/compose/multi-container.yml rm
Going to remove compose_ambari-server.dev_1, compose_dn0.dev_1, compose_master0.dev_1, compose_postgres.dev_1
Are you sure? [yN] y
Removing compose_ambari-server.dev_1 ... done
Removing compose_dn0.dev_1 ... done
Removing compose_master0.dev_1 ... done
Removing compose_postgres.dev_1 ... done
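To avoid the confirmation prompt on every run, the removal and the restart can be wrapped in one helper. This is a sketch under assumptions: `fresh_up` and the `COMPOSE_BIN` override are hypothetical names, and it relies on the `-s`, `-f`, and `-v` flags of `docker-compose rm`, so very old docker-compose releases may not accept it.

```shell
# Remove old containers without prompting, then start the cluster again.
# Usage: fresh_up [compose-file]
# (hypothetical helper; COMPOSE_BIN is an assumed override for dry runs)
fresh_up() {
  local compose_file="${1:-examples/compose/multi-container.yml}"
  local compose="${COMPOSE_BIN:-docker-compose}"
  # -s stops running containers first, -f skips the confirmation prompt,
  # -v also removes anonymous volumes attached to the containers.
  "$compose" -f "$compose_file" rm -s -f -v || return 1
  "$compose" -f "$compose_file" up
}
```

Running `COMPOSE_BIN=echo fresh_up` prints the two commands instead of executing them, which is a cheap way to check what would happen.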
Docker for Mac sometimes has storage space problems. I recommend adding the following to your ~/.bash_profile and restarting terminal:
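function docker-cleanup(){
  # remove untagged (dangling) images
  docker rmi $(docker images -q -f dangling=true)
  # remove unused (dangling) volumes only
  docker volume rm $(docker volume ls -q -f dangling=true)
  # `shotgun` remove compose-created networks; awk keeps only the network ID column
  docker network rm $(docker network ls | grep "_default" | awk '{ print $1 }')
  # remove exited containers with a non-zero exit code, I skip exit 0 as I have old scripts using data containers.
  # (current Docker prints the status as "Exited (N)", and [0-255] would only match a single character)
  docker rm -v $(docker ps -a | grep -E "Exited \([1-9][0-9]*\)" | awk '{ print $1 }')
}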
Run "docker-cleanup" if you run into Docker errors or "No space left on device" issues inside containers.
Since Hadoop UIs often link to hostnames, add the following to your hosts file:
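# /etc/hosts is owned by root, so append through sudo tee; a plain >> redirect runs as your own user.
echo "127.0.0.1 ambari-server ambari-server.dev" | sudo tee -a /etc/hosts
echo "127.0.0.1 master0 master0.dev" | sudo tee -a /etc/hosts
echo "127.0.0.1 dn0 dn0.dev" | sudo tee -a /etc/hosts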
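Running the commands above more than once leaves duplicate lines in the hosts file. A minimal sketch of an idempotent variant follows. `add_host_entry` is a hypothetical helper, and it only checks whether the first hostname already appears as a whole word anywhere in the file.

```shell
# Append "ip names..." to a hosts file only if the first name is not already there.
# Usage: add_host_entry <hosts-file> <ip> <name> [alias...]
# (hypothetical helper; needs write access to the file it edits)
add_host_entry() {
  local hosts_file="$1"
  local ip="$2"
  shift 2
  # Match the name as a whole whitespace-delimited word.
  if grep -Eq "(^|[[:space:]])$1([[:space:]]|\$)" "$hosts_file"; then
    echo "already present: $1"
    return 0
  fi
  echo "$ip $*" >> "$hosts_file"
  echo "added: $1"
}
```

For the real /etc/hosts the function has to run in a root shell, so it is worth trying it against a copy of the file first, e.g. `add_host_entry ./hosts.copy 127.0.0.1 dn0 dn0.dev`.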
TODO:
- Steps for using latest Docker 1.12 Swarm & Compose on multiple hosts