style_guide
Ordinary cookbooks describe a single system, consisting of one or more components. For example, the `ganglia` cookbook has a component named `agent` that gathers system metrics, and a component named `master` that aggregates those metrics for monitoring.
More generally, a component is an isolatable piece of functionality that is interesting to an outside consumer -- if you would draw it in a box on your system diagram, it's a component. Installed services and executable applications (LibreOffice, emacs) are typical examples.
We'll use these examples repeatedly below:
- Ganglia is a distributed system monitoring tool. The `agent` components gather and exchange system metrics, and the `master` component aggregates them. A basic setup would run the `master` component on a single machine, and the `agent` component on many machines (including the master). In order to work, the master must discover all agents, and each agent must discover the master.
- Elasticsearch is a powerful distributed document database. A basic setup runs a single `server` component on each machine. Elasticsearch handles discovery, but needs a stable subset of them to declare as discovery `seed`s.
- Nginx is a fast, lightweight webserver (similar to apache). Its `server` component can proxy web requests for one or many web apps. Those apps register a `site` component, which defines the receiving address (public/private/local) and how the app connects to nginx (socket, port, files).
- Pig is a Big Data analysis tool that works with Hadoop, Elasticsearch and more. It provides an executable, and imports jars from hadoop, elasticsearch and others.
Notice the recurring patterns:
- provide capabilities: accept web connections, deliver metrics, respond to queries, run scripts.
- discovery: the `ganglia_master` polls its `ganglia_agent`s, `elasticsearch_server`s discovering `elasticsearch_seed`s, pig discovering the elasticsearch jars.
You have your whole system diagram, with a bunch of machines that you’re going to use to actually realize that diagram.
In order to get something done, you have a whole bunch of cooperating moving parts that need to connect: databases, programs to query them, web apps that make the results visible, and load balancers to route requests.
What’s important here are the components, the capabilities they provide, and the connections between them. This is distinct from:
- the number of underlying machines -- you might run the whole thing on your laptop while developing, across a handful of nodes in staging, and on dozens of nodes in production.
- in many cases the details of what actually implements that component -- your web app doesn’t care whether haproxy or elb does its load balancing.
- the role specific configuration of each component -- in some cases we deploy a single machine Elasticsearch to sit next to a web app, in others, the same cookbook deploys a distributed terabyte scale database.
- whether the components share the same machine, different machines, or even live in remote data centers.
(Announce mechanic)
You might deploy that system diagram on one machine, or ten, or hundreds, but the components are the same, you’re just scaling them out on the
The announcements that components make don’t just facilitate discovery. In a larger sense they describe the external contract for the component in a way that’s
- testable
- auto-wireable
If I’m Nginx, I announce a port so that other systems can find and use it. I have to do that to make the system work. By announcing a port, some decoupled system I’ve never met before can set up a remote monitor for that port. If that port happened to be an admin dashboard for the component, a dashboard of dashboards could link to it. These outcomes are inevitable: the act of declaring your contract causes important shadow layers of the system diagram to connect support services.
There’s another set of generic patterns that components share.
Often, components announce that they
- log: write data to a log file
- executable: to run an application
- export: libraries, `jar`s, `conf` files, etc
The interesting part is that
The ability for
- my log rotation system to start rotating my logs.
- a 'disk free space' gauge to be added to the monitoring dashboard for that service?
- flume (or whatever) began picking up my logs and archiving them to a predictable location?
- in the case of standard apache logs, a listener to start counting the rate of requests, 200s, 404s and so forth?
Similarly, announcing ports should mean
- the firewall and security groups configure themselves correspondingly
- the monitor system starts regularly pinging the port for uptime and latency
- and pings the interfaces that it should not appear on to ensure the firewall is in place?
DO use those terms uniformly. For the ganglia master,
- its attributes are scoped as `node[:ganglia][:master]`
- it is described by the `ganglia::master` recipe
- the role `ganglia_master` pulls in all recipes (from `ganglia` or elsewhere) to make a machine a functioning ganglia master
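
The uniform naming above can be sketched in plain Ruby. `ComponentNames` is a hypothetical helper introduced only for illustration (it is not part of Chef or ironfan); it shows how one cookbook/component pair yields the attribute scope, the recipe name and the role name.

```ruby
# Hypothetical helper (an assumption for illustration, not a Chef or ironfan API):
# derives the three uniform names for a component from its cookbook and
# component names.
ComponentNames = Struct.new(:cookbook, :component) do
  # Attribute scope, written the way it appears in a recipe.
  def attribute_scope
    "node[:#{cookbook}][:#{component}]"
  end

  # Recipe that describes the component.
  def recipe
    "#{cookbook}::#{component}"
  end

  # Role that pulls in everything needed for a functioning component.
  def role
    "#{cookbook}_#{component}"
  end
end

names = ComponentNames.new('ganglia', 'master')
puts names.attribute_scope
puts names.recipe
puts names.role
```

If the three names for a component cannot be derived this mechanically, that is usually a hint that the terms are not being used uniformly.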
You should crisply separate cookbook-wide concerns from component concerns.
You should also separate system configuration from multi-system integration. Cookbooks should provide hooks that are neighborly but not exhibitionist, and otherwise mind their own business. The `hadoop_cluster` cookbook describes hadoop, the `pig` cookbook pig, and the `zookeeper` cookbook zookeeper. The job of tying those components together (copying zookeeper jars into the pig home dir, or the port+addr of hadoop daemons) should be isolated.
- Naming:
  - `foo/recipes/default.rb` -- information shared by anyone using foo, including support packages, directories
  - `foo/recipes/client.rb` -- configure me as a foo client
  - `foo/recipes/server.rb` -- configure me as a foo server
  - `foo/recipes/ec2_conf` -- cloud-specific settings
- Always include a `default.rb` recipe, even if it is blank.
- DO NOT install daemons via the default recipe, even if that's currently the only thing the cookbook does. Remember, a node that is a client -- or refers to any current or future component of the system -- will include the default recipe.
- Do not repeat the cookbook name in a recipe title: `hbase::master`, not `hbase::hbase_master`; `zookeeper::server`, not `zookeeper::zookeeper_server`.
- Use only `[a-z0-9_]` for cookbook and component names. Do not use capital letters or dashes. Keep names short and descriptive (preferably 15 characters or less).
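
The naming rules above are mechanical enough to check in code. The following is a minimal sketch in plain Ruby; `name_problems`, `NAME_PATTERN` and `MAX_NAME_LENGTH` are hypothetical names made up for this example, and the 15-character limit is treated as a hard check here even though the guideline only says "preferably".

```ruby
# Hypothetical lint helper (an assumption for illustration, not part of Chef).
NAME_PATTERN    = /\A[a-z0-9_]+\z/
MAX_NAME_LENGTH = 15

# Returns a list of human-readable problems; an empty list means the
# cookbook/component pair follows the naming rules.
def name_problems(cookbook, component)
  problems = []
  [cookbook, component].each do |name|
    problems << "#{name}: use only [a-z0-9_]" unless name.match?(NAME_PATTERN)
    problems << "#{name}: longer than #{MAX_NAME_LENGTH} characters" if name.length > MAX_NAME_LENGTH
  end
  # hbase::hbase_master repeats the cookbook name; hbase::master does not.
  if component.start_with?("#{cookbook}_")
    problems << "#{cookbook}::#{component}: repeats the cookbook name"
  end
  problems
end

p name_problems('hbase', 'master')
p name_problems('hbase', 'hbase_master')
p name_problems('Zoo-Keeper', 'server')
```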
- Dependencies should be announced in metadata.rb, of course.
- DO remember to explicitly `include_recipe` for system resources -- `runit`, `java`, `announces`, `thrift` and `apt`.
- DO NOT use `include_recipe` unless putting it in the role would be utterly un-interesting. You want the run to break unless it's explicitly included in the role.
  - yes: `java`, `ruby`, `announces`, etc.
  - no: `zookeeper::client`, `nfs::server`, or anything that will start a daemon

  Remember: ordinary cookbooks describe systems, roles and integration cookbooks coordinate them.
- `include_recipe` statements should only appear in recipes that are entry points. Recipes that are not meant to be called directly should assume their dependencies have been met.
- If a recipe is meant to be the primary entrypoint, it should include default, and it should do so explicitly: `include_recipe 'foo::default'` (not just 'foo').
- DO NOT use node[:foo] in your templates except in rare circumstances. Instead, say `variables :foo => node[:foo]`; this lets folks use that cookbook from elsewhere. (FIXME: Downgrade to a "we recommend" status. Look for RFC language here.)
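
To see why passing variables keeps a template reusable, here is a small sketch using Ruby's standard `ERB` library. `TemplateContext` is a hypothetical stand-in for the context Chef builds from `variables` (Chef exposes each variable to the template as an instance variable); the attribute values are made up.

```ruby
require 'erb'

# Stand-in for Chef's node attributes; the values are invented for the example.
node = { foo: { port: 5000, addr: '10.0.0.5' } }

# The template refers only to @foo, never to node[:foo].
TEMPLATE = "listen <%= @foo[:addr] %>:<%= @foo[:port] %>\n"

# Hypothetical, minimal imitation of a template context: every entry in
# `variables` becomes an instance variable visible to the ERB source.
class TemplateContext
  def initialize(variables)
    variables.each { |name, value| instance_variable_set("@#{name}", value) }
  end

  def render(source)
    ERB.new(source).result(binding)
  end
end

# The owning cookbook passes its own attributes ...
rendered = TemplateContext.new(foo: node[:foo]).render(TEMPLATE)
puts rendered

# ... while another cookbook can reuse the same template with different data.
other = TemplateContext.new(foo: { port: 8080, addr: '127.0.0.1' }).render(TEMPLATE)
puts other
```

Because the template never reaches into `node` itself, the caller decides where the values come from.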
- Scope concerns by cookbook or cookbook and component. `node[:hadoop]` holds cookbook-wide concerns, `node[:hadoop][:namenode]` holds component-specific concerns.
- Attributes shared by all components sit at cookbook level, and are always named for the cookbook: `node[:hadoop][:log_dir]` (since it is shared by all its components).
- Component-specific attributes sit at component level (`node[:cookbook_name][:component_name]`): eg `node[:hadoop][:namenode][:service_state]`. Do not use a prefix (NO: `node[:hadoop][:namenode_handler_count]`)
- The main attribute file should be named `attributes/default.rb`.
- If there are a sizeable number of tunable attributes (hadoop, cassandra), place them in `attributes/tuneables.rb`.
- Use generic names for simple, singular things, descriptive names for anything more complex:
  - If a component has only one log file, call it 'log_file': `node[:foo][:server][:log_file]` and in general do not use a prefix.
  - If a component has more than one log_file, always use a prefix: `node[:foo][:server][:dashboard_log_file]` and `node[:foo][:server][:gc_log_file]`.
- If you don't have exactly the semantics and datatype of the convention, don't use the convention. That is, don't use `:port` and give it a comma-separated string, or `:addr` and give it an email address.
A file is the full directory and basename for a file. A dir is a directory whose contents correspond to a single concern. A prefix is not intended to be used directly -- it will be decorated with suffixes to form dirs and files. A basename is only the leaf part of a file reference. Don't use the terms 'path' or 'filename'.
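
The four terms map directly onto Ruby's standard `File` helpers. The paths below are illustrative values, not defaults from any cookbook.

```ruby
# Illustrative values only.
log_file = '/var/log/foo/server.log'  # file: full directory plus basename
log_dir  = File.dirname(log_file)     # dir: the directory holding one concern
basename = File.basename(log_file)    # basename: only the leaf part

prefix  = '/usr/local'                # prefix: never used directly ...
bin_dir = File.join(prefix, 'bin')    # ... but decorated to form dirs and files

puts log_dir
puts basename
puts bin_dir
```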
Ignore the temptation to make a one-true-home-for-my-system, or to fight the package maintainer's choices. (FIXME: Rewrite to encourage OS-correct naming schemas.)
- a sandbox holding dir, pid, log, ...
- prefix: A container with directories bin, lib, share, src, to use according to convention
  - default: `/usr/local`.
- home_dir: Logical location for the cookbook's system code.
  - default: typically, leave it up to the package maintainer. Otherwise, `:prefix/share/:cookbook` should be a symlink to the `install_dir` (see below).
  - instead of: `xx_home` / `dir` alone / `install_dir`
- install_dir: The cookbook's system code, in case the home dir is a pointer to potential alternates.
  - default: `:prefix/share/:cookbook-:version` (if you don't need the directory after the cookbook runs, use `:prefix/src/:cookbook-:version` instead, eg `/usr/local/src/tokyo_tyrant-xx.xx`)
  - Make `home_dir` a symlink to this directory (eg home_dir `/usr/local/share/elasticsearch` links to install_dir `/usr/local/share/elasticsearch-0.17.8`).
- src_dir: holds the compressed tarball, its expanded contents, and the compiled files when installing from source. Use this when you will run `make install` or equivalent and use the files elsewhere.
  - default: `:prefix/src/:system_name-:version`, eg `/usr/local/src/pig-0.9.tar.gz`
  - do not: expand the tarball to `:prefix/src/(whatever)` if it will actually be used from there; instead, use the `install_dir` convention described above. (As a guideline, I should be able to blow away `/usr/local/src` and everything still works).
- deploy_dir: deployed code that follows the capistrano convention. See more about deploy variables below.
  - the `:deploy_dir/shared` directory holds common files
  - releases are checked out to `:deploy_dir/releases/{sha}`
  - the operational release is a symlink to the right release: `:deploy_dir/current -> :deploy_dir/releases/xxx`.
  - do not: use this when you mean `home_dir`.
- scratch_roots, persistent_roots: an array of directories spread across volumes, with expectations on persistence
  - `scratch_root`s have no guarantee of persistence -- for example, stop/start'ing a machine on EC2 destroys the contents of its local (ephemeral) drives. `persistent_root`s have the best available promise of persistence: if permanent (eg EBS) volumes are available, they will exclusively populate the `persistent_root`s; but if not, the ephemeral drives are used instead.
  - these attributes are provided by the `mountable_volume` meta-cookbook and its appropriate integration recipe. Ordinary cookbooks should always trust the integration cookbook's choices (or visit the integration cookbook to correct them).
  - each element in `persistent_roots` is by contract on a separate volume, and similarly each of the `scratch_roots` is on a separate volume. A volume may be in both scratch and persistent (for example, there may be only one volume!).
  - the singular forms scratch_root and persistent_root are provided for your convenience and always correspond to `scratch_roots.first` and `persistent_roots.first`. This means the first named volume gets picked on the heaviest -- if you don't like that, choose explicitly (but not randomly, or you won't be idempotent).
- log_file, log_dir, xx_log_file, xx_log_dir:
  - default:
    - if the log files will always be trivial in size, put them in `/var/log/:cookbook.log` or `/var/log/:cookbook/(whatever)`.
    - if it's a runit-managed service, leave them in `/etc/sv/:cookbook-:component/log/main/current`, and make a symlink from `/var/log/:cookbook-component` to `/etc/sv/:cookbook-:component/log/main/`.
    - If the log files are non-trivial in size, set log dir to `/:scratch_root/:cookbook/log/`, and symlink `/var/log/:cookbook/` to it.
    - If the log files should be persisted, place them in `/:persistent_root/:cookbook/log`, and symlink `/var/log/:cookbook/` to it.
    - in all cases, the directory is named `.../log`, not `.../logs`. Never put things in `/tmp`.
    - Use the physical location for the `log_dir` attribute, not the /var/log symlink.
- tmp_dir:
  - default: `/:scratch_root/:cookbook/tmp/`
  - Do not put a symlink or directory in `/tmp` -- something else blows it away, the app recreates it as a physical directory, `/tmp` overflows, pagers go off, sadness spreads throughout the land.
- conf_dir:
  - default: `/etc/:cookbook`
- bin_dir:
  - default: `/:home_dir/bin`
- pid_file, pid_dir:
  - default: pid_file: `/var/run/:cookbook.pid` or `/var/run/:cookbook/:component.pid`; pid_dir: `/var/run/:cookbook/`
  - instead of: `job_dir`, `job_file`, `pidfile`, `run_dir`.
- cache_dir:
  - default: `/var/cache/:cookbook`.
- data_dir:
  - default: `:persistent_root/:cookbook/:component/data`
  - instead of: `datadir`, `dbfile`, `dbdir`
- journal_dir: high-speed local storage for commitlogs and so forth. Can be deleted, though you may rather it wasn't.
  - default: `:scratch_root/:cookbook/:component/scratch`
  - instead of: `commitlog_dir`
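
The default locations listed above can be collected into one lookup. This is a sketch in plain Ruby, not code from any cookbook: `default_dirs` is a hypothetical helper, and the `/mnt` and `/data` roots are placeholder assumptions standing in for whatever the integration cookbook provides as `scratch_root` and `persistent_root`. For `log_dir` it uses the "non-trivial in size" case.

```ruby
# Hypothetical helper (an assumption for illustration): builds the conventional
# default locations for one component. The scratch and persistent roots are
# placeholders; in practice the integration cookbook supplies them.
def default_dirs(cookbook, component, version,
                 prefix: '/usr/local', scratch_root: '/mnt', persistent_root: '/data')
  home_dir = File.join(prefix, 'share', cookbook)
  {
    install_dir: File.join(prefix, 'share', "#{cookbook}-#{version}"),
    home_dir:    home_dir,                       # symlink to install_dir
    bin_dir:     File.join(home_dir, 'bin'),
    conf_dir:    File.join('/etc', cookbook),
    pid_dir:     File.join('/var/run', cookbook),
    pid_file:    File.join('/var/run', cookbook, "#{component}.pid"),
    cache_dir:   File.join('/var/cache', cookbook),
    log_dir:     File.join(scratch_root, cookbook, 'log'),   # physical location
    tmp_dir:     File.join(scratch_root, cookbook, 'tmp'),
    journal_dir: File.join(scratch_root, cookbook, component, 'scratch'),
    data_dir:    File.join(persistent_root, cookbook, component, 'data')
  }
end

dirs = default_dirs('elasticsearch', 'server', '0.17.8')
dirs.each { |name, location| puts "#{name}: #{location}" }
```

Keeping the layout in one place makes it easy to see that nothing lands in `/tmp` and that every log directory is named `log`, not `logs`.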
- daemon_name: daemon's actual service name, if it differs from the component. For example, the `hadoop-namenode` component's daemon is `hadoop-0.20-namenode` as installed by apt.
- daemon_states: an array of the verbs acceptable to the Chef `service` resource: `:enable`, `:start`, etc.
- num_xx_processes, num_xx_threads: the number of separate top-level processes (distinct PIDs) or internal threads to run
  - instead of `num_workers`, `num_servers`, `worker_processes`, `foo_threads`.
- log_level
  - application-specific; often takes values info, debug, warn
  - instead of `verbose`, `verbosity`, `loglevel`
- user, group, uid, gid -- `user` is the user name. The `user` and `group` should be strings, while the `uid` and `gid` should be integers.
  - instead of username, group_name, using uid for user name or vice versa.
  - if there are multiple users, use a prefix: `launcher_user` and `observer_user`.
- release_url: URL for the release.
  - instead of: install_url, package_url, being careless about partial vs whole URLs
- release_file: Where to put the release.
  - default: `:prefix/src/system_name-version.ext`, eg `/usr/local/src/elasticsearch-0.17.8.tar.bz2`.
  - do not use `/tmp` -- let me decide when to blow it away (and make it easy to be idempotent).
  - do not use a non-versioned URL or file name.
- release_file_sha or release_file_md5: fingerprint
  - instead of: `whatever_checksum`, `whatever_fingerprint`
- version: if it's a simply-versioned resource that uses the `major.minor.patch-cruft` convention. Do not use unless this is true, and do not use the source control revision ID.
- plugins: array of system-specific plugins
Use `deploy_{}` for anything that would be true whatever SCM you're using; use `git_{}` (and so forth) where specific to that repo.
- deploy_env: production / staging / etc
- deploy_strategy
- deploy_user: user to run as
- deploy_dir: Only use `deploy_dir` if you are following the capistrano convention: see above.
- git_repo: url for the repo, eg `git@github.com:infochimps-labs/ironfan.git` or `http://github.com/infochimps-labs/ironfan.git`
  - instead of: `deploy_repo`, `git_url`
- git_revision: SHA or branch
  - instead of: `deploy_revision`
- apt/{repo_name}: Options for adding a cookbook's apt repo.
  - Note that this is filed under apt, not the cookbook.
  - Use the best name for the repo, which is not necessarily the cookbook's name: eg `apt/cloudera/{...}`, which is shared by hadoop, flume, pig, and so on.
  - `apt/{repo_name}/url` -- eg `http://archive.cloudera.com/debian`
  - `apt/{repo_name}/key` -- GPG key
  - `apt/{repo_name}/force_distro` -- forces the distro (eg, you are on natty but the apt repo only has maverick)
- xx_port:
  - do not use 'port' on its own.
  - examples: `thrift_port`, `webui_port`, `zookeeper_port`, `carbon_port` and `whisper_port`.
  - xx_port: `default[:foo][:server][:port] = 5000`
  - xx_ports, if an array: `default[:foo][:server][:ports] = [5000, 5001, 5002]`
- addr, xx_addr
  - if all ports bind to the same interface, use `addr`. Otherwise, do not use `addr`, and use a unique `foo_addr` for each `foo_port`.
  - instead of: `hostname`, `binding`, `address`
- Want some way to announce my port is http or https.
- Need to distinguish client ports from service ports. You should be using cluster service discovery anyway though.
- jmx_port
- XX_heap_max, xx_heap_min, java_heap_eden
- java_home
- AVOID batch declaration of options (e.g. java_opts) if possible: assemble it in your recipe from intelligible attribute names.
- Always put file modes in quote marks: `mode "0664"` not `mode 0664`.
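
The source does not state the reason for quoting file modes. One plausible motivation, offered here as an assumption, is that an unquoted number silently changes meaning when the leading zero is dropped. The plain-Ruby sketch below shows how the interpreter reads each form.

```ruby
quoted  = '0664'.to_i(8)  # the quoted string, interpreted as an octal mode
literal = 0664            # Ruby reads a leading zero as an octal literal
typo    = 664             # no leading zero: this is decimal 664

puts quoted == literal          # both forms denote the same permission bits
puts format('%o', typo)         # decimal 664 written in octal: not 664
puts typo == literal
```

A quoted string such as `"0664"` leaves no room for that ambiguity: what you read is the mode you get.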
If your app does any of the following,
- services -- Any interesting long-running process.
- ports -- Any reserved open application port
  - http: HTTP application port
  - https: HTTPS application port
  - internal: port is on private IP, should not be visible through public IP
  - external: port is available through public IP
- metric_ports:
  - jmx_ports -- JMX diagnostic port (announced by many Java apps)
- dashboards -- Web interface to look inside a system; typically internal-facing only, and probably not performance-monitored by default.
- logs -- um, logs. You can also announce the logs' flavor: `:apache`, `log4j`, etc.
- scheduleds -- regularly-occurring events that leave a trace
- exports -- jars or libs that other programs may wish to incorporate
- consumes -- placed there by any call to `discover`.
- Describe physical configuration:
  - machine size, number of instances per facet, etc
  - external assets (elastic IP, ebs volumes)
- Describe high-level assembly of systems via roles: `hadoop_namenode`, `nfs_client`, `ganglia_agent`, etc.
- Describe important modifications, such as `ironfan::system_internals`, mounts ebs volumes, etc
- Describe override attributes:
  - `heap size`, rvm versions, etc.
- roles and recipes
  - remove `cluster_role` and `facet_role` if empty
  - are not in `run_list`, but populated by the `role` and `recipe` directives
- remove big_package unless it's a dev machine (sandbox, etc)
Roles define the high-level assembly of recipes into systems
- Override attributes go into the cluster. Currently, those files are typically empty and are badly cluttering the roles/ directory. The cluster and facet override attributes should be together, not scattered in different files. Roles shouldn't assemble systems. The contents of the infochimps_chef/roles/plato_truth.rb file belong in a facet.
- Deprecated:
  - Cluster and facet roles (`roles/gibbon_cluster.rb`, `roles/gibbon_namenode.rb`, etc) go away
  - Roles should be service-oriented: `hadoop_master` considered harmful, you should explicitly enumerate the services