Chef Habitat Internals

This section will dive into the implementation details of how various Chef Habitat components work. These topics are for advanced users. It is not necessary to learn these concepts in order to use Chef Habitat.

Supervisor Internals

The Chef Habitat Supervisor is similar in some ways to well-known process supervisors like systemd, runit or smf. It accepts and passes POSIX signals to its child processes, restarts child processes if and when they fail, ensures that child processes terminate cleanly, and so on.

Because the basic functionality of process supervision is well-known, this document does not discuss those details. Instead, this document focuses strictly on the internals of the feature that makes the Chef Habitat Supervisor special: the fact that each Supervisor is connected to others in a peer-to-peer, masterless network which we refer to as a ring. This allows Supervisors to share configuration data with one another and adapt to changing conditions in the ring by modifying their own configuration.

Architecture

Supervisors are configured to form a network by using the --peer argument and pointing them at peers that already exist. In a real-life deployment scenario, Supervisors in a network may also have a shared encryption key, so that inter-Supervisor traffic is encrypted. (See the security documentation for more details.)
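
For example, a small ring might be formed like this. This is only an illustrative sketch: the IP address is a placeholder, and the exact hab sup run flags may vary by version.

$ # On the first host, start a Supervisor with no peers
$ hab sup run
$ # On each additional host, point the Supervisor at one or more existing peers
$ hab sup run --peer 10.0.0.1:9638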

Supervisor rings can be very large, comprising thousands of Supervisors. The Supervisor communication protocol is low-bandwidth and designed not to interfere with your application's actual production traffic.

Rings are divided into service groups, each of which has a name. All Supervisors within a service group share the same configuration and topology.
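
As a concrete illustration, loading the same service with the same --group name on several Supervisors places them in one service group (the package and group names below are examples only):

$ # Each Supervisor that runs this joins the "redis.production" service group
$ hab svc load core/redis --group production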

Butterfly

Chef Habitat uses a gossip protocol named "Butterfly". It is a variant of SWIM for membership and failure detection (over UDP), combined with a ZeroMQ-based variant of Newscast for gossip. This protocol provides failure detection, service discovery, and leader election to the Chef Habitat Supervisor.

Butterfly is an eventually consistent system - it guarantees, with a very high degree of probability, that a given piece of information will be received by every member of the network. It makes no guarantee as to when that information will arrive; in practice, the answer is usually "quite quickly".

Vocabulary

  • Members: Butterfly keeps track of "members"; each Chef Habitat Supervisor is a single member.
  • Peer: All the members a given member is connected to are its "peers". A member is seeded with a list of "initial peers".
  • Health: The status of a given member, from the perspective of its peers.
  • Rumor: A piece of data shared with all the members of a ring; examples are election, membership, services, or configuration.
  • Heat: How many times a given rumor has been shared with a given member.
  • Ring: All the members connected to one another form a Ring.
  • Incarnation: A counter used to determine which message is "newer".

Transport Protocols

Supervisors communicate with each other using UDP and ZeroMQ, over port 9638.
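
If a firewall sits between Supervisors, both UDP (SWIM) and TCP (ZeroMQ) must be allowed on that port. A minimal sketch, assuming your hosts use ufw:

$ sudo ufw allow 9638/udp
$ sudo ufw allow 9638/tcp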

Information Security

Butterfly encrypts traffic on the wire using Curve25519 and a symmetric key. If a ring is configured to use transport level encryption, only members with a matching key are allowed to communicate.

Service Configuration and Files can both be encrypted with public keys.
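
A hedged sketch of creating and using these keys with the hab CLI; the ring, service group, and organization names are placeholders, and flags may differ slightly between versions:

$ # Generate a symmetric ring key for wire encryption
$ hab ring key generate myring
$ # Start the Supervisor with that key so only members holding it can communicate
$ hab sup run --ring myring
$ # Generate a service group key so configuration and files can be encrypted for that group
$ hab svc key generate myservice.production myorg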

Membership and Failure Detection

Butterfly servers keep track of which members are present in a ring and constantly check each other for failure. Any given member is in one of four health states:

  • Alive: this member is responding to health checks.
  • Suspect: this member has stopped responding to our health check, and will be marked confirmed if we do not receive proof it is still alive soon.
  • Confirmed: this member has been un-responsive long enough that we can cease attempting to check its health.
  • Departed: this member has been intentionally kicked out of the ring for behavior unbecoming of a Supervisor, and is not allowed to rejoin. This is done by a human operator using the hab CLI (see the example below).
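
A minimal sketch of departing a member; the member ID shown is a placeholder (real member IDs appear in the Supervisor's output):

$ # Permanently remove a misbehaving member from the ring
$ hab sup depart 8f3b7a6c2e4d4b9a9f1c0d2e3f4a5b6c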

The essential flow is:

  • Randomize the list of all known members who are not Confirmed or Departed.
  • Every 3.1 seconds, pop a member off the list, and send it a "PING" message.
  • If we receive an "ACK" message before 1 second elapses, the member remains Alive.
  • If we do not receive an "ACK" in 1 second, choose 5 peers (the "PINGREQ targets"), and send them a "PINGREQ(member)" message for the member who failed the PING.
  • If any of our PINGREQ targets receive an ACK, they forward it to us, and the member remains Alive.
  • If we do not receive an ACK via PINGREQ within 2.1 seconds, we mark the member as Suspect, and set an expiration timer of 9.3 seconds.
  • If we do not receive an Alive status for the member within the 9.3 second suspicion expiration window, the member is marked as Confirmed.
  • Move on to the next member, until the list is exhausted; start the process again.

When a Supervisor sends the PING, ACK and PINGREQ messages, it includes information about the 5 most recent members. This enables membership to be gossiped through the failure protocol itself.

This process provides several nice attributes:

  • It is resilient to partial network partitions.
  • Due to the expiration of suspected members, confirmation of death spreads quickly.
  • The amount of network traffic generated by a given node is constant, regardless of network size.
  • The protocol uses single UDP packets which fit within 512 bytes.

Butterfly differs from SWIM in the following ways:

  • Rather than sending messages to update member state, we send the entire member.
  • We support encryption on the wire.
  • Payloads are protocol buffers.
  • We support "persistent" members - these are members who will continue to have the failure detection protocol run against them, even if they are confirmed dead. This enables the system to heal from long-lived total partitions.
  • Members who are confirmed dead, but who later receive a membership rumor about themselves being suspected or confirmed, respond by spreading an Alive rumor with a higher incarnation. This allows members who return from a partition to re-join the ring gracefully.

Gossip

Butterfly uses ZeroMQ to disseminate rumors throughout the network. Its flow:

  • Randomize the list of all known members who are not Confirmed dead.
  • Every second, take 5 members from the list.
  • Send each member every rumor that has a Heat lower than 3; update the heat for each rumor sent.
  • When the list is exhausted, start the loop again.

What's good about this system:

  • ZeroMQ provides a scalable PULL socket that processes incoming messages from multiple peers as a single fair queue.
  • It has no back-chatter - messages are PUSH-ed to members, but require no receipt acknowledgement.
  • Messages are sent over TCP, giving them some durability guarantees.
  • In common use, the gossip protocol becomes inactive; if there are no rumors to send to a given member, nothing is sent.

Papers

  • Many more details about the operation of SWIM can be found in its paper.
  • For information about the newscast approach to rumor dissemination, please refer to the paper.

Leader Election

The Chef Habitat Supervisor performs leader election natively for service group topologies that require one, such as leader-follower.

Because Chef Habitat is an eventually-consistent distributed system, the role of the leader is different from that in strongly-consistent systems. It serves as the leader only for application-level semantics, e.g. a database write leader. The fact that a Supervisor is a leader has no bearing upon other operations in the Chef Habitat system, including rumor dissemination for configuration updates. It is not akin to a Raft leader, through which all writes must be funneled. This allows for very high scalability of the Chef Habitat Supervisor ring.

Services grouped using a leader topology need a minimum of three Supervisors in order to break ties. It is also strongly recommended that you do not run the service group with an even number of members. Otherwise, in the event of a network partition with an equal number of members on each side, both sides will elect a new leader, causing a full split-brain from which the algorithm cannot recover. Supervisors in a service group will warn you if you are using leader election and have an even number of Supervisors.
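
For example, a leader topology service might be loaded like this on each of three or more Supervisors (the package and group names are illustrative); the election will not complete until quorum is reached:

$ hab svc load core/postgresql --topology leader --group production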

Protocol for electing a leader

When a service group starts in a leader topology, it will wait until there are sufficient members to form a quorum (at least three). At this point, an election cycle can happen. Each Supervisor injects the same election rumor into the ring, targeted at the service group: the rumor demands an election and nominates the injecting peer itself as the leader. This algorithm is known as Bully.

Every peer that receives this rumor does a simple lexicographic comparison of its GUID with the GUID of the peer contained in that rumor. The winner is the peer whose GUID is higher. The receiving peer then adds a vote for the GUID of the winner and shares the rumor with others, including the total number of votes from anyone who previously voted for this winner.

An election ends when a candidate peer X receives a rumor back from the ring saying that it (X) is the winner, with all members having voted. At this point, it sends out a rumor declaring itself the winner, and the election cycle ends.

Papers

  • For more information about the Bully algorithm, please see the paper "Elections in a Distributed Computing System" by Héctor García-Molina.

Cryptography

Chef Habitat implements cryptography using libsodium, a portable implementation of NaCl, via Rust bindings. libsodium provides a fast, modern framework for encryption, decryption, signing, and verification.

Chef Habitat uses both symmetric encryption (for wire encryption) and asymmetric encryption (for everything else). If you are not familiar with the difference between the two, please consult this article.

Message Encryption

When you have either wire encryption or service group encryption turned on, the messages use the Curve25519, Salsa20, and Poly1305 ciphers specified in Cryptography in NaCl.

Package Signing

Chef Habitat packages are signed using BLAKE2b checksums. BLAKE2b is a cryptographic hash function that is faster than MD5, SHA-1, SHA-2, and SHA-3, yet provides at least as much security as the latest standard, SHA-3.
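
If you want to see the BLAKE2b checksum that Chef Habitat computes for an arbitrary file, the hab CLI exposes it (the filename here is a placeholder):

$ hab pkg hash somefile.tar.gz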

Because a .hart file is an xz-compressed tarball with a plain-text metadata header, you can examine its first four lines to extract the signature. The hab pkg header command will do this for you.

$ hab pkg header somefile.hart

outputs:

» Reading package header for somefile.hart
Package : somefile.hart
Format Version : HART-1
Key Name : myorigin-19780608081445
Hash Type : BLAKE2b
Raw Signature : a8yDoiA0Mv0CcW6xVyfkSOIZ0LW0beef4RPtvKL56MxemgG6dMVlKG1Ibplp7DUByr5az0kI5dmJKXgK6KURDzM1N2Y2MGMxYWJiMTNlYjQxMjliZTMzNGY0MWJlYTAzYmI4NDZlZzM2MDRhM2Y5M2VlMDkyNDFlYmVmZDk1Yzk=

The .hart file format is designed in this way to allow you to extract both the signature and the payload separately for inspection. To extract only the xz-compressed content, bypassing the signature, you could type this:

$ tail -n +6 somefile.hart | xzcat | tar x


Bootstrapping Chef Habitat

This document provides developer documentation on how the Chef Habitat system becomes self-sustaining. It is built upon the work from the Linux from Scratch project.

The instructions in this document may rapidly become out-of-date as we develop Chef Habitat further. Should you have questions, please join us in Slack.

Part I: Background

In order to bootstrap the system from scratch, you should be familiar with how the Linux From Scratch project works.

We add the following software to augment the Linux From Scratch toolchain:

  • Statically built BusyBox - used for the unzip implementation. This is copied into the tarball as /tools/bin/busybox.
  • Statically built Wget with OpenSSL support - used by the build program to download sources. This is copied into the tarball as /tools/bin/wget.real.
  • Statically built rq - used by the build program to verify configuration value types. This is copied into the tarball as /tools/bin/rq.
  • A symlink of unzip which points to the BusyBox binary and is used by the build program to extract zipped source archives. This is copied into the tarball as /tools/bin/unzip -> busybox.
  • A copy of curl’s cacert.pem certificates - used by wget when connecting to SSL-enabled websites. This is copied into the tarball as /tools/etc/cacert.pem.
  • A small wrapper script for wget so that we use the SSL certificates we added to the tarball. Its source is:
    #!/tools/bin/busybox sh
    exec /tools/bin/wget.real --ca-certificate /tools/etc/cacert.pem $*

Finally, we place a recent last-known-good copy of the hab binary inside tools/bin.

The entire tree of bootstrap "tools" lives inside the stage1 Studio tarball. This should be unpacked into /tools on a Linux host that will serve as the build environment until the system becomes self-sustaining through the rest of this procedure.
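
A minimal sketch of unpacking it, assuming the tarball has been downloaded to /tmp (the timestamp in the filename will differ):

$ sudo tar xJf /tmp/habitat-studio-stage1-20160612022150.tar.xz -C /
$ ls /tools/bin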

Note: In order to build large amounts of software in the fastest way possible, we recommend you use a powerful machine, such as a c4.4xlarge on Amazon Web Services (AWS). While it may be possible to run these builds in Docker containers, in workstation virtual machines, or on older hosts, it's well worth it to spend a few dollars to save potentially dozens of hours of wait time.

Part II: Stage 0

Creating The stage1 tarball

If you need to build the stage1 tarball from scratch, follow the Linux From Scratch guide from the start through the end of section II.5 ("Constructing a Temporary System"). By that point, you should be the root user with a /mnt/lfs/tools directory containing the cross-compiled minimal toolchain. From here, we create a tarball of the tools/ directory and append our Chef Habitat-specific software into it:

# Create a variable for the target tarball
$ TARFILE=~ubuntu/habitat-studio-stage1-$(date -u +%Y%m%d%H%M%S).tar
# Create the tarball
$ tar cf $TARFILE -C /mnt/lfs tools
# Install the Chef Habitat packages which provide the extra, static software
$ hab install core/cacerts core/wget-static core/busybox-static core/rq core/hab
$ mkdir -p tools/{bin,etc}
# Create the wget wrapper script
$ cat <<EOF > tools/bin/wget
#!/tools/bin/busybox sh
exec /tools/bin/wget.real --ca-certificate /tools/etc/cacert.pem \$*
EOF
$ chmod 755 tools/bin/wget
# Install the program binaries, symlinks, and certificates
$ cp -p $(hab pkg path core/wget-static)/bin/wget tools/bin/wget.real
$ cp -p $(hab pkg path core/busybox-static)/bin/busybox tools/bin/busybox
$ cp -p $(hab pkg path core/rq)/bin/rq tools/bin/rq
$ cp -p $(hab pkg path core/hab)/bin/hab tools/bin/hab
$ cp -p $(hab pkg path core/cacerts)/ssl/certs/cacert.pem tools/etc/cacert.pem
$ (cd tools/bin && ln -sv ./busybox unzip)
# Append the local tools/ directory into the tarball and compress it
$ tar --append --file $TARFILE tools
$ xz --compress -9 --threads=0 --verbose $TARFILE
# Copy the tar.xz archive into the /tmp/ directory so that it will be picked
# up by the stage1 Studio later on
$ cp -v ${TARFILE}.xz /tmp/

Freshening The stage1 tarball

Alternatively, from time to time, and especially when there are breaking changes to hab's core behavior, it is a good idea to update the software in the habitat-studio-stage1 tarball, even if that means skipping the work of rebuilding the toolchain.

You can do this work on a Linux distribution such as Ubuntu, or perform these tasks in a Docker container if that's more convenient:

$ docker run --rm -ti -v $(pwd):/src ubuntu:xenial bash

These commands assume you are running in a Docker container and are missing some software:

# Install xz tools
$ apt-get update && apt-get install -y xz-utils
# Uncompress the tarball and remove old version of hab
$ tarxz=/src/habitat-studio-stage1-20160612022150.tar.xz
$ xz --decompress --keep $tarxz
$ tar --delete --file=${tarxz/.xz/} tools/bin/hab
# Extract new version of hab in correct path structure
$ hart=/src/core-hab-0.6.0-20160701014820-x86_64-linux.hart
$ mkdir -p /tmp/tools/bin
$ tail -n +6 $hart \
| xzcat \
| (cd /tmp/tools/bin && tar x --no-anchored bin/hab --strip-components=7)
# Append new version of hab into tarball
$ (cd /tmp && tar --append --file=${tarxz/.xz/} tools)
# Rename tarball to current date and recompress with xz
$ dst=/src/habitat-studio-stage1-$(date -u +%Y%m%d%H%M%S).tar.xz
$ mv ${tarxz/.xz/} ${dst/.xz/}
$ xz --compress -9 --threads=0 --verbose ${dst/.xz/}

If you upload a new version of this tarball for broader use with the Studio software, it is worth updating the source location in the Studio's hab-studio-type-stage1.sh code (the line with ${STAGE1_TOOLS_URL}). Note that to simply use or test a new tarball with the Studio, you only need to set the following before running hab studio commands:

  • export STAGE1_TOOLS_URL=habitat-studio-stage1-20160612022150.tar.xz

Finally, place this tarball under /tmp; the Studio code will then find it as if it had already been downloaded and use it directly (see the example below).
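
Putting those two steps together, a hedged example (the timestamp is from the earlier example and will differ for your tarball):

$ export STAGE1_TOOLS_URL=habitat-studio-stage1-20160612022150.tar.xz
$ cp -v habitat-studio-stage1-20160612022150.tar.xz /tmp/
$ # Subsequent stage1 hab studio commands will now use the local tarball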

Part III: Preparing to build

In this stage, we rebuild all the base packages needed by Chef Habitat using the tools (compiler, etc.) from the existing tools tarball. You will need a copy of the habitat and core-plans repos on your local disk. For our work, we will assume that everything is being run under a common parent directory called habitat-sh/. Assuming we want to rebuild the Chef Habitat software as of the last release tag (we'll use 0.56.0 here) and the core plans from latest, here's how to get set up:

$ mkdir habitat-sh
$ cd habitat-sh
$ git clone https://github.com/habitat-sh/habitat.git
$ (cd habitat && git checkout 0.56.0)
$ git clone https://github.com/habitat-sh/core-plans.git

Next, let's get our minimum Chef Habitat software to start us off:

# (Optional) Completely clean this build host--this will purge all Chef Habitat
# software, caches, and keys from this host!
$ rm -rf /hab
# Install the latest version of the 'hab' program
$ sudo -E ./habitat/components/hab/install.sh
# (Optional) Generate a 'core' origin key, if not already imported or created.
# Otherwise, you'll need to install the real 'core' origin key
$ hab origin key generate core

Finally, it's a good idea to clear out any work that might have been left over from previous builds:

$ sudo rm -rf ./results ./tmp/*.db

Part IV: Stage 1

To simplify the process of setting up and entering a stage1 Studio, use this stage1 Studio wrapper:

$ ./core-plans/bin/bootstrap/stage1-studio.sh enter

Now in the stage1 Studio:

# (Optional) run all test suites in do_check() build phases
$ export DO_CHECK=true
# Build all base plans in order
$ ./core-plans/bin/bootstrap/stage1-build-base-plans.sh

Once the set has finished building, we are going to use the newly built Studio immediately and set the artifacts aside for later use:

# Exit the Studio
$ exit
# Save the stage1 artifacts
$ mv ./results ./results-stage1
# Install the just-built Studio
$ sudo hab install ./results-stage1/core-hab-studio-*.hart

Now we're ready to use the stage1 packages to make a set of packages that are completely decoupled from the stage1 tools/ infrastructure, which we'll call a "stage2" build.

Part V: Stage 2

In this stage, we rebuild all the base packages needed by Chef Habitat using the tools (compiler, etc.) from the previous stage, thus making the system self-hosted. In this phase we must avoid using any packages published to an existing Builder API, so we're going to create a default-type Studio more carefully. By supplying a fully-qualified package identifier through the HAB_STUDIO_BACKLINE_PKG environment variable, we skip the call back to a Builder instance.

To make this work, we need the fully-qualified package identifier of the core/hab-backline package we just built in stage1. You can use ls -1 ./results-stage1/core-hab-backline-*.hart to list the artifact. The naming of this file will help you figure out the package identifier. For example, this file:

./results-stage1/core-hab-backline-0.56.0-20180322160653-x86_64-linux.hart

has a package identifier of core/hab-backline/0.56.0/20180322160653. In general, the / characters in a package identifier become - characters in an artifact name, and you drop the target info (i.e. x86_64-linux) and the file extension; a small shell sketch of the reverse mapping follows below. For the remainder of this guide, let's assume that your fully-qualified package identifier is core/hab-backline/0.56.0/20180322160653.
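
As an illustrative sketch (not an official hab command), the following shell fragment derives the identifier from an artifact name by stripping the target suffix and splitting fields from the right; it assumes an x86_64-linux artifact:

$ f=core-hab-backline-0.56.0-20180322160653-x86_64-linux.hart
$ base=${f%-x86_64-linux.hart}          # drop target and extension
$ release=${base##*-}; rest=${base%-*}  # 20180322160653
$ version=${rest##*-}; rest=${rest%-*}  # 0.56.0
$ origin=${rest%%-*}; name=${rest#*-}   # core / hab-backline
$ echo "$origin/$name/$version/$release"
core/hab-backline/0.56.0/20180322160653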

Finally, to help set up environment variables for the Studio, we're going to use a stage2 Studio wrapper:

# Install only stage1 packages when setting up this Studio--that's why we're
# calling `new` explicitly before using
$ env HAB_STUDIO_BACKLINE_PKG=core/hab-backline/0.56.0/20180322160653 ./core-plans/bin/bootstrap/stage2-studio.sh new
# Enter the stage2 Studio
$ ./core-plans/bin/bootstrap/stage2-studio.sh enter

Now that we're in a Studio, it's been set up with an empty artifact cache. We're going to set things up so that our install logic won't need to call out to a Builder API:

# Copy all stage1 packages into the artifact cache
$ cp ./results-stage1/*.hart /hab/cache/artifacts/
# Install all stage1 packages directly from the artifact cache
$ ls -1 /hab/cache/artifacts/*.hart | while read hart; do hab pkg install $hart; done

Okay, time to build all the base plans. As before, there is a wrapper to help here:

# (Optional) run all test suites in do_check() build phases
$ export DO_CHECK=true
# Build all base plans in order
$ ./core-plans/bin/bootstrap/stage2-build-base-plans.sh

By now we should have a full set of base packages that have been built using Chef Habitat packages. These should be suitable for use when building other packages. As a final cleanup:

# Exit the Studio
$ exit
# Save the stage2 artifacts
$ mv ./results ./results-stage2

Part VI: Remaining packages in world

At this point we have a solid set of base packages which can be used to build anything and everything else. As such, it makes sense to at least attempt to rebuild all the other core packages that haven't been touched to this point. The first task here is to generate the build-order list of all remaining packages in our "world" and exclude all the base packages so that we can avoid building them yet another time (for no additional benefit). Luckily, there is a build-order.rb program in the core-plans repository that will generate such a list. The reason this program is in Ruby is to take advantage of the topological sorting library in Ruby's stdlib (called tsort). If you've ever used rake before with tasks and dependencies, tsort is the magic which powers it.

So, we're going to find all relevant Plans, feed them to build-order.rb, exclude the base plans set, and format the results here:

# Install Ruby
$ sudo hab pkg install core/ruby
# Calculate build order of remaining plans and exclude base plans
$ find core-plans habitat/components -name plan.sh \
    | $(hab pkg path core/ruby)/bin/ruby ./core-plans/bin/build-order.rb --without-base \
    | cut -d ' ' -f 2 \
    | grep -v '^$' \
    > stage3_world_order
# Make a copy of the "master" list and use the working copy for building
# against
$ cp stage3_world_order stage3_world_order.working

As in the stage2 build, we need to get the fully-qualified package identifier of the core/hab-backline package we just built in stage2. You can use ls -1 ./results-stage2/core-hab-backline-*.hart to list the artifact. The naming of this file will help you to figure out the package identifier. For now, let's assume that the identifier corresponding to the artifact was core/hab-backline/0.56.0/20180322181801.

As before, to help set up environment variables for the Studio, we're going to use a stage3 Studio wrapper:

# Install only stage2 packages when setting up this Studio--that's why we're
# calling `new` explicitly before using
$ env HAB_STUDIO_BACKLINE_PKG=core/hab-backline/0.56.0/20180322181801 ./core-plans/bin/bootstrap/stage3-studio.sh new
# Enter the stage3 Studio
$ ./core-plans/bin/bootstrap/stage3-studio.sh enter

Now that we're in a Studio, it's been set up with an empty artifact cache. We're going to set things up so that our install logic won't need to call out to a Builder API:

# Copy all stage2 packages into the artifact cache
$ cp ./results-stage2/*.hart /hab/cache/artifacts/
# Install all stage2 packages directly from the artifact cache
$ ls -1 /hab/cache/artifacts/*.hart | while read hart; do hab pkg install $hart; done

Okay, time to build all the remaining plans. As before, there is a wrapper to help here:

# (Optional) run all test suites in do_check() build phases
$ export DO_CHECK=true
# Build all remaining plans in order
$ ./core-plans/bin/bootstrap/stage3-build-remaining-plans.sh stage3_world_order.working

If there are failures as you work through the remaining world, you may discover they are due to: a) an older version of the software that can't build against a newer/different compiler; b) a failure that is currently in master and not yet caught (e.g. a source URL that now returns a 404); or c) other issues which require more debugging and work. You can try to fix the package now and re-run stage3-build-remaining-plans.sh, which will resume your failed build and continue. Alternatively, you could remove that Plan from the rebuild set and deal with it separately. The catch is that your failing package might be a dependency of another package later in the list, so you would want to remove those packages as well. Thankfully, you can get this list by running another program in the core-plans repository. Note that you want to run it in the same common parent directory containing the core-plans and habitat git clones, and outside of a Studio. Let's assume that the failing plan is core/doxygen. You can run:

$ find core-plans habitat/components -name plan.sh \
    | $(hab pkg path core/ruby)/bin/ruby ./core-plans/bin/build-dependent-order.rb core/doxygen \
    | cut -d ' ' -f 2

In my case, I got the following results back as output:

core-plans/clingo
core-plans/aspcud
core-plans/opam
core-plans/libyajl2
core-plans/msgpack

So what we want to do now (assuming you want to skip solving this issue right this second) is to edit the stage3_world_order.working file and delete the following lines:

core-plans/doxygen
core-plans/clingo
core-plans/aspcud
core-plans/opam
core-plans/libyajl2
core-plans/msgpack

Then, back in your Studio instance shell, re-run the build wrapper program:

$ ./core-plans/bin/bootstrap/stage3-build-remaining-plans.sh stage3_world_order.working

You'll want to make a note of which packages you've skipped, but since we have a master/unmodified copy of the build order you can see your changes with:

$ diff -u stage3_world_order stage3_world_order.working

Finally, assuming you have a lot of newly built packages, it's worthwhile to exit that Studio and save off the packages:

# Exit the Studio
$ exit
# Save the stage3 artifacts
$ mv ./results ./results-stage3

Part VII: Cleaning up

While you don't have to wait for the end to completely clean up, it can be beneficial to be able to enter any of the stage1/stage2/stage3 Studios for reference. However, if and when you feel like you're finished with any of these Studio instances, you can clean them up with the associated Studio wrapper script. For example, to clean up all the staging Studio instances:

$ ./core-plans/bin/bootstrap/stage1-studio.sh rm
$ ./core-plans/bin/bootstrap/stage2-studio.sh rm
$ ./core-plans/bin/bootstrap/stage3-studio.sh rm

Part VIII: Next steps

There is more to write here once you have your local artifacts in the various results-stage* directories. However, next steps to consider are:

  • Uploading the stage2 and stage3 artifacts to a Depot and possibly promoting them to the stable channel (a sketch follows below)
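
A hedged sketch of that upload-and-promote flow; the auth token and the promoted package identifier are placeholders, and you should check hab pkg upload --help and hab pkg promote --help for the exact behavior in your version:

$ export HAB_AUTH_TOKEN=<your Builder auth token>
$ # Upload everything built in stage2 and stage3
$ hab pkg upload ./results-stage2/*.hart
$ hab pkg upload ./results-stage3/*.hart
$ # Promote a specific release to the stable channel
$ hab pkg promote core/hab-backline/0.56.0/20180322181801 stable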