Gossip Protocol

Consul uses a gossip protocol to manage membership and broadcast messages to the cluster. The protocol, membership management, and message broadcasting are provided through the Serf library. The gossip protocol used by Serf is based on a modified version of the SWIM (Scalable Weakly-consistent Infection-style Process Group Membership) protocol. Refer to the Serf documentation for additional information about the gossip protocol.
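To make "infection-style" dissemination concrete, the following is a toy simulation, not Serf's actual implementation. It assumes a simplified model in which, on every round, each node that already holds a message forwards it to a few randomly chosen peers. The function name `gossip_rounds` and the `fanout` and `seed` parameters are hypothetical and exist only for this illustration.

```python
import random


def gossip_rounds(n, fanout=3, seed=0):
    """Return how many rounds it takes for a message to reach all n nodes.

    Simplified model (an assumption for illustration): every informed node
    forwards the message to `fanout` distinct random peers each round.
    """
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    informed = {0}             # node 0 originates the message
    rounds = 0
    while len(informed) < n:
        newly_informed = set()
        for node in sorted(informed):
            peers = [p for p in range(n) if p != node]
            newly_informed.update(rng.sample(peers, min(fanout, len(peers))))
        informed |= newly_informed
        rounds += 1
    return rounds


if __name__ == "__main__":
    for size in (10, 100, 1000):
        print(size, "nodes ->", gossip_rounds(size), "rounds")
```

In this model the informed set can at most quadruple per round with a fanout of 3, and in practice the round count grows slowly as the cluster gets larger, which is the property that lets gossip scale without a central coordinator.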

Gossip in Consul

Consul uses a LAN gossip pool and a WAN gossip pool, each serving a different purpose. Both pools are built on an embedded Serf library. Consul abstracts the library away to simplify the user experience, but developers may find it useful to understand how it is used.

LAN Gossip Pool

Each datacenter that Consul operates in has a LAN gossip pool containing all members of the datacenter (clients and servers). Membership information provided by the LAN pool allows clients to automatically discover servers, reducing the amount of configuration needed. Failure detection is also distributed and shared by the entire cluster, instead of concentrated on a few servers. Lastly, the gossip pool allows for fast and reliable event broadcasts.
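The distributed failure detection described above can be sketched with a SWIM-style probe: a node first pings its target directly and, if that fails, asks a few other members to ping the target on its behalf before marking it as suspect. This is a minimal sketch under stated assumptions, not Consul's or Serf's code. The `probe` function, the `can_reach` callback, and the node names are all hypothetical.

```python
def probe(local, target, helpers, can_reach):
    """Classify `target` as "alive" or "suspect" from the view of `local`.

    can_reach(src, dst) is a hypothetical callback that reports whether a
    ping from src to dst would be acknowledged.
    """
    # Step 1: direct probe.
    if can_reach(local, target):
        return "alive"
    # Step 2: indirect probes through other members of the pool.
    for helper in helpers:
        if can_reach(local, helper) and can_reach(helper, target):
            return "alive"
    # No acknowledgement by any path: suspect the target rather than
    # declaring it dead outright.
    return "suspect"


if __name__ == "__main__":
    # The link between "a" and "c" is broken, but "b" can reach both.
    links = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")}
    reach = lambda src, dst: (src, dst) in links
    print(probe("a", "c", ["b"], reach))
```

Because any member can act as a helper, a single bad link between two nodes does not by itself cause a false failure report, and no individual server carries the whole detection workload.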

WAN Gossip Pool

The WAN pool is globally unique. All servers should participate in the WAN pool, regardless of datacenter. Membership information provided by the WAN pool allows servers to perform cross-datacenter requests. The integrated failure detection allows Consul to gracefully handle loss of connectivity, whether the loss affects an entire datacenter or a single server in a remote datacenter.

Lifeguard Enhancements

SWIM assumes that the local node is healthy, meaning that soft real-time packet processing is possible. The assumption may be violated, however, if the local node experiences CPU or network exhaustion. In these cases, the serfHealth check status can flap. This can result in false monitoring alarms, additional telemetry noise, and CPU and network resources wasted on diagnosing failures that do not exist.

Lifeguard completely resolves this issue with novel enhancements to SWIM.
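One idea behind Lifeguard is local health awareness: a node that suspects it is itself overloaded becomes more patient before accusing its peers. The sketch below illustrates that idea in a simplified form and is not the Serf or memberlist implementation. The `LocalHealth` class, its method names, and the default `max_score` of 8 are assumptions chosen for illustration.

```python
class LocalHealth:
    """Tracks how unhealthy the local node believes itself to be."""

    def __init__(self, max_score=8):
        self.max_score = max_score  # score stays within 0 .. max_score - 1
        self.score = 0              # 0 means the node considers itself healthy

    def apply_delta(self, delta):
        """Raise the score on signs of local trouble (for example, missed
        acknowledgements) and lower it on successful probes."""
        self.score = max(0, min(self.max_score - 1, self.score + delta))

    def scale_timeout(self, base_timeout):
        """Stretch a probe timeout in proportion to the current score."""
        return base_timeout * (self.score + 1)


if __name__ == "__main__":
    health = LocalHealth()
    health.apply_delta(+2)            # two signs of local slowness
    print(health.scale_timeout(0.5))  # timeout is now longer than 0.5 seconds
```

Scaling timeouts by a bounded score means a struggling node slows its own accusations instead of flooding the cluster with false suspicions, and it returns to normal timeouts once its probes start succeeding again.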

For more details about Lifeguard, please see the Making Gossip More Robust with Lifeguard blog post, which provides a high-level overview of the HashiCorp Research paper Lifeguard: SWIM-ing with Situational Awareness. The Serf gossip protocol guide also provides some lower-level details about the gossip protocol and Lifeguard.