Shallow diving k8s components: etcd

Etcd is a highly available key-value store that holds all the data necessary for running a Kubernetes cluster. The first time I learned about etcd, I asked myself: why? There are so many production-ready key-value databases out there. Why did the Kubernetes team choose etcd? What am I missing? That led me to learn more about etcd. Etcd is a good fit for Kubernetes for at least two reasons. First, it is robust by design: it keeps the data consistent across the cluster and stays highly available. Second, it has a feature called watch, which allows an observer to subscribe to changes on a particular key. That fits perfectly with Kubernetes' design paradigm, where controllers continuously react to changes in cluster state.
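
As a quick illustration, this is how one could watch for changes under a key prefix with etcdctl (the /registry/pods/ prefix is where Kubernetes keeps its pod objects):

ETCDCTL_API=3 etcdctl watch --prefix /registry/pods/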

We can inspect the information stored in etcd using commands like these:

ETCDCTL_API=3 etcdctl get --prefix --keys-only /

ETCDCTL_API=3 etcdctl get /registry/pods/kube-system/kube-dns-xx

Since etcd stores the state of the cluster, it is important to gain some confidence in it. In a production environment, high availability is ensured by running multiple etcd members in a cluster. A multi-node etcd cluster uses the Raft consensus protocol to keep the data consistent.

In Raft, every node is either a follower, a candidate, or a leader. Every leader is elected for a term, and the leader is responsible for sending heartbeats to all other nodes. Every node has a random election timeout, set above the leader's heartbeat interval, and each time a node receives a heartbeat from the leader it resets its countdown. When a node does not receive a heartbeat before its timeout expires, it promotes itself to candidate and asks all other nodes to elect it as leader. If the other nodes have also stopped receiving heartbeats from the leader, they typically vote for the candidate. Since the timeouts are randomized, it is unlikely that multiple candidates appear at the same time, but it is theoretically possible; in that case a re-election happens. Once a majority of nodes vote for a candidate, it is considered the leader. Thanks to this, etcd keeps working resiliently during situations like a network partition or a node failure.

Now we are going to explore how etcd stores its data, and how we can back up and restore it.
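
If we want to see this in action, etcdctl can report which member currently holds leadership and whether each endpoint is healthy ($ENDPOINTS here stands for a comma-separated list of member client URLs):

ETCDCTL_API=3 etcdctl --endpoints $ENDPOINTS endpoint status -w table
ETCDCTL_API=3 etcdctl --endpoints $ENDPOINTS endpoint health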

If we investigate an etcd data directory (data-dir), we will find at least three types of files: .snap, db, and .wal.

<data-dir>
`-- member
    |-- snap
    |   |-- 000.000.snap
    |   |-- 000.000.snap
    |   `-- db
    `-- wal
        `-- 000.000.wal

We are going to talk about all of these in the next section, but if you are curious to read these logs right away, the following command can help satisfy your curiosity:

etcd-dump-logs /var/etcd/data

Etcd uses a write-ahead logging (WAL) mechanism. When a write request arrives at a node, the node forwards it to the leader, and the leader appends the entry to its .wal file. The leader then replicates the entry to all follower nodes. Once a majority of nodes have the entry, the leader marks it as committed, and it is safe for every node to apply it. After the first n committed entries, etcd persists the current state in a snapshot; this n is defined by the --snapshot-count flag. A snapshot persists the keyspace to the db file and allows the older WAL entries to be truncated. If etcd crashes before the state reaches the db file, it rebuilds the state on restart by replaying the WAL entries since the last snapshot.
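
As a small sketch, the snapshot trigger can be tuned when starting etcd; the member name, data directory, and count below are only illustrative:

etcd --name m1 \
  --data-dir /var/etcd/data \
  --snapshot-count 10000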

Taking a backup of etcd is as simple as copying the contents of this data directory, or we can save a snapshot to a separate file using the following command:

ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshot.db
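
Before relying on that file, it is a reasonable sanity check to inspect it; the -w table flag only makes the output easier to read:

ETCDCTL_API=3 etcdctl snapshot status snapshot.db -w table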

We can create new etcd data directories from the snapshot created above:

ETCDCTL_API=3 etcdctl snapshot restore snapshot.db

It is important to make sure that all members of etcd are restored from the same snapshot. After a restore, etcd loses its former identity, because the restore drops the metadata that ties a member to a specific cluster. Therefore, in order to start a cluster from a snapshot, the restore must start a new logical cluster.

Restore the snapshot into a new data directory for member m1:

ETCDCTL_API=3 etcdctl snapshot restore snapshot.db \
  --name m1 \
  --initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls http://host1:2380

Run etcd with the restored name and data directory (repeat the restore and the start for the other members with their own names, data directories, and peer URLs):

etcd \
  --name m1 \
  --listen-client-urls http://host1:2379 \
  --advertise-client-urls http://host1:2379 \
  --listen-peer-urls http://host1:2380 &
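
Once every member has been restored and started this way, we can confirm the membership of the new logical cluster (host1 as in the examples above):

ETCDCTL_API=3 etcdctl --endpoints http://host1:2379 member list -w table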

Now that we are aware of how etcd ensures consensus, data replication, and fault tolerance, let's talk about security. Etcd talks to two different kinds of parties: its peers and its clients; in the case of Kubernetes, the client is the API server. For security reasons, both of these communication channels can be configured to be encrypted with TLS, using either self-signed certificates or certificates of our own.

For a client to talk to etcd, it needs a certificate/key pair. On the etcd side, we usually configure client certificate authentication using the --client-cert-auth, --trusted-ca-file, --cert-file, and --key-file parameters:

etcd --name infra0 --data-dir infra0 \
--client-cert-auth \
--trusted-ca-file=/path/to/ca.crt \
--cert-file=/path/to/server.crt \
--key-file=/path/to/server.key \
--advertise-client-urls https://127.0.0.1:2379 \
--listen-client-urls https://127.0.0.1:2379

We can simulate a client call using a curl request:

curl --cacert /path/to/ca.crt \
--cert /path/to/client.crt \
--key /path/to/client.key \
-L https://127.0.0.1:2379/v2/keys/foo \
-XPUT -d value=bar -v
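
The equivalent call through the v3 API with etcdctl would pass the matching TLS flags (the key foo and value bar are just an example):

ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
  --cacert /path/to/ca.crt \
  --cert /path/to/client.crt \
  --key /path/to/client.key \
  put foo bar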

For etcd peer-to-peer communication over TLS, we use the --peer-client-cert-auth, --peer-trusted-ca-file, --peer-cert-file, and --peer-key-file parameters:

# member1
$ etcd --name infra1 \
--data-dir infra1 \
--peer-client-cert-auth \
--peer-trusted-ca-file=/path/to/ca.crt \
--peer-cert-file=/path/to/member1.crt \
--peer-key-file=/path/to/member1.key \
--initial-advertise-peer-urls=https://10.0.1.10:2380 \
--listen-peer-urls=https://10.0.1.10:2380 \
--discovery ${DISCOVERY_URL}
# member2
$ etcd --name infra2 \
--data-dir infra2 \
--peer-client-cert-auth \
--peer-trusted-ca-file=/path/to/ca.crt \
--peer-cert-file=/path/to/member2.crt \
--peer-key-file=/path/to/member2.key \
--initial-advertise-peer-urls=https://10.0.1.11:2380 \
--listen-peer-urls=https://10.0.1.11:2380 \
--discovery ${DISCOVERY_URL}

An ideal setup would combine both: TLS for client communication and TLS for peer-to-peer communication.

We can monitor etcd through its /metrics endpoint; the amount of detail exported is controlled by the --metrics flag.
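
For example, the metrics can be scraped with a plain HTTP request (assuming a member serving client traffic on http://127.0.0.1:2379 without TLS):

curl -L http://127.0.0.1:2379/metrics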
