Customer Edge Registration and Upgrade Reference
Objective
This reference guide provides information on how an F5 Distributed Cloud Customer Edge (CE) Site registers with the Distributed Cloud Global Controller (GC). It also provides details about CE node provisioning and upgrades.
Introduction
F5 Distributed Cloud CE sites are extensions of F5 Distributed Cloud that are deployed at customer premises: in public clouds, on-premises data centers, or edge locations. Because these CE sites are managed centrally through F5 Distributed Cloud Console, they register with F5 Distributed Cloud when they come online. These CE sites also need software upgrades from time to time. This reference guide goes into the details of how CE registration works, how CE node provisioning happens, and how CE node upgrades work.
Prerequisites
Read the following documents before continuing with this guide:
- F5 Distributed Cloud - Customer Edge: This document provides details about what a CE is and how it operates.
- Create Secure Mesh Site v2: This document provides instructions for creating a CE Site using the Secure Mesh Site v2 workflow.
CE Node Pre-Registration
Once a user saves the CE Site configuration (using the Secure Mesh Site v2 workflow) in the F5 Distributed Cloud Console, the following steps happen in the background:
- Based on the parameters configured in F5 Distributed Cloud Console, a Secure Mesh Site v2 object is created in the F5 Distributed Cloud Global Controller (GC).
- As part of the node token generation step, a JWT (JSON Web Token) is checked out. This token carries the site name, tenant information, and more in its claims (see the sketch after this list). It is a short-lived token with 24-hour validity, used to start the registration process.
- The JWT is provided to the CE node that is to be registered via:
  - cloud-init (for AWS, Azure, GCP, OpenStack, KVM, OpenShift, and OCI)
  - OVA template for VMware
- For bare-metal sites, provisioning starts with the SiteCLI
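To make the token's role concrete, the following is a minimal sketch of how the claims segment of a JWT can be inspected. The claim names shown are illustrative assumptions, not the actual F5 claim schema, and the real registration flow verifies the token's signature before trusting any claim.

```python
import base64
import json

def decode_jwt_claims(token: str) -> dict:
    """Decode the claims (payload) segment of a JWT without verifying it.

    For illustration only: the registration services verify the signature
    before trusting any claim.
    """
    payload_b64 = token.split(".")[1]
    # Base64url payloads may be unpadded; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

# Hypothetical claims a node token might carry (names are assumptions):
# {
#     "site_name": "my-smsv2-site",
#     "tenant": "acme-corp",
#     "exp": 1718000000   # expiry enforcing the 24-hour validity window
# }
```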
CE Node Initialization and Registration
- Depending on the provider, when a node comes up, cloud-init is executed to ensure that the token information is saved at `/etc/vpm/user_data` on the node. Note that if the node is a VMware node, cloud-init is not used.
- Platform Management Service is one of the primary services started when the node boots. A copy of this service is already present in the image that is brought up.
- When Platform Management Service starts, it extracts the token information from `/etc/vpm/user_data` if present, or from the VMware guestinfo plugin (in the case of VMware).
- The token gives the following information to the Platform Management Service:
  - Site name
  - Tenant
  - DRP address
  - Static IP/Gateway, if any
  - DNS, if any
  - Node Registration Service endpoint
- The Platform Management Service selects the lowest-named interface as the SLO (Site Local Outside) interface by default and either enables DHCP (the default) or configures a static IP (if an SLO IP address is present in the token).
- The Platform Management Service also sets up a default DNS server at this point if SLO DNS is present in the token.
- The Platform Management Service collects node information, such as its version, CPU, memory, disk, and number of interfaces (and more), and sends it in a registration request to the Node Registration Service (in the GC) using a public REST API.
- The Node Registration Service validates the presented JWT. If the token is valid, it checks the Platform Management Service version in the message against the Platform Management Service version expected for the site. Generally, the node's first request carries the default Platform Management Service version, which will not match the site's version, so the registration is rejected with the expected Platform Management Service version in the response.
- The Platform Management Service, on seeing the expected Platform Management Service version, updates the node by downloading the right Platform Management Service version and restarts the registration process (see the sketch after this list).
- The Node Registration Service accepts the next registration request, as it now carries the right Platform Management Service version. At this point, the Node Registration Service creates a registration object representing the node.
- If this is a High Availability (HA) enabled site, Node Registration Service will wait for at least three (3) nodes to be present before the manifests are generated for the nodes. The first three nodes to register will be considered the control nodes. The other nodes will be treated as worker nodes.
- If the site is not HA-enabled, the first node is considered the control node, and the other nodes are worker nodes. In this case, Node Registration Service will send the manifests without waiting for additional nodes.
- The Platform Management Service retries workload requests to the Node Registration Service until it gets the manifests. Manifests include information on the services/apps to be deployed. The application of these manifests is described in more detail in the next section as the upgrade workflow.
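The registration exchange above amounts to a register/reject/retry loop. Below is a minimal sketch of that loop, assuming a hypothetical endpoint, status code, and field names; the actual Node Registration Service API may differ.

```python
import time

import requests  # third-party HTTP client, assumed available

# Hypothetical endpoint; the real Node Registration Service URL differs.
REGISTRATION_URL = "https://gc.example.invalid/api/register"

def register_and_fetch_manifests(jwt_token: str, node_info: dict,
                                 pms_version: str) -> dict:
    """Register a node, honor a version-mismatch rejection, and poll until
    manifests arrive. Field names and the 409 status are assumptions."""
    while True:
        resp = requests.post(
            REGISTRATION_URL,
            headers={"Authorization": f"Bearer {jwt_token}"},
            json={"pms_version": pms_version, **node_info},
            timeout=30,
        )
        if resp.status_code == 409:
            # Rejected with the expected Platform Management Service
            # version: "download" that version, then restart registration.
            pms_version = resp.json()["expected_pms_version"]
            continue
        resp.raise_for_status()
        body = resp.json()
        if body.get("manifests"):
            return body["manifests"]
        # Registered, but manifests are not generated yet (for example, an
        # HA site waiting for three control nodes): poll again shortly.
        time.sleep(30)
```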
CE Node Upgrade
- During the initial node bring-up process and in steady state, the Platform Management Service periodically sends a workload request to the Node Registration Service.
- The Node Registration Service sends the latest version of the manifests for the site to the Platform Management Service.
- If the Platform Management Service finds any delta in the overall hash of the manifests (or if there were no manifests on the node to begin with), it enters the upgrade process. A simplified sketch of this delta check follows at the end of this section.
- The Platform Management Service goes through the following stages in the upgrade process, in the order given below:
  - Updates the Platform Management Service itself
  - Checks and upgrades the OS if required. In the case of a multi-node CE site, the OS upgrade lock is taken one node at a time to update all the nodes.
  - Brings up `etcd` on the node, which enables quorum selection as well as maintaining state
  - Creates the bootstrap configuration for the data plane service
  - Starts Kubernetes deployment, including kubelet
  - Performs master election to apply subsequent workloads
  - Brings up Kubernetes fully
  - In the case of orchestrated sites, performs cloud provider integrations, which include validating the secret/authorization
  - Starts the local certificate management service to manage local certificates before any further pods are brought up
  - Starts image pre-pull for the services listed in the pre-pull list. Once the images specified in the pre-pull are downloaded, the actual manifest application starts. Manifests are applied in the order of the stage number specified by the applications. With the node drain feature, this is done node by node, with control nodes upgraded last. This includes all components, including VER.
  - Manages updates to cloud storage settings, including migration activity (the last migration activity was in the March 2024 software version)
  - Finally, with all services deployed, applies the ongoing objects to the data plane service components. At the end of this stage, the Platform Management Service checks whether all manifests have been applied correctly and reports upgrade completion to the Node Registration Service. This allows any subsequent upgrade to be triggered.
- The Platform Management Service handles many of the error conditions during the above steps with appropriate retries. For example, if CRI-O fails to deploy, it stops and starts the service again; if an application such as `etcd` fails, it retries. It checks for deployment failures of different services and applies appropriate error handling. It does not, however, manage the service health of all components. Each service needs to define its liveness/readiness probes appropriately to detect service-specific failure conditions, so that Kubernetes can restart the service containers as needed.
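As a rough illustration of the delta check and staged manifest application described above, here is a simplified sketch. The hashing scheme and the `stage` and `name` field names are assumptions for illustration, not the actual Platform Management Service implementation.

```python
import hashlib
import json

def overall_manifest_hash(manifests: list[dict]) -> str:
    """Hash a canonical serialization of all manifests, so a change in any
    one of them changes the overall digest."""
    canonical = json.dumps(manifests, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def needs_upgrade(current: list[dict] | None, latest: list[dict]) -> bool:
    # No manifests on the node yet (fresh bring-up) always triggers the
    # upgrade process; otherwise compare the overall hashes for a delta.
    if not current:
        return True
    return overall_manifest_hash(current) != overall_manifest_hash(latest)

def apply_in_stage_order(manifests: list[dict]) -> None:
    # Manifests are applied in ascending stage-number order, as described
    # above; "stage" here is an illustrative field name.
    for manifest in sorted(manifests, key=lambda m: m.get("stage", 0)):
        print(f"applying manifest {manifest.get('name', '?')} "
              f"(stage {manifest.get('stage', 0)})")
```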