Alert Reference

Objective

This document provides reference information on the types of alerts supported by F5® Distributed Cloud Services. Use it to understand the details of each alert and the action required in response.

The Alerts page in the F5 Distributed Cloud Console (Console) displays two tabs: Active Alerts and All Alerts.

Active Alerts

An alert is generated when its alert condition evaluates to true. Alert rules are evaluated periodically, and the alert remains active for as long as the alert condition holds.

Note: There are two alert APIs. The Get Alerts API returns the currently active alerts, and the state of each returned alert is active. The Alerts History API returns the history of alert notifications for a selected time interval, and the status of each entry is either firing (the same as active) or resolved.
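Because the two APIs report an alert's condition under different keys, a client that consumes both may want to map them onto one vocabulary. The following Python sketch assumes each alert is a parsed JSON dictionary that carries either a `state` key (Get Alerts) or a `status` key (Alerts History), as described in this document; the function name `normalized_status` is hypothetical.

```python
from typing import Any, Dict


def normalized_status(alert: Dict[str, Any]) -> str:
    """Return "firing" or "resolved" for an alert from either alerts API.

    Assumption: Get Alerts records carry "state" (always "active"), and
    Alerts History records carry "status" ("firing" or "resolved").
    """
    if "status" in alert:
        status = alert["status"]
        if status not in ("firing", "resolved"):
            raise ValueError(f"unexpected status: {status!r}")
        return status
    if alert.get("state") == "active":
        # "active" from Get Alerts means the same thing as "firing".
        return "firing"
    raise ValueError("alert has neither a known 'status' nor 'state' value")


print(normalized_status({"state": "active"}))     # firing
print(normalized_status({"status": "resolved"}))  # resolved
```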

The following are some of the keys and their values for an active alert (you can view them in the JSON view of an alert in Console):

  • state - The value is active.
  • startsAt - The time at which the alert started firing.
  • endsAt - The time at which the alert was resolved (if it is resolved). Ignore this field if the alert status is active or firing.
  • generatorURL - Identifies the entity that generated this alert. This is an internal URL, so it is always set to an empty string (“”).
  • silencedBy and inhibitedBy - These are always null and should be ignored.
  • receivers - The list of alert receivers this alert notification was sent to, based on the user-configured alert policy. This list is empty if no alert policy is configured or if this alert did not match any configured alert policy.
  • fingerprint - A hash of the key-value pairs in the alert that uniquely identifies the alert.
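The key descriptions above imply that only a few fields of an active alert carry information a consumer needs. The following Python sketch keeps those fields and drops the ones described as ignorable or fixed. The function name `summarize_active_alert` and every value in `example` are hypothetical, for illustration only.

```python
from typing import Any, Dict


def summarize_active_alert(alert: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the informative fields of an active alert.

    endsAt, generatorURL, silencedBy, and inhibitedBy are dropped because,
    for an active alert, they are either ignorable or fixed values.
    """
    if alert.get("state") != "active":
        raise ValueError("expected an alert from the Get Alerts API (state=active)")
    receivers = alert.get("receivers") or []
    return {
        "fingerprint": alert["fingerprint"],
        "starts_at": alert["startsAt"],
        "receivers": list(receivers),
        # Empty receivers: no alert policy is configured, or none matched.
        "notified": len(receivers) > 0,
    }


# Hypothetical example values, for illustration only.
example = {
    "state": "active",
    "startsAt": "2024-05-01T10:00:00Z",
    "endsAt": "2024-05-01T10:15:00Z",
    "generatorURL": "",
    "silencedBy": None,
    "inhibitedBy": None,
    "receivers": [],
    "fingerprint": "3f2a9c0d1b7e4a55",
}
print(summarize_active_alert(example))
```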

All Alerts

The All Alerts tab shows the history of alerts triggered during the selected date and time interval. The following are some of the keys and their values (you can view them in the JSON view of an alert in Console):

  • status - One of the following values:
    • firing - The same as the active state.
    • resolved - The alert has been resolved.
  • startsAt - The time at which the alert started firing.
  • endsAt - The time at which the alert was resolved (if it is resolved). Ignore this field if the alert status is firing.
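For a history entry, startsAt and endsAt together give how long a resolved alert was firing. The following Python sketch computes that duration and returns None for firing entries, whose endsAt should be ignored. It assumes RFC 3339 timestamps with at most microsecond precision and a trailing "Z"; the helper names `parse_ts` and `alert_duration` and the sample values are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Any, Dict, Optional


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def alert_duration(entry: Dict[str, Any]) -> Optional[timedelta]:
    """Return how long a history entry was firing, or None if still firing."""
    status = entry["status"]
    if status == "firing":
        # endsAt is not meaningful while the alert is firing.
        return None
    if status != "resolved":
        raise ValueError(f"unexpected status: {status!r}")
    return parse_ts(entry["endsAt"]) - parse_ts(entry["startsAt"])


resolved = {
    "status": "resolved",
    "startsAt": "2024-05-01T10:00:00Z",
    "endsAt": "2024-05-01T10:42:30Z",
}
print(alert_duration(resolved))  # 0:42:30
```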

Key Points

An alert is resolved in any of the following cases:

  • The alert condition is no longer active.
  • The alert has a valid endsAt time, and that time has elapsed.
  • The alert has no valid endsAt time, and no updates are received for the resolve_timeout duration (15 minutes).

Note: For an active alert, you can ignore the endsAt time. The entity generating the alert may set endsAt, and Alertmanager resolves the alert once that time has elapsed.
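The three resolution rules can be modeled as a single predicate. The following Python sketch is a simplified model of those rules, not the platform's implementation; the function `is_resolved`, its parameters (`condition_active`, `last_update`), and the constant `RESOLVE_TIMEOUT` are hypothetical names, and only the 15-minute value comes from this document.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RESOLVE_TIMEOUT = timedelta(minutes=15)  # resolve_timeout value from the text


def is_resolved(
    condition_active: bool,
    ends_at: Optional[datetime],
    last_update: datetime,
    now: datetime,
    resolve_timeout: timedelta = RESOLVE_TIMEOUT,
) -> bool:
    """Model of the three resolution rules (hypothetical, simplified)."""
    # Rule 1: the alert condition is no longer active.
    if not condition_active:
        return True
    # Rule 2: a valid endsAt time exists and has elapsed.
    if ends_at is not None:
        return now >= ends_at
    # Rule 3: no valid endsAt, and no update within resolve_timeout.
    return now - last_update >= resolve_timeout


t0 = datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)
print(is_resolved(True, None, t0, t0 + timedelta(minutes=16)))  # True
print(is_resolved(True, None, t0, t0 + timedelta(minutes=5)))   # False
```

Rule 2 is checked before rule 3 because, under this model, an alert with a valid endsAt is governed by that time rather than by the timeout.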

There is no separate alert for health score, because health score is composed of multiple components. For example, the health score of a site is computed from the data-plane connection status to the Regional Edge (RE) sites, the control-plane connection status, and the K8s API server status in the site. Individual alerts are defined for each of these conditions, but there is no alert for the health score itself.

Note: You can obtain the health score of a site in F5® Distributed Cloud Console (Console). You can also obtain it using the API https://www.volterra.io/docs/api/graph-connectivity#operation/ves.io.schema.graph.connectivity.CustomAPI.NodeQuery with "field_selector":{"healthscore":{"types":["HEALTHSCORE_OVERALL"]}}.
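The note gives only the field_selector value for the NodeQuery API. The following Python sketch adds that selector to a request body; it assumes the remaining fields of the body are built from the API reference elsewhere, and it does not send any request. The helper name `with_healthscore_selector` is hypothetical.

```python
import copy
import json
from typing import Any, Dict

# The field_selector value given in this document.
HEALTHSCORE_SELECTOR = {"healthscore": {"types": ["HEALTHSCORE_OVERALL"]}}


def with_healthscore_selector(request_body: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a NodeQuery request body that asks for the overall health score.

    request_body is assumed to already hold the other fields the API reference
    requires; this helper only sets (or overwrites) "field_selector".
    """
    body = copy.deepcopy(request_body)
    body["field_selector"] = copy.deepcopy(HEALTHSCORE_SELECTOR)
    return body


print(json.dumps(with_healthscore_selector({})))
```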

The delay before an alert is generated is not the same for all alerts; it is determined by the severity of the alert. For example, an alert is raised as soon as the tunnel connection to an RE goes down, whereas the health check alert for a service is raised only if the condition persists for 10 minutes. This keeps the alert volume at a manageable level and avoids generating alerts for temporary or transient failures.
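The delay described above behaves like a hold duration: the condition must stay true for a given time before the alert fires, similar in spirit to the `for` clause of a Prometheus alerting rule. The following Python sketch models that behavior using the two example durations from the text; the names `should_fire`, `TUNNEL_DOWN_HOLD`, and `HEALTH_CHECK_HOLD` are hypothetical, and other alerts use other durations.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hold durations from the two examples in the text (hypothetical constant names).
TUNNEL_DOWN_HOLD = timedelta(0)            # raised as soon as the tunnel goes down
HEALTH_CHECK_HOLD = timedelta(minutes=10)  # raised only if the failure persists


def should_fire(true_since: Optional[datetime], now: datetime, hold: timedelta) -> bool:
    """Fire only once the condition has been continuously true for `hold`."""
    if true_since is None:  # the condition is currently false
        return False
    return now - true_since >= hold


t0 = datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)
print(should_fire(t0, t0, TUNNEL_DOWN_HOLD))                          # True
print(should_fire(t0, t0 + timedelta(minutes=3), HEALTH_CHECK_HOLD))  # False
```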

Changing alert thresholds is not supported.

Users cannot define new alerts using an API. However, if the existing alerts do not satisfy your requirements, you can create a support request for a new alert in Console.

Alerts and Descriptions

The following table presents alerts and associated details such as group, type, severity, and associated actions.

TSA Severity vs Anomaly Scores

Time-Series Anomaly (TSA) alerts are generated when the anomaly detection algorithm detects an anomaly in any of the following metrics:

  • Request Rate
  • Error Rate
  • Response Throughput
  • Request Throughput
  • Response Latency

Note: Request rate is measured in requests per second (rps), error rate in errors per second (erps), request and response throughput in megabits per second (Mbps), and response latency in seconds (s).

The alerts are classified into three severity levels: minor, major, and critical. The following table lists, for each metric and severity, the minimum anomaly score and the absolute threshold required to trigger the alert.

Metric              | Severity | Anomaly Score | Absolute Threshold | Alert
--------------------|----------|---------------|--------------------|--------------------------
Request Rate        | minor    | 0.6           | 5 rps              | RequestRateAnomaly
Request Rate        | major    | 1.5           | 50 rps             | RequestRateAnomaly
Request Rate        | critical | 3.0           | 100 rps            | RequestRateAnomaly
Request Throughput  | minor    | 0.6           | 0.25 Mbps          | RequestThroughputAnomaly
Request Throughput  | major    | 1.5           | 2.5 Mbps           | RequestThroughputAnomaly
Request Throughput  | critical | 3.0           | 5 Mbps             | RequestThroughputAnomaly
Response Throughput | minor    | 0.6           | 2.5 Mbps           | ResponseThroughputAnomaly
Response Throughput | major    | 1.5           | 25 Mbps            | ResponseThroughputAnomaly
Response Throughput | critical | 3.0           | 50 Mbps            | ResponseThroughputAnomaly
Response Latency    | minor    | 0.6           | 5 s                | ResponseLatencyAnomaly
Response Latency    | major    | 1.5           | 50 s               | ResponseLatencyAnomaly
Response Latency    | critical | 3.0           | 100 s              | ResponseLatencyAnomaly
Error Rate          | minor    | 0.6           | 2.5 erps           | ErrorRateAnomaly
Error Rate          | major    | 1.5           | 25 erps            | ErrorRateAnomaly
Error Rate          | critical | 3.0           | 50 erps            | ErrorRateAnomaly
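The threshold table can be read as a lookup from a metric, an anomaly score, and a measured value to a severity. The following Python sketch encodes the table values and returns the highest severity whose two thresholds are both met. It assumes that an alert requires both the minimum anomaly score and the absolute threshold to be reached; the document lists the two columns but does not state how they combine. The names `THRESHOLDS` and `tsa_severity` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (severity, minimum anomaly score, absolute threshold) per metric, from the table.
# Units: rps, Mbps, Mbps, s, and erps respectively.
THRESHOLDS: Dict[str, List[Tuple[str, float, float]]] = {
    "Request Rate": [("minor", 0.6, 5.0), ("major", 1.5, 50.0), ("critical", 3.0, 100.0)],
    "Request Throughput": [("minor", 0.6, 0.25), ("major", 1.5, 2.5), ("critical", 3.0, 5.0)],
    "Response Throughput": [("minor", 0.6, 2.5), ("major", 1.5, 25.0), ("critical", 3.0, 50.0)],
    "Response Latency": [("minor", 0.6, 5.0), ("major", 1.5, 50.0), ("critical", 3.0, 100.0)],
    "Error Rate": [("minor", 0.6, 2.5), ("major", 1.5, 25.0), ("critical", 3.0, 50.0)],
}


def tsa_severity(metric: str, score: float, value: float) -> Optional[str]:
    """Return the highest severity whose score and absolute threshold are both met.

    Assumption: both conditions must hold for a severity to apply.
    """
    result = None
    # Rows are ordered minor -> major -> critical, so the last match is the highest.
    for severity, min_score, min_value in THRESHOLDS[metric]:
        if score >= min_score and value >= min_value:
            result = severity
    return result


print(tsa_severity("Request Rate", 2.0, 80.0))  # major
print(tsa_severity("Request Rate", 3.5, 20.0))  # minor
print(tsa_severity("Error Rate", 0.4, 100.0))   # None
```

Under this reading, a high anomaly score on low traffic (the second call) still yields only a minor alert, which matches the table's intent of suppressing alerts for small absolute volumes.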

Note: For more information on TSA, see the Time-Series Anomaly Detection guide.

For an L7 DDoS event, the minimum thresholds match the absolute thresholds defined for critical TSA alerts. The following minimum thresholds apply to an L7 DDoS event and are not configurable by end users:

  • Request Rate - 100 rps
  • Error Rate - 50 erps
  • Request Throughput - 5 Mbps
  • Response Throughput - 50 Mbps
  • Response Latency - 100 s
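The L7 DDoS minimums act as fixed floors per metric. The following Python sketch reports which metrics in a sample reach their floor. It assumes each metric is compared against its own floor independently, and it only checks the floors; it does not detect a DDoS event. The names `L7_DDOS_MINIMUMS` and `metrics_at_or_above_minimum` and the sample values are hypothetical.

```python
from typing import Dict, List

# Minimum L7 DDoS thresholds from the list above; not configurable by end users.
L7_DDOS_MINIMUMS: Dict[str, float] = {
    "Request Rate": 100.0,        # rps
    "Error Rate": 50.0,           # erps
    "Request Throughput": 5.0,    # Mbps
    "Response Throughput": 50.0,  # Mbps
    "Response Latency": 100.0,    # s
}


def metrics_at_or_above_minimum(sample: Dict[str, float]) -> List[str]:
    """List the metrics in `sample` whose value reaches the L7 DDoS minimum."""
    return [
        name
        for name, value in sample.items()
        if name in L7_DDOS_MINIMUMS and value >= L7_DDOS_MINIMUMS[name]
    ]


sample = {"Request Rate": 250.0, "Error Rate": 10.0, "Response Latency": 0.4}
print(metrics_at_or_above_minimum(sample))  # ['Request Rate']
```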