Alert Reference
Objective
This document provides reference information on the various types of alerts supported by F5® Distributed Cloud Services. Use this information to understand the details of each alert and the action required when it fires.
The Alerts page in the F5 Distributed Cloud Console (Console) displays two tabs: Active Alerts and All Alerts.
Active Alerts
An alert is generated when its condition evaluates to true. Alert rules are evaluated periodically, and the alert status remains active as long as the alert condition holds.
Note: There are two alert APIs. The Get Alerts API returns active alerts, and the state of those alerts is active. The Alerts History API returns a history of alert notifications for a selected time interval; the status can be firing (which is the same as active) or resolved.
The following are some of the keys and their corresponding values for an active alert (these can be viewed from the JSON view of an alert in Console):
- state – The value is active.
- startsAt – The time at which the alert started firing.
- endsAt – The time at which the alert was resolved (if it is resolved). Ignore this field if the alert status is active or firing.
- generatorURL – Identifies the entity that generated this alert. This is an internal URL and is therefore always set to "".
- silencedBy and inhibitedBy – These are always null and should be ignored.
- receivers – The list of alert receivers this alert notification was sent to, based on the user-configured alert policy. This is empty if no alert policy is configured or if this alert did not match any configured alert policy.
- fingerprint – A hash of the key-value pairs in the alert; it uniquely identifies the alert.
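As an illustration, a minimal active-alert payload can be handled as follows. The field names match the keys described above, but the sample values (timestamps, receiver name, fingerprint) are hypothetical:

```python
import json

# Hypothetical active alert as it might appear in the JSON view in Console.
# Field names follow the keys documented above; values are illustrative only.
alert_json = """
{
  "state": "active",
  "startsAt": "2024-01-15T10:00:00Z",
  "endsAt": "0001-01-01T00:00:00Z",
  "generatorURL": "",
  "silencedBy": null,
  "inhibitedBy": null,
  "receivers": ["my-slack-receiver"],
  "fingerprint": "c4ca4238a0b92382"
}
"""

alert = json.loads(alert_json)

# For an active alert, endsAt should be ignored; state is always "active".
if alert["state"] == "active":
    print(f"alert {alert['fingerprint']} firing since {alert['startsAt']}")
    print(f"notified receivers: {alert['receivers']}")
```

Note that silencedBy and inhibitedBy are parsed as None and, per the list above, can simply be skipped when processing the alert.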
All Alerts
The All Alerts tab shows the history of alerts triggered during the selected date and time interval. The following are some of the keys and their corresponding values (these can be viewed from the JSON view of an alert in Console):
- status – An alert can have one of the following values:
  - firing – This is the same as the active state.
  - resolved – This indicates that the alert is resolved.
- startsAt – The time at which the alert started firing.
- endsAt – The time at which the alert was resolved (if it is resolved). Ignore this field if the alert status is active or firing.
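Because the history can contain both a firing and a later resolved notification for the same alert, one simple way to recover the latest status per alert is to group entries by fingerprint. This is a sketch with hypothetical history entries, and it assumes the entries are processed in chronological order:

```python
# Hypothetical alert-history entries as returned by the Alerts History API.
history = [
    {"fingerprint": "abc", "status": "firing",   "startsAt": "2024-01-15T10:00:00Z"},
    {"fingerprint": "abc", "status": "resolved", "startsAt": "2024-01-15T10:00:00Z"},
    {"fingerprint": "def", "status": "firing",   "startsAt": "2024-01-15T11:00:00Z"},
]

# Keep the last notification seen for each fingerprint: assuming entries
# arrive in chronological order, later entries overwrite earlier ones.
latest = {}
for entry in history:
    latest[entry["fingerprint"]] = entry["status"]

still_firing = [fp for fp, status in latest.items() if status == "firing"]
print(still_firing)  # ['def']
```

The fingerprint is suitable as a grouping key because, as noted earlier, it uniquely identifies an alert.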
Key Points
An alert is resolved in the following cases:
- The alert condition is no longer active.
- The alert has a valid endsAt time, and that time has lapsed.
- The alert has no valid endsAt time, and no updates are received for the resolve_timeout duration (15 minutes).
Note: For an active alert, you can ignore the endsAt time. The entity generating the alert may set the endsAt time, and Alertmanager resolves the alert after that time has lapsed.
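The resolution rules above can be sketched as a small decision function. This is an illustration of the documented behavior, not the actual Alertmanager implementation:

```python
from datetime import datetime, timedelta, timezone

RESOLVE_TIMEOUT = timedelta(minutes=15)  # the resolve_timeout duration from the text

def is_resolved(ends_at, last_update, now, condition_active):
    """Mirror the three resolution cases described above (a sketch only).

    ends_at          -- datetime, or None when there is no valid endsAt time
    last_update      -- when the alert was last re-sent by its source
    condition_active -- whether the alert condition still evaluates to true
    """
    if not condition_active:
        return True                                   # condition no longer active
    if ends_at is not None:
        return now >= ends_at                         # valid endsAt has lapsed
    return now - last_update >= RESOLVE_TIMEOUT       # no updates for 15 minutes

now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(is_resolved(None, now - timedelta(minutes=20), now, True))  # True
print(is_resolved(None, now - timedelta(minutes=5), now, True))   # False
```

The first call resolves because no update has arrived within the 15-minute resolve_timeout; the second stays active because the alert was refreshed 5 minutes ago.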
There is no separate alert for health score, because health score is composed of multiple components. For example, the health score of a site is computed from the data-plane connection status to the Regional Edge (RE) sites, the control-plane connection status, and the K8s API server status in the site. Individual alerts are defined for each of these conditions, but no alert is available for the health score itself.
Note: You can obtain the health score of a site in F5® Distributed Cloud Console (Console). You can also obtain it using the API https://www.volterra.io/docs/api/graph-connectivity#operation/ves.io.schema.graph.connectivity.CustomAPI.NodeQuery with "field_selector":{"healthscore":{"types":["HEALTHSCORE_OVERALL"]}}.
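The field_selector fragment quoted above can be built programmatically before sending the NodeQuery request. Only the selector itself is taken from the document; any additional request parameters the API requires are omitted here:

```python
import json

# Build the field_selector payload fragment quoted in the note above.
# Other NodeQuery request parameters (if any) are intentionally left out.
field_selector = {
    "field_selector": {
        "healthscore": {"types": ["HEALTHSCORE_OVERALL"]}
    }
}

body = json.dumps(field_selector)
print(body)
```

Constructing the selector as a dict and serializing it avoids quoting mistakes that are easy to make when embedding the JSON string by hand.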
The amount of time before an alert is generated is not the same for all alerts; the duration is determined by the severity of the alert. For example, an alert is raised as soon as the tunnel connection to the RE goes down, whereas a health check alert for a service is raised only if the condition persists for 10 minutes. This keeps the alert volume at a manageable level and avoids generating alerts for temporary or transient failure conditions.
Changing the threshold for alerts is not supported.
Defining new alerts using an API is not supported. However, if the existing alerts do not satisfy your requirements, you can create a support request for a new alert in Console.
Alerts and Descriptions
The following table presents alerts and associated details such as group, type, severity, and associated actions.
TSA Severity vs Anomaly Scores
Time-Series Anomaly (TSA) alerts are generated when the anomaly detection algorithm detects an anomaly in any one of the following metrics:
- Request rate
- Error Rate
- Response Throughput
- Request Throughput
- Response Latency
Note: The metrics are evaluated in requests per second (rps), errors per second (erps), seconds (s), and megabits per second (Mbps).
The alerts are classified into three groups (minor, major, and critical) based on severity. The minimum (absolute) thresholds for the metrics to trigger these alerts are provided in the following table.
Metric | Severity | Score | Absolute Threshold | Alert |
---|---|---|---|---|
Request Rate | minor | 0.6 | 5 rps | RequestRateAnomaly |
Request Rate | major | 1.5 | 50 rps | RequestRateAnomaly |
Request Rate | critical | 3.0 | 100 rps | RequestRateAnomaly |
Request Throughput | minor | 0.6 | 0.25 Mbps | RequestThroughputAnomaly |
Request Throughput | major | 1.5 | 2.5 Mbps | RequestThroughputAnomaly |
Request Throughput | critical | 3.0 | 5 Mbps | RequestThroughputAnomaly |
Response Throughput | minor | 0.6 | 2.5 Mbps | ResponseThroughputAnomaly |
Response Throughput | major | 1.5 | 25 Mbps | ResponseThroughputAnomaly |
Response Throughput | critical | 3.0 | 50 Mbps | ResponseThroughputAnomaly |
Response Latency | minor | 0.6 | 5 s | ResponseLatencyAnomaly |
Response Latency | major | 1.5 | 50 s | ResponseLatencyAnomaly |
Response Latency | critical | 3.0 | 100 s | ResponseLatencyAnomaly |
Error Rate | minor | 0.6 | 2.5 erps | ErrorRateAnomaly |
Error Rate | major | 1.5 | 25 erps | ErrorRateAnomaly |
Error Rate | critical | 3.0 | 50 erps | ErrorRateAnomaly |
Note: For more information on TSA, see the Time-Series Anomaly Detection guide.
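The table above can be expressed as a small lookup. How the anomaly score and the absolute threshold combine in the actual detector is an assumption in this sketch; here an alert of a given severity fires only when both the score and the metric value meet that severity's row:

```python
# Absolute thresholds from the table above, per metric and severity.
THRESHOLDS = {
    "request_rate":        {"minor": 5,    "major": 50,  "critical": 100},  # rps
    "request_throughput":  {"minor": 0.25, "major": 2.5, "critical": 5},    # Mbps
    "response_throughput": {"minor": 2.5,  "major": 25,  "critical": 50},   # Mbps
    "response_latency":    {"minor": 5,    "major": 50,  "critical": 100},  # s
    "error_rate":          {"minor": 2.5,  "major": 25,  "critical": 50},   # erps
}
SCORES = {"minor": 0.6, "major": 1.5, "critical": 3.0}  # anomaly scores per severity

def tsa_severity(metric, score, value):
    """Return the highest severity whose score and absolute threshold are
    both met, or None if no alert would fire (an assumption, see lead-in)."""
    for severity in ("critical", "major", "minor"):
        if score >= SCORES[severity] and value >= THRESHOLDS[metric][severity]:
            return severity
    return None

print(tsa_severity("request_rate", 1.8, 60))  # major
print(tsa_severity("error_rate", 0.7, 1.0))   # below the 2.5 erps minimum -> None
```

The second call shows the role of the absolute threshold: even with an anomalous score, a rate of 1.0 erps is below the 2.5 erps minimum for a minor ErrorRateAnomaly, so no alert fires.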
In the case of an L7 DDoS event, the minimum thresholds match the absolute thresholds defined for critical TSA alerts. That is, for an L7 DDoS event, the following minimum thresholds are defined for the metrics and are not configurable by end users:
- Request Rate - 100 rps
- Error Rate - 50 erps
- Request Throughput - 5 Mbps
- Response Throughput - 50 Mbps
- Response Latency - 100 s
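These fixed minimums can be checked against observed traffic as follows. Whether the detector requires one or several metrics to exceed their minimums is an assumption; this sketch flags any single breach:

```python
# Minimum thresholds for an L7 DDoS event, from the list above
# (fixed values, not configurable by end users).
L7_DDOS_MIN = {
    "request_rate_rps": 100,
    "error_rate_erps": 50,
    "request_throughput_mbps": 5,
    "response_throughput_mbps": 50,
    "response_latency_s": 100,
}

def breached_metrics(observed):
    """Return the metrics whose observed value meets or exceeds the
    L7 DDoS minimum threshold (unknown metric names are ignored)."""
    return [m for m, v in observed.items() if v >= L7_DDOS_MIN.get(m, float("inf"))]

observed = {"request_rate_rps": 250, "error_rate_erps": 10}
print(breached_metrics(observed))  # ['request_rate_rps']
```

Here only the request rate crosses its minimum (250 rps vs. 100 rps); the error rate stays below the 50 erps floor.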