#OCPBUGS-30267 issue, 3 weeks ago: [IBMCloud] MonitorTests liveness/readiness probe error events repeat (MODIFIED)
Mar 12 18:52:24.937 - 58s E namespace/openshift-kube-apiserver alert/KubeAPIErrorBudgetBurn alertstate/firing severity/critical ALERTS{alertname="KubeAPIErrorBudgetBurn", alertstate="firing", long="1h", namespace="openshift-kube-apiserver", prometheus="openshift-monitoring/k8s", severity="critical", short="5m"}
#OCPBUGS-55389 issue, 3 weeks ago: alert/KubeAPIErrorBudgetBurn should not be at or above info (New)
Issue 16854930: alert/KubeAPIErrorBudgetBurn should not be at or above info
Description: Description of problem:
 {code:none}
 While working on making the IPsec upgrade a mandatory job for OCP releases (PR: https://github.com/openshift/release/pull/63528), the KubeAPIErrorBudgetBurn alert was seen in the 4.16 and 4.17 rehearsal jobs. It appears to occur sporadically.
 
 : [bz-kube-apiserver][invariant] alert/KubeAPIErrorBudgetBurn should not be at or above info 0s {  KubeAPIErrorBudgetBurn was at or above info for at least 4m58s on platformidentification.JobType{Release:"4.17", FromRelease:"4.17", Platform:"aws", Architecture:"amd64", Network:"ovn", Topology:"ha"} (maxAllowed=0s): pending for 1h35m32s, firing for 4m58s:
 
 Apr 25 14:27:34.665 - 298s  E namespace/openshift-kube-apiserver alert/KubeAPIErrorBudgetBurn alertstate/firing severity/critical ALERTS{alertname="KubeAPIErrorBudgetBurn", alertstate="firing", long="6h", namespace="openshift-kube-apiserver", prometheus="openshift-monitoring/k8s", severity="critical", short="30m"}}
 
 https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/63528/rehearse-63528-pull-ci-openshift-cluster-network-operator-release-4.16-e2e-aws-ovn-ipsec-upgrade/1915748137436188672
 
 https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_release/63528/rehearse-63528-pull-ci-openshift-cluster-network-operator-release-4.17-e2e-aws-ovn-ipsec-upgrade/1915748145673801728{code}
 Version-Release number of selected component (if applicable):
 {code:none}
     {code}
 How reproducible:
 {code:none}
     {code}
 Steps to Reproduce:
 {code:none}
     1.
     2.
     3.
     {code}
 Actual results:
 {code:none}
     {code}
 Expected results:
 {code:none}
     {code}
 Additional info:
 {code:none}
 Reference: https://issues.redhat.com/browse/OCPBUGS-42083   {code}
Status: New
#OCPBUGS-49764 issue, 3 weeks ago: KubeAPIErrorBudgetBurn calculation is erroneous (CLOSED)
Issue 16638954: KubeAPIErrorBudgetBurn calculation is erroneous
Description: The problem I recently noticed with the existing expression is that, when we compute the overall burn rate from write and read requests, we take the ratio of unsuccessful read requests and add it to the ratio of unsuccessful write requests. Each of these ratios is calculated against its own request type, not against the total number of requests, so simply adding them does not give the overall ratio: the result ignores how requests are split between reads and writes.
 
 For example, imagine a scenario where 40% of requests are write requests and only 50% of them succeed during a disruption, while 5 out of 6 read requests (about 83%) succeed. With 10 requests, that is 2 failed writes out of 4 and 1 failed read out of 6.
 
 apiserver_request:burnrate1h{verb="write"} would be equal to 2/4 and apiserver_request:burnrate1h{verb="read"} would be 1/6.
 The alert today sums these, giving 2/4 + 1/6 = 2/3, whereas the actual ratio of unsuccessful requests is 2/10 + 1/10 = 3/10. Not accounting for the total number of requests therefore makes a large difference.
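The arithmetic in this example can be checked with a short Python sketch. The counts (4 writes with 2 failures, 6 reads with 1 failure) are the example's own numbers; the variable names are illustrative and do not correspond to any recording rule. It shows that weighting each per-type ratio by that type's share of all requests recovers the overall ratio, while the plain sum does not.

```python
from fractions import Fraction

# Request counts from the example above (10 requests in total).
write_total, write_failed = 4, 2
read_total, read_failed = 6, 1

# Per-type ratios, each computed against its own request type,
# like burnrate1h{verb="write"} and burnrate1h{verb="read"} in the example.
write_ratio = Fraction(write_failed, write_total)
read_ratio = Fraction(read_failed, read_total)

# Plain sum of the per-type ratios, as the current expression does.
naive = write_ratio + read_ratio

# Overall ratio: total failures over total requests.
total = write_total + read_total
overall = Fraction(write_failed + read_failed, total)

# Equivalent: weight each per-type ratio by its share of all requests.
weighted = (Fraction(write_total, total) * write_ratio
            + Fraction(read_total, total) * read_ratio)

print(f"naive sum:      {naive}")     # 2/3
print(f"overall ratio:  {overall}")   # 3/10
print(f"weighted ratio: {weighted}")  # 3/10
```

In query terms the same principle would mean dividing summed error counts by summed request counts instead of adding per-verb ratios; this is only an illustration of the arithmetic, not necessarily the change made for this issue.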
      "state": "inactive",
      "name": "KubeAPIErrorBudgetBurn",
      "query": "sum:apiserver_request:burnrate1h > (14.4 * 0.01) and sum:apiserver_request:burnrate5m > (14.4 * 0.01)",
        "description": "The API server is burning too much error budget. This alert fires when too many requests are failing with high latency. Use the 'API Performance' monitoring dashboards to narrow down the request states and latency. The 'etcd' monitoring dashboards also provides metrics to help determine etcd stability and performance.",
        "runbook_url": "https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-apiserver-operator/KubeAPIErrorBudgetBurn.md",
        "summary": "The API server is burning too much error budget."
      "state": "inactive",
      "name": "KubeAPIErrorBudgetBurn",
      "query": "sum:apiserver_request:burnrate6h > (6 * 0.01) and sum:apiserver_request:burnrate30m > (6 * 0.01)",
        "description": "The API server is burning too much error budget. This alert fires when too many requests are failing with high latency. Use the 'API Performance' monitoring dashboards to narrow down the request states and latency. The 'etcd' monitoring dashboards also provides metrics to help determine etcd stability and performance.",
        "runbook_url": "https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-apiserver-operator/KubeAPIErrorBudgetBurn.md",
        "summary": "The API server is burning too much error budget."
      "state": "inactive",
      "name": "KubeAPIErrorBudgetBurn",
      "query": "sum:apiserver_request:burnrate1d > (3 * 0.01) and sum:apiserver_request:burnrate2h > (3 * 0.01)",
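The three queries above share one pattern: the condition holds only when the burn rate over a long window and over a short window both exceed factor × 0.01. The Python sketch below evaluates that pattern for hypothetical burn-rate values. The window/factor table is transcribed from the queries. Pairing each rule with a severity is an assumption based on the ALERTS lines elsewhere on this page (1h/5m and 6h/30m appear as critical, 1d/2h as warning). The function name and sample values are illustrative, and the sketch ignores any pending (`for`) duration the real rules may have.

```python
# (long window, short window, burn-rate factor, severity)
# Windows and factors are taken from the queries; severities are an assumption
# inferred from the long/short labels on ALERTS lines in this page.
ERROR_BUDGET = 0.01  # the "0.01" in each query threshold

RULES = [
    ("1h", "5m", 14.4, "critical"),
    ("6h", "30m", 6.0, "critical"),
    ("1d", "2h", 3.0, "warning"),
]


def firing_rules(burnrates: dict[str, float]) -> list[tuple[str, str, str]]:
    """Return (long, short, severity) for every rule whose condition holds.

    burnrates maps a window such as "1h" to the value of
    sum:apiserver_request:burnrate<window>; missing windows count as 0.
    """
    fired = []
    for long_w, short_w, factor, severity in RULES:
        threshold = factor * ERROR_BUDGET
        # Both windows must exceed the threshold ("... > x and ... > x").
        if (burnrates.get(long_w, 0.0) > threshold
                and burnrates.get(short_w, 0.0) > threshold):
            fired.append((long_w, short_w, severity))
    return fired


# Hypothetical burn rates for a short, sharp spike.
sample = {"5m": 0.30, "1h": 0.16, "30m": 0.08, "6h": 0.03, "2h": 0.05, "1d": 0.01}
print(firing_rules(sample))  # [('1h', '5m', 'critical')]
```

Requiring both windows means a brief spike trips only the fast 1h/5m pair, while a slower sustained burn is what the 6h/30m and 1d/2h pairs are meant to catch.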
#OCPBUGS-15430 issue, 13 days ago: KubeAPIDown alert rename and/or degraded status (ASSIGNED)
We have many guards making sure that there are always at least two instances of the kube-apiserver. If we ever drop to a single kube-apiserver instance and that disrupts clients, other alerts such as KubeAPIErrorBudgetBurn will fire.
KubeAPIDown is there to make sure that Prometheus, and really any client, can reach the kube-apiserver, which they can even when only one kube-apiserver instance is running. If they can't, or that availability is disrupted, `KubeAPIErrorBudgetBurn` will fire.
Comment 23058588 by Marcel Härri at 2023-09-19T06:57:07.949+0000
periodic-ci-openshift-release-master-ci-4.12-e2e-gcp-sdn-upgrade (all) - 13 runs, 62% failed, 38% of failures match = 23% impact
#1941274147610955776 junit, 4 days ago
# [bz-kube-apiserver][invariant] alert/KubeAPIErrorBudgetBurn should not be at or above pending
KubeAPIErrorBudgetBurn was at or above pending for at least 1m8s on platformidentification.JobType{Release:"4.12", FromRelease:"4.12", Platform:"gcp", Architecture:"amd64", Network:"sdn", Topology:"ha"} (maxAllowed=0s): pending for 1m8s, firing for 0s:
Jul 04 23:52:38.678 - 68s   I alert/KubeAPIErrorBudgetBurn ns/openshift-kube-apiserver ALERTS{alertname="KubeAPIErrorBudgetBurn", alertstate="pending", long="3d", namespace="openshift-kube-apiserver", prometheus="openshift-monitoring/k8s", severity="warning", short="6h"}
#1941237604577972224 junit, 4 days ago
# [bz-kube-apiserver][invariant] alert/KubeAPIErrorBudgetBurn should not be at or above pending
KubeAPIErrorBudgetBurn was at or above pending for at least 7m44s on platformidentification.JobType{Release:"4.12", FromRelease:"4.12", Platform:"gcp", Architecture:"amd64", Network:"sdn", Topology:"ha"} (maxAllowed=0s): pending for 7m44s, firing for 0s:
Jul 04 21:27:06.686 - 464s  I alert/KubeAPIErrorBudgetBurn ns/openshift-kube-apiserver ALERTS{alertname="KubeAPIErrorBudgetBurn", alertstate="pending", long="3d", namespace="openshift-kube-apiserver", prometheus="openshift-monitoring/k8s", severity="warning", short="6h"}
#1940274687342809088 junit, 7 days ago
# [bz-kube-apiserver][invariant] alert/KubeAPIErrorBudgetBurn should not be at or above pending
KubeAPIErrorBudgetBurn was at or above pending for at least 38m8s on platformidentification.JobType{Release:"4.12", FromRelease:"4.12", Platform:"gcp", Architecture:"amd64", Network:"sdn", Topology:"ha"} (maxAllowed=0s): pending for 38m8s, firing for 0s:
Jul 02 05:46:04.403 - 230s  I alert/KubeAPIErrorBudgetBurn ns/openshift-kube-apiserver ALERTS{alertname="KubeAPIErrorBudgetBurn", alertstate="pending", long="1d", namespace="openshift-kube-apiserver", prometheus="openshift-monitoring/k8s", severity="warning", short="2h"}
Jul 02 05:46:04.403 - 2030s I alert/KubeAPIErrorBudgetBurn ns/openshift-kube-apiserver ALERTS{alertname="KubeAPIErrorBudgetBurn", alertstate="pending", long="3d", namespace="openshift-kube-apiserver", prometheus="openshift-monitoring/k8s", severity="warning", short="6h"}
Jul 02 06:21:26.403 - 28s   I alert/KubeAPIErrorBudgetBurn ns/openshift-kube-apiserver ALERTS{alertname="KubeAPIErrorBudgetBurn", alertstate="pending", long="3d", namespace="openshift-kube-apiserver", prometheus="openshift-monitoring/k8s", severity="warning", short="6h"}
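In each run above, the reported "at or above pending" duration equals the sum of the listed interval lengths: 68s = 1m8s, 464s = 7m44s, and 230s + 2030s + 28s = 2288s = 38m8s. In the third run the 230s and 2030s intervals start at the same instant (they belong to different long/short series), so the total appears to be a plain sum rather than a union of time ranges. The Python sketch below reproduces that arithmetic. It is inferred from the output on this page, not taken from the test's source; `format_duration` and the pass/fail rule against `maxAllowed=0s` are assumptions.

```python
def format_duration(seconds: int) -> str:
    """Format whole seconds like the test output, e.g. 2288 -> "38m8s"."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}h{minutes}m{secs}s"
    if minutes:
        return f"{minutes}m{secs}s"
    return f"{secs}s"


# Interval lengths (seconds) from the pending ALERTS lines of each run above.
runs = {
    "1941274147610955776": [68],
    "1941237604577972224": [464],
    "1940274687342809088": [230, 2030, 28],
}

MAX_ALLOWED = 0  # maxAllowed=0s in the test output

for run_id, intervals in runs.items():
    # Summed, not merged: overlapping intervals from different series both count.
    total = sum(intervals)
    verdict = "fail" if total > MAX_ALLOWED else "pass"
    print(run_id, format_duration(total), verdict)
```

The same formatting matches the OCPBUGS-55389 output as well: 298s is "4m58s" and 5732s is "1h35m32s".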

Found in 23.08% of runs (37.50% of failures) across 13 total runs and 1 job (61.54% failed).
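The percentages on the job line and in this summary are consistent with 13 runs, 8 failures, and 3 matching failures. Those counts are inferred, since the page reports only percentages. Impact is matching failures over all runs, which also equals the failure rate times the match rate among failures (61.54% × 37.50% ≈ 23.08%). A quick Python check:

```python
# Counts inferred from the reported percentages (the page does not list them):
# 13 runs, 8 failed, 3 of those failures matched the search.
total_runs = 13
failed_runs = 8
matching_failures = 3

failed_pct = 100 * failed_runs / total_runs
match_pct_of_failures = 100 * matching_failures / failed_runs
# Matching failures as a share of all runs ("impact").
impact_pct = 100 * matching_failures / total_runs

print(f"{failed_pct:.2f}% failed")                        # 61.54% failed
print(f"{match_pct_of_failures:.2f}% of failures match")  # 37.50% of failures match
print(f"{impact_pct:.2f}% impact")                        # 23.08% impact
```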