#OCPBUGS-54927 | issue | 2 days ago | CI: API is broken in periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-single-node-techpreview-serial CLOSED |
{code:none} : [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available {code} My understanding is that this test fails if the "openshift-apiserver" ClusterOperator moves out of Available to some other state. If this happens often, it may explain why other things are reporting "connection refused". This test last passed on Feb 11th; it started failing later that same day and never recovered. I went run by run and could not find a single successful run after that.
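The failure mode described above amounts to the ClusterOperator's {{Available}} condition leaving {{status: "True"}}. A minimal self-contained sketch of that check, assuming the standard ClusterOperator status layout (the same structure `oc get clusteroperator openshift-apiserver -o json` returns; the sample data below is illustrative, not from a real run):

```python
# Check whether a ClusterOperator reports Available=True, given its
# status object. Conditions live under status.conditions with
# type/status/reason fields, per the ClusterOperator API.

def is_available(clusteroperator: dict) -> bool:
    """Return True only if the Available condition has status "True"."""
    for cond in clusteroperator.get("status", {}).get("conditions", []):
        if cond.get("type") == "Available":
            return cond.get("status") == "True"
    return False  # no Available condition reported at all

# Illustrative sample mirroring the blip seen in CI.
sample = {
    "status": {
        "conditions": [
            {"type": "Available", "status": "False",
             "reason": "APIServices_Error"},
        ]
    }
}
print(is_available(sample))  # False
```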
#OCPBUGS-23746 | issue | 3 weeks ago | openshift-apiserver ClusterOperator should not blip Available=False on brief missing HTTP content-type New |
Issue 15637203: openshift-apiserver ClusterOperator should not blip Available=False on brief missing HTTP content-type Description: h2. Description of problem: Seen [in 4.15 update CI|https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1727427846533550080]: {code:none} : [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available Run #0: Failed 1h28m25s { 1 unexpected clusteroperator state transitions during e2e test run Nov 22 21:47:32.876 - 1s E clusteroperator/openshift-apiserver condition/Available reason/APIServices_Error status/False APIServicesAvailable: rpc error: code = Unknown desc = malformed header: missing HTTP content-type} {code} While the Kube API server, if that's what's missing the header, is supposed to always be available, an issue that persists for only 1s is not long enough to warrant [immediate admin intervention|https://github.com/openshift/api/blob/c3f7566f6ef636bb7cf9549bf47112844285989e/config/v1/types_cluster_operator.go#L149-L153]. Teaching the openshift-apiserver operator to stay {{Available=True}} for this kind of brief hiccup, while still going {{Available=False}} for issues where [at least part of the component is non-functional and the condition requires immediate administrator intervention|https://github.com/openshift/api/blob/c3f7566f6ef636bb7cf9549bf47112844285989e/config/v1/types_cluster_operator.go#L149-L153], would make it easier for admins and SREs operating clusters to identify when intervention is actually required. h2. 
Version-Release number of selected component (if applicable): {code:none} $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&search=clusteroperator/openshift-apiserver+should+not+change+condition/Available' | grep '^periodic-.*4[.]15.*failures match' | sort periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-ibmcloud-ovn-multi-ppc64le (all) - 4 runs, 100% failed, 25% of failures match = 25% impact periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-ibmcloud-ovn-multi-s390x (all) - 4 runs, 25% failed, 200% of failures match = 50% impact periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-s390x (all) - 5 runs, 100% failed, 40% of failures match = 40% impact periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-arm64 (all) - 5 runs, 40% failed, 50% of failures match = 20% impact periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-heterogeneous (all) - 5 runs, 20% failed, 100% of failures match = 20% impact periodic-ci-openshift-release-master-ci-4.15-e2e-aws-ovn-upgrade (all) - 5 runs, 20% failed, 200% of failures match = 40% impact periodic-ci-openshift-release-master-ci-4.15-e2e-aws-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade (all) - 50 runs, 56% failed, 21% of failures match = 12% impact periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade (all) - 80 runs, 44% failed, 17% of failures match = 8% impact periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 80 runs, 30% failed, 13% of failures match = 4% impact periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade (all) - 80 runs, 43% failed, 6% of failures match = 3% impact 
periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade (all) - 50 runs, 16% failed, 63% of failures match = 10% impact periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-from-stable-4.13-e2e-aws-sdn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-single-node-serial (all) - 5 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-upgrade-rollback-oldest-supported (all) - 5 runs, 40% failed, 50% of failures match = 20% impact periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade (all) - 50 runs, 18% failed, 11% of failures match = 2% impact periodic-ci-openshift-release-master-nightly-4.15-e2e-gcp-ovn-etcd-scaling (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.15-e2e-ibmcloud-csi (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-techpreview (all) - 5 runs, 40% failed, 50% of failures match = 20% impact periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-metal-ipi-sdn-bm-upgrade (all) - 5 runs, 100% failed, 20% of failures match = 20% impact periodic-ci-openshift-release-master-okd-scos-4.15-e2e-aws-ovn-upgrade (all) - 15 runs, 47% failed, 14% of failures match = 7% impact {code} The impact rates are low enough that I haven't checked older 4.y. 
And it's possible that some of those matches have the operator going {{Available=False}} for other reasons besides {{APIServices_Error}}: {code:none} $ curl -s 'https://search.ci.openshift.org/search?maxAge=48h&type=junit&name=4.15.*upgrade&context=0&search=clusteroperator/openshift-apiserver.*condition/Available.*status/False' | jq -r 'to_entries[].value | to_entries[].value[].context[]' | sed 's|.*clusteroperator/\([^ ]*\) condition/Available reason/\([^ ]*\) status/False.*|\1 \2|' | sort | uniq -c | sort -n 2 openshift-apiserver APIServerDeployment_NoPod 2 openshift-apiserver APIServerDeployment_PreconditionNotFulfilled 19 openshift-apiserver APIServices_Error 22 openshift-apiserver APIServerDeployment_NoDeployment {code} h2. How reproducible: {{12% impact}} for {{periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade}} looks like the highest impact among the jobs with double-digit run counts. h2. Steps to Reproduce: Run {{periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade}} a number of times, watching the {{openshift-apiserver}} ClusterOperator's {{Available}} condition. h2. Actual results: Some very brief blips of {{Available=False}} that self-resolve before an admin could possibly respond to the summons. h2. Expected results: No quickly-resolving blips in CI. No long runs of {{Available=False}} for issues that don't seem worth summoning an admin. Still going {{Available=False}} for outages that need immediate admin response. Status: New
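The expected behavior described above (tolerate 1s blips, still go {{Available=False}} for sustained outages) is essentially a debounce on the condition. One way it could look, sketched here as illustration only — this is NOT the openshift-apiserver operator's actual implementation, and the class name and 30s grace period are assumptions:

```python
# Hedged sketch: only surface Available=False once the underlying
# failure has persisted past a grace period, so brief hiccups like the
# 1s missing-content-type blip never reach the reported condition.

GRACE_SECONDS = 30.0  # illustrative threshold, not the operator's real value

class AvailableDebouncer:
    def __init__(self, grace: float = GRACE_SECONDS):
        self.grace = grace
        self.failing_since = None  # timestamp of first observed failure

    def observe(self, healthy: bool, now: float) -> str:
        """Return the Available status ("True"/"False") to report at `now`."""
        if healthy:
            self.failing_since = None  # recovered: reset the timer
            return "True"
        if self.failing_since is None:
            self.failing_since = now
        # Stay Available=True for blips shorter than the grace period.
        if now - self.failing_since < self.grace:
            return "True"
        return "False"

d = AvailableDebouncer()
print(d.observe(False, now=0.0))   # True  (blip tolerated so far)
print(d.observe(False, now=1.0))   # True  (1s blip still tolerated)
print(d.observe(True,  now=2.0))   # True  (recovered, timer reset)
print(d.observe(False, now=10.0))  # True  (new failure, timer restarts)
print(d.observe(False, now=45.0))  # False (persisted past the 30s grace)
```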
periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-sdn-upgrade (all) - 11 runs, 64% failed, 57% of failures match = 36% impact
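The "impact" figure in these search rows is just the failure rate multiplied by the match rate, rounded to the nearest percent. A quick arithmetic check against the row above (64% failed, 57% of failures match):

```python
# search.ci "impact" = (fraction of runs that failed)
#                    x (fraction of those failures matching the search).

def impact(pct_failed: float, pct_match: float) -> float:
    """Impact as a percentage of all runs."""
    return pct_failed / 100 * pct_match / 100 * 100

print(round(impact(64, 57)))  # 36, matching the "36% impact" above
```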
#1932242422121631744 | junit | 3 days ago | |
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available 0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1932566022934499328 | junit | 2 days ago | |
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available 0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1930184332580753408 | junit | 9 days ago | |
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available 0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1929976146754015232 | junit | 9 days ago | |
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available 0 unexpected clusteroperator state transitions during e2e test run, as desired. |
Found in 36.36% of runs (57.14% of failures) across 11 total runs and 1 job (63.64% failed)
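The percentages in the summary line above are mutually consistent. A back-of-the-envelope check, assuming 7 failed runs and 4 matching failures (counts inferred from the stated percentages, not given directly in the output):

```python
from fractions import Fraction

# Reproduce the summary-line percentages from inferred raw counts:
# 11 total runs, of which 7 failed, of which 4 matched the search.
total_runs = 11
failed = 7
matching_failures = 4

def pct(frac: Fraction) -> float:
    """Fraction as a percentage, rounded to two decimals."""
    return round(float(frac) * 100, 2)

print(pct(Fraction(failed, total_runs)))             # 63.64 -> "% failed"
print(pct(Fraction(matching_failures, failed)))      # 57.14 -> "% of failures"
print(pct(Fraction(matching_failures, total_runs)))  # 36.36 -> "% of runs"
```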