Job:
#OCPBUGS-54927 | issue | 3 days ago | CI: API is broken in periodic-ci-openshift-release-master-nightly-4.19-e2e-aws-ovn-single-node-techpreview-serial | CLOSED
{code:java}
: [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available {code}
My understanding is that this test fails if the "openshift-apiserver" ClusterOperator moves out of Available into any other state. If that happens often, it may explain why other components are reporting "connection refused". The test last passed on Feb 11th and started failing later that same day, never to recover: I went run by run and could not find a single successful run after that.
#OCPBUGS-23746 | issue | 3 weeks ago | openshift-apiserver ClusterOperator should not blip Available=False on brief missing HTTP content-type | New
Issue 15637203: openshift-apiserver ClusterOperator should not blip Available=False on brief missing HTTP content-type
Description: h2. Description of problem:
 
 Seen [in 4.15 update CI|https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1727427846533550080]:
 {code:none}
 : [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
 Run #0: Failed	1h28m25s
 {  1 unexpected clusteroperator state transitions during e2e test run 
 
 Nov 22 21:47:32.876 - 1s    E clusteroperator/openshift-apiserver condition/Available reason/APIServices_Error status/False APIServicesAvailable: rpc error: code = Unknown desc = malformed header: missing HTTP content-type}
 {code}
 While the Kube API server, if that's what's missing the header, is supposed to always be available, an issue that only persists for 1s is not long enough to warrant [immediate admin intervention|https://github.com/openshift/api/blob/c3f7566f6ef636bb7cf9549bf47112844285989e/config/v1/types_cluster_operator.go#L149-L153]. Teaching the openshift-apiserver operator to stay {{Available=True}} for this kind of brief hiccup, while still going {{Available=False}} for issues where [at least part of the component is non-functional, and that the condition requires immediate administrator intervention|https://github.com/openshift/api/blob/c3f7566f6ef636bb7cf9549bf47112844285989e/config/v1/types_cluster_operator.go#L149-L153], would make it easier for admins and SREs operating clusters to identify when intervention is required.
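To make the requested behaviour concrete, here is a minimal sketch of the debounce idea. It is not how the operator is implemented. The {{debounce}} helper, the sample input format ({{epoch_seconds ok|fail}}) and the 60s threshold are all hypothetical, chosen only for illustration. The status flips to {{Available=False}} only once a failure has persisted for at least the threshold, so a 1s blip never surfaces.

```shell
# debounce THRESHOLD_SECONDS (hypothetical helper, illustration only):
# read "epoch_seconds ok|fail" samples on stdin and print the Available
# status a debounced reporter would publish for each sample.
debounce() {
  awk -v threshold="$1" '
    $2 == "ok" {
      # Any healthy sample resets the failure window.
      failing = 0
      print $1, "Available=True"
      next
    }
    $2 == "fail" {
      # Remember when the current run of failures started.
      if (!failing) { failing = 1; fail_since = $1 }
      status = ($1 - fail_since >= threshold) ? "False" : "True"
      print $1, "Available=" status
    }'
}

# A 1s blip at t=1 stays Available=True; the failure that starts at t=10
# is only reported once it has lasted 60s or more (at t=75).
printf '%s\n' '0 ok' '1 fail' '2 ok' '10 fail' '40 fail' '75 fail' | debounce 60
```

The right threshold would depend on how quickly a real outage needs to summon an admin; 60s here is only a placeholder.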
 h2. Version-Release number of selected component (if applicable):
 {code:none}
 $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&search=clusteroperator/openshift-apiserver+should+not+change+condition/Available' | grep '^periodic-.*4[.]15.*failures match' | sort
 periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-ibmcloud-ovn-multi-ppc64le (all) - 4 runs, 100% failed, 25% of failures match = 25% impact
 periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-ibmcloud-ovn-multi-s390x (all) - 4 runs, 25% failed, 200% of failures match = 50% impact
 periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-s390x (all) - 5 runs, 100% failed, 40% of failures match = 40% impact
 periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-arm64 (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
 periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-heterogeneous (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
 periodic-ci-openshift-release-master-ci-4.15-e2e-aws-ovn-upgrade (all) - 5 runs, 20% failed, 200% of failures match = 40% impact
 periodic-ci-openshift-release-master-ci-4.15-e2e-aws-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
 periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade (all) - 50 runs, 56% failed, 21% of failures match = 12% impact
 periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade (all) - 80 runs, 44% failed, 17% of failures match = 8% impact
 periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 80 runs, 30% failed, 13% of failures match = 4% impact
 periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade (all) - 80 runs, 43% failed, 6% of failures match = 3% impact
 periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade (all) - 50 runs, 16% failed, 63% of failures match = 10% impact
 periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-from-stable-4.13-e2e-aws-sdn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
 periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-single-node-serial (all) - 5 runs, 100% failed, 100% of failures match = 100% impact
 periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-upgrade-rollback-oldest-supported (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
 periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade (all) - 50 runs, 18% failed, 11% of failures match = 2% impact
 periodic-ci-openshift-release-master-nightly-4.15-e2e-gcp-ovn-etcd-scaling (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
 periodic-ci-openshift-release-master-nightly-4.15-e2e-ibmcloud-csi (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
 periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-techpreview (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
 periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
 periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-metal-ipi-sdn-bm-upgrade (all) - 5 runs, 100% failed, 20% of failures match = 20% impact
 periodic-ci-openshift-release-master-okd-scos-4.15-e2e-aws-ovn-upgrade (all) - 15 runs, 47% failed, 14% of failures match = 7% impact
 {code}
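As a rough check on how to read these rows, and assuming (my reading of the output, not something stated by the tool here) that the impact figure is the failed percentage multiplied by the match percentage, the arithmetic can be sketched as below. The {{impact}} helper is hypothetical. Because the printed percentages are already rounded, the result can be off by one from the listed figure for some rows.

```shell
# impact FAILED_PCT MATCH_PCT (hypothetical helper):
# print failed% x match% as a rounded percentage of all runs.
impact() {
  awk -v failed="$1" -v match_pct="$2" \
    'BEGIN { printf "%.0f\n", failed * match_pct / 100 }'
}

# e2e-azure-ovn-upgrade row: 56% failed, 21% of failures match
impact 56 21
# ibmcloud-ovn-multi-s390x row: 25% failed, 200% of failures match
impact 25 200
```

For the two rows shown this gives 12 and 50, which agrees with the listed {{12% impact}} and {{50% impact}}.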
 
 The impact rates are low enough that I haven't checked older 4.y.  And it's possible that some of those matches have the operator going {{Available=False}} for other reasons besides {{APIServices_Error}}:
 
 {code:none}
 $ curl -s 'https://search.ci.openshift.org/search?maxAge=48h&type=junit&name=4.15.*upgrade&context=0&search=clusteroperator/openshift-apiserver.*condition/Available.*status/False' | jq -r 'to_entries[].value | to_entries[].value[].context[]' | sed 's|.*clusteroperator/\([^ ]*\) condition/Available reason/\([^ ]*\) status/False.*|\1 \2|' | sort | uniq -c | sort -n
       2 openshift-apiserver APIServerDeployment_NoPod
       2 openshift-apiserver APIServerDeployment_PreconditionNotFulfilled
      19 openshift-apiserver APIServices_Error
      22 openshift-apiserver APIServerDeployment_NoDeployment
 {code}
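To show what the {{sed}} stage of that pipeline does, here is the same expression applied to the single event line quoted earlier in this description. The {{extract_reason}} wrapper name is hypothetical; the expression itself is copied from the command above.

```shell
# extract_reason (hypothetical wrapper): reduce an Available=False event
# line on stdin to "<operator> <reason>", using the sed expression from
# the pipeline above.
extract_reason() {
  sed 's|.*clusteroperator/\([^ ]*\) condition/Available reason/\([^ ]*\) status/False.*|\1 \2|'
}

sample='Nov 22 21:47:32.876 - 1s    E clusteroperator/openshift-apiserver condition/Available reason/APIServices_Error status/False APIServicesAvailable: rpc error: code = Unknown desc = malformed header: missing HTTP content-type'

printf '%s\n' "$sample" | extract_reason
```

This prints {{openshift-apiserver APIServices_Error}}; the trailing {{sort | uniq -c | sort -n}} then counts how often each operator and reason pair occurs.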
 
 h2. How reproducible:
 
 {{12% impact}} for {{periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade}} looks like the highest impact among the jobs with double-digit run counts.
 
 h2. Steps to Reproduce:
 
 Run {{periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade}} a bunch of times watching the {{openshift-apiserver}} ClusterOperator's {{Available}} condition.
 
 h2. Actual results:
 
 Some very brief blips of {{Available=False}} that self-resolve before an admin could possibly respond to the summons.
 
 h2. Expected results:
 
 No quickly-resolving blips in CI.  No long runs of {{Available=False}} for issues that don't seem worth summoning an admin.  Still going {{Available=False}} for outages that need immediate admin response.
Status: New
periodic-ci-openshift-release-master-nightly-4.18-upgrade-from-stable-4.17-e2e-aws-upgrade-ovn-single-node (all) - 10 runs, 40% failed, 525% of failures match = 210% impact
#1932862974611951616 | junit | 43 hours ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1932745529918230528 | junit | 2 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1932639708521697280 | junit | 2 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1932213305225515008 | junit | 3 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1932344289916882944 | junit | 3 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1932517587367759872 | junit | 2 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1931917116164804608 | junit | 4 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1931247431614205952 | junit | 6 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1930982706502438912 | junit | 7 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1931077453434851328 | junit | 6 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1930808875297017856 | junit | 7 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1930529790083731456 | junit | 8 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1930413233466773504 | junit | 8 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1930634813128052736 | junit | 7 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1930305120684216320 | junit | 8 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1929937085393801216 | junit | 9 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1930186305644269568 | junit | 9 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1930079949314592768 | junit | 9 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1929544514171572224 | junit | 10 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1929707116285661184 | junit | 10 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1928479502468386816 | junit | 13 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.

Found in 210.00% of runs (525.00% of failures) across 10 total runs and 1 jobs (40.00% failed) in 92ms