Job:
#OCPBUGS-62517 issue 14 hours ago ClusterOperator olm goes Available=False with reason=CatalogdDeploymentCatalogdControllerManager_Deploying or reason=OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying during updates POST
Issue 17438850: ClusterOperator olm goes Available=False with reason=CatalogdDeploymentCatalogdControllerManager_Deploying or reason=OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying during updates
Description: Description of problem:
 
 [A component must not report Available=False during the course of a normal upgrade.|https://github.com/openshift/api/blob/7f245291a17ac0bd31cf8ba08530c3355b86dbea/config/v1/types_cluster_operator.go#L156]
 
 ClusterOperator olm goes Available=False with reason=CatalogdDeploymentCatalogdControllerManager_Deploying or reason=OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying during updates
 
 Example job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.21-e2e-gcp-ovn-upgrade/1972489796022439936
 {code:none}
 Sep 29 04:35:47.504 E clusteroperator/olm condition/Available reason/CatalogdDeploymentCatalogdControllerManager_Deploying status/False CatalogdDeploymentCatalogdControllerManagerAvailable: Waiting for Deployment
 Sep 29 04:35:47.504 - 52s   E clusteroperator/olm condition/Available reason/CatalogdDeploymentCatalogdControllerManager_Deploying status/False CatalogdDeploymentCatalogdControllerManagerAvailable: Waiting for Deployment
 Sep 29 04:42:35.127 E clusteroperator/olm condition/Available reason/OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying status/False OperatorcontrollerDeploymentOperatorControllerControllerManagerAvailable: Waiting for Deployment
 Sep 29 04:42:35.127 - 12s   E clusteroperator/olm condition/Available reason/OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying status/False OperatorcontrollerDeploymentOperatorControllerControllerManagerAvailable: Waiting for Deployment
  {code}
 Version-Release number of selected component (if applicable):
 
 The issue was spotted with a 4.21 to 4.21 upgrade test.
 {code:none}
 INFO[2025-09-29T02:33:17Z] Using explicitly provided pull-spec for release initial (registry.ci.openshift.org/ocp/release:4.21.0-0.ci-2025-09-28-082535)
 INFO[2025-09-29T02:33:17Z] Using explicitly provided pull-spec for release latest (registry.ci.openshift.org/ocp/release:4.21.0-0.ci-2025-09-29-022535)
 {code}
 How reproducible:
 
 Seems to reproduce always in [the aggregated job|https://prow.ci.openshift.org/view/gs/test-platform-results/logs/aggregated-gcp-ovn-upgrade-4.21-micro-release-openshift-release-analysis-aggregator/1972561250676117504], but there is also [a green run|https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/30308/pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade/1971564973029068800] in a similar test.
 {code:none}
 ### failure
 $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.21-e2e-gcp-ovn-upgrade/1972489796022439936/artifacts/e2e-gcp-ovn-upgrade/openshift-e2e-test/artifacts/junit/e2e-monitor-tests__20250929-034333.xml | grep 'clusteroperator/olm should not change condition/Available' -A1
     <testcase name="[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/olm should not change condition/Available" time="7014.05639286">
         <failure message="">4 unexpected clusteroperator state transitions during e2e test run.  These did not match any known exceptions, so they cause this test-case to fail:&#xA;&#xA;Sep 29 04:35:47.504 E clusteroperator/olm condition/Available reason/CatalogdDeploymentCatalogdControllerManager_Deploying status/False CatalogdDeploymentCatalogdControllerManagerAvailable: Waiting for Deployment&#xA;Sep 29 04:35:47.504 - 52s   E clusteroperator/olm condition/Available reason/CatalogdDeploymentCatalogdControllerManager_Deploying status/False CatalogdDeploymentCatalogdControllerManagerAvailable: Waiting for Deployment&#xA;Sep 29 04:42:35.127 E clusteroperator/olm condition/Available reason/OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying status/False OperatorcontrollerDeploymentOperatorControllerControllerManagerAvailable: Waiting for Deployment&#xA;Sep 29 04:42:35.127 - 12s   E clusteroperator/olm condition/Available reason/OperatorcontrollerDeploymentOperatorControllerControllerManager_Deploying status/False OperatorcontrollerDeploymentOperatorControllerControllerManagerAvailable: Waiting for Deployment&#xA;&#xA;2 unwelcome but acceptable clusteroperator state transitions during e2e test run.  
These should not happen, but because they are tied to exceptions, the fact that they did happen is not sufficient to cause this test-case to fail:&#xA;&#xA;Sep 29 04:36:39.932 W clusteroperator/olm condition/Available reason/AsExpected status/True CatalogdDeploymentCatalogdControllerManagerAvailable: Deployment is available\nOperatorcontrollerDeploymentOperatorControllerControllerManagerAvailable: Deployment is available (exception: Available=True is the happy case)&#xA;Sep 29 04:42:48.072 W clusteroperator/olm condition/Available reason/AsExpected status/True CatalogdDeploymentCatalogdControllerManagerAvailable: Deployment is available\nOperatorcontrollerDeploymentOperatorControllerControllerManagerAvailable: Deployment is available (exception: Available=True is the happy case)&#xA;</failure>
 
 ### success
 $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/30308/pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade/1971564973029068800/artifacts/e2e-gcp-ovn-upgrade/openshift-e2e-test/artifacts/junit/e2e-monitor-tests__20250926-142805.xml | grep 'clusteroperator/olm should not change condition/Available' -A1
     <testcase name="[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/olm should not change condition/Available" time="0"></testcase>
     <testcase name="[Monitor:legacy-cvo-invariants][bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available" time="0"></testcase>{code}
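 The {{curl | grep}} check above can also be done structurally. The following is a minimal sketch, assuming the junit file has already been downloaded and that a failed test-case carries a {{<failure>}} child element, as in the output above. {{SAMPLE}} is a hypothetical, trimmed-down fragment modeled on that output, and {{failed_testcases}} is an illustrative helper name, not part of any CI tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down junit fragment modeled on the grep output above.
SAMPLE = """<testsuite>
  <testcase name="[Monitor:legacy-cvo-invariants][bz-OLM] clusteroperator/olm should not change condition/Available" time="7014.05">
    <failure message="">4 unexpected clusteroperator state transitions during e2e test run.</failure>
  </testcase>
  <testcase name="[Monitor:legacy-cvo-invariants][bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available" time="0"></testcase>
</testsuite>"""


def failed_testcases(junit_xml, needle):
    """Return (name, failed) pairs for test-cases whose name contains needle.

    A test-case counts as failed when it has a <failure> child element.
    """
    root = ET.fromstring(junit_xml)
    return [
        (case.get("name", ""), case.find("failure") is not None)
        for case in root.iter("testcase")
        if needle in case.get("name", "")
    ]


results = failed_testcases(SAMPLE, "should not change condition/Available")
for name, failed in results:
    print("FAIL" if failed else "PASS", name)
```

 Returning a list of pairs rather than a dict keeps duplicate test-case names visible, which matters if a junit file records the same test-case more than once.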
 Steps to Reproduce:
 {code:none}
     1. Run the aggregated job above
     {code}
 Actual results:
 {code:none}
 co/olm goes Available=False during the upgrade test.{code}
 Expected results:
 {code:none}
 co/olm stays Available=True during the upgrade test.{code}
 Additional info:
 {code:none}
 The failures were taken from a 4.21 to 4.21 upgrade test. Earlier versions may be affected too.{code}
Status: POST
#OCPBUGS-23746 issue 7 days ago openshift-apiserver ClusterOperator should not blip Available=False on brief missing HTTP content-type POST
Issue 15637203: openshift-apiserver ClusterOperator should not blip Available=False on brief missing HTTP content-type
Description: h2. Description of problem:
 
 Seen [in 4.15 update CI|https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade/1727427846533550080]:
 {code:none}
 : [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
 Run #0: Failed	1h28m25s
 {  1 unexpected clusteroperator state transitions during e2e test run 
 
 Nov 22 21:47:32.876 - 1s    E clusteroperator/openshift-apiserver condition/Available reason/APIServices_Error status/False APIServicesAvailable: rpc error: code = Unknown desc = malformed header: missing HTTP content-type}
 {code}
 While the Kube API server, if that's what's missing the header, is supposed to always be available, an issue that only persists for 1s is not long enough to warrant [immediate admin intervention|https://github.com/openshift/api/blob/c3f7566f6ef636bb7cf9549bf47112844285989e/config/v1/types_cluster_operator.go#L149-L153]. Teaching the openshift-apiserver operator to stay {{Available=True}} for this kind of brief hiccup, while still going {{Available=False}} for issues where [at least part of the component is non-functional, and that the condition requires immediate administrator intervention|https://github.com/openshift/api/blob/c3f7566f6ef636bb7cf9549bf47112844285989e/config/v1/types_cluster_operator.go#L149-L153], would make it easier for admins and SREs operating clusters to identify when intervention is required.
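 One way to picture the requested behavior is a grace period on the health signal. The following is only an illustrative sketch, not the openshift-apiserver operator's code or a proposed patch: the class name {{DebouncedAvailable}}, the {{grace_seconds}} parameter, and the 30 second value are all hypothetical. It reports {{Available=True}} through a failure until the failure has persisted for the whole grace period.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DebouncedAvailable:
    """Hypothetical helper: tolerate brief failures before reporting unavailable."""

    grace_seconds: float  # hypothetical threshold, not a value from the issue
    _failing_since: Optional[float] = None

    def observe(self, healthy: bool, now: float) -> bool:
        """Record one health check at time `now`; return the Available status."""
        if healthy:
            # Any healthy observation resets the failure window.
            self._failing_since = None
            return True
        if self._failing_since is None:
            # First failing observation starts the window.
            self._failing_since = now
        # Stay Available=True until the failure outlasts the grace period.
        return (now - self._failing_since) < self.grace_seconds


tracker = DebouncedAvailable(grace_seconds=30.0)
# A blip lasting about 1s, like the one in the CI output above, stays Available=True.
print(tracker.observe(healthy=False, now=0.0))
print(tracker.observe(healthy=True, now=1.0))
```

 A sustained failure still flips the status once the window is exceeded, so outages that need an immediate admin response are not hidden, only delayed by the grace period.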
 h2. Version-Release number of selected component (if applicable):
 {code:none}
 $ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=48h&type=junit&search=clusteroperator/openshift-apiserver+should+not+change+condition/Available' | grep '^periodic-.*4[.]15.*failures match' | sort
 periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-ibmcloud-ovn-multi-ppc64le (all) - 4 runs, 100% failed, 25% of failures match = 25% impact
 periodic-ci-openshift-multiarch-master-nightly-4.15-ocp-e2e-ibmcloud-ovn-multi-s390x (all) - 4 runs, 25% failed, 200% of failures match = 50% impact
 periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-nightly-4.14-ocp-ovn-remote-libvirt-s390x (all) - 5 runs, 100% failed, 40% of failures match = 40% impact
 periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-arm64 (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
 periodic-ci-openshift-multiarch-master-nightly-4.15-upgrade-from-stable-4.14-ocp-e2e-upgrade-azure-ovn-heterogeneous (all) - 5 runs, 20% failed, 100% of failures match = 20% impact
 periodic-ci-openshift-release-master-ci-4.15-e2e-aws-ovn-upgrade (all) - 5 runs, 20% failed, 200% of failures match = 40% impact
 periodic-ci-openshift-release-master-ci-4.15-e2e-aws-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
 periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade (all) - 50 runs, 56% failed, 21% of failures match = 12% impact
 periodic-ci-openshift-release-master-ci-4.15-e2e-gcp-ovn-upgrade (all) - 80 runs, 44% failed, 17% of failures match = 8% impact
 periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-aws-ovn-upgrade (all) - 80 runs, 30% failed, 13% of failures match = 4% impact
 periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-azure-sdn-upgrade (all) - 80 runs, 43% failed, 6% of failures match = 3% impact
 periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-e2e-gcp-ovn-rt-upgrade (all) - 50 runs, 16% failed, 63% of failures match = 10% impact
 periodic-ci-openshift-release-master-ci-4.15-upgrade-from-stable-4.14-from-stable-4.13-e2e-aws-sdn-upgrade (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
 periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-single-node-serial (all) - 5 runs, 100% failed, 100% of failures match = 100% impact
 periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-ovn-upgrade-rollback-oldest-supported (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
 periodic-ci-openshift-release-master-nightly-4.15-e2e-aws-sdn-upgrade (all) - 50 runs, 18% failed, 11% of failures match = 2% impact
 periodic-ci-openshift-release-master-nightly-4.15-e2e-gcp-ovn-etcd-scaling (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
 periodic-ci-openshift-release-master-nightly-4.15-e2e-ibmcloud-csi (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
 periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-techpreview (all) - 5 runs, 40% failed, 50% of failures match = 20% impact
 periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-aws-upgrade-ovn-single-node (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
 periodic-ci-openshift-release-master-nightly-4.15-upgrade-from-stable-4.14-e2e-metal-ipi-sdn-bm-upgrade (all) - 5 runs, 100% failed, 20% of failures match = 20% impact
 periodic-ci-openshift-release-master-okd-scos-4.15-e2e-aws-ovn-upgrade (all) - 15 runs, 47% failed, 14% of failures match = 7% impact
 {code}
 
 The impact rates are low enough that I haven't checked older 4.y.  And it's possible that some of those matches have the operator going {{Available=False}} for other reasons besides {{APIServices_Error}}:
 
 {code:none}
 $ curl -s 'https://search.ci.openshift.org/search?maxAge=48h&type=junit&name=4.15.*upgrade&context=0&search=clusteroperator/openshift-apiserver.*condition/Available.*status/False' | jq -r 'to_entries[].value | to_entries[].value[].context[]' | sed 's|.*clusteroperator/\([^ ]*\) condition/Available reason/\([^ ]*\) status/False.*|\1 \2|' | sort | uniq -c | sort -n
       2 openshift-apiserver APIServerDeployment_NoPod
       2 openshift-apiserver APIServerDeployment_PreconditionNotFulfilled
      19 openshift-apiserver APIServices_Error
      22 openshift-apiserver APIServerDeployment_NoDeployment
 {code}
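 The {{sed | sort | uniq -c}} tally above can be sketched in Python as follows, assuming the event lines have already been extracted from the search response. {{EVENT_RE}}, {{tally_reasons}}, and {{SAMPLE_LINES}} are illustrative names; the sample lines are copied from events quoted elsewhere on this page. Unlike the {{sed}} expression, which passes non-matching lines through unchanged, this sketch skips them.

```python
import re
from collections import Counter

# Matches the operator and reason in an Available=False monitor event line.
EVENT_RE = re.compile(
    r"clusteroperator/(?P<operator>\S+) condition/Available "
    r"reason/(?P<reason>\S+) status/False"
)


def tally_reasons(lines):
    """Count (operator, reason) pairs across Available=False event lines."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[(match["operator"], match["reason"])] += 1
    return counts


SAMPLE_LINES = [
    "Nov 22 21:47:32.876 - 1s    E clusteroperator/openshift-apiserver condition/Available reason/APIServices_Error status/False APIServicesAvailable: rpc error: code = Unknown desc = malformed header: missing HTTP content-type",
    "Sep 29 04:35:47.504 - 52s   E clusteroperator/olm condition/Available reason/CatalogdDeploymentCatalogdControllerManager_Deploying status/False CatalogdDeploymentCatalogdControllerManagerAvailable: Waiting for Deployment",
    "Sep 29 04:36:39.932 W clusteroperator/olm condition/Available reason/AsExpected status/True CatalogdDeploymentCatalogdControllerManagerAvailable: Deployment is available",
]

counts = tally_reasons(SAMPLE_LINES)
# Print in ascending count order, like the trailing `sort -n`.
for (operator, reason), n in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{n:7d} {operator} {reason}")
```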
 
 h2. How reproducible:
 
 {{12% impact}} for {{periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade}} looks like the highest impact among the jobs with double-digit run counts.
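 The impact figures in the listing above appear to be the share of runs that failed multiplied by the share of failures that match the search. A minimal sketch of that arithmetic follows, assuming this reading of the listing is correct; {{impact_pct}} is an illustrative name. Because the listed inputs are already rounded to whole percentages, the product can differ from the listed impact by about a point.

```python
def impact_pct(failed_pct, match_pct_of_failures):
    """Share of all runs that match, given both inputs as percentages."""
    return failed_pct * match_pct_of_failures / 100


# Inputs taken from three rows of the listing above.
for failed, matched in [(56, 21), (100, 25), (25, 200)]:
    print(f"{failed}% failed, {matched}% of failures match "
          f"= {round(impact_pct(failed, matched))}% impact")
```

 The first row reproduces the {{12% impact}} for {{periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade}}: 56% of 50 runs failed and 21% of those failures match.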
 
 h2. Steps to Reproduce:
 
 Run {{periodic-ci-openshift-release-master-ci-4.15-e2e-azure-ovn-upgrade}} a bunch of times watching the {{openshift-apiserver}} ClusterOperator's {{Available}} condition.
 
 h2. Actual results:
 
 Some very brief blips of {{Available=False}} that self-resolve before an admin could possibly respond to the summons.
 
 h2. Expected results:
 
 No quickly-resolving blips in CI.  No long runs of {{Available=False}} for issues that don't seem worth summoning an admin.  Still going {{Available=False}} for outages that need immediate admin response.
Status: POST
{noformat}
: [Monitor:legacy-cvo-invariants][bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available 2h4m41s {  2 unexpected clusteroperator state transitions during e2e test run.  These did not match any known exceptions, so they cause this test-case to fail:
periodic-ci-openshift-release-master-ci-4.17-upgrade-from-stable-4.16-e2e-azure-ovn-upgrade (all) - 67 runs, 40% failed, 30% of failures match = 12% impact
#1997618083186872320 junit 2 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1995769841897705472 junit 7 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1994796675167686656 junit 9 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1994627789013127168 junit 10 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1994426530960248832 junit 10 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1993989238193917952 junit 12 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1993611612908425216 junit 13 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.
#1993340198766776320 junit 13 days ago
# [bz-openshift-apiserver] clusteroperator/openshift-apiserver should not change condition/Available
0 unexpected clusteroperator state transitions during e2e test run, as desired.

Found in 11.94% of runs (29.63% of failures) across 67 total runs and 1 jobs (40.30% failed) in 1.687s