
Continuous Sync State in Data Plane Cluster After Updating License Key

Identifying and Resolving Continuous Sync Issues Caused by Descheduler Interference After License Update

After the license key is updated in the development environment, the registered data plane cluster may remain stuck in a syncing state in the Akuity Platform. This can block planned license updates to the production environment.

Observed Behavior:

  • The data plane cluster repeatedly enters a syncing or reconnecting state.

  • Running kubectl commands intermittently fails with DNS-related errors such as:

     
    E0708 15:49:05.749784 6663 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://zuscu-d006-b207-aks-d3-gant1-3fdnz9dy...privatelink.centralus.azmk8s.io:443/api?timeout=32s\": dial tcp: lookup ... no such host"
  • Agent pods show similar symptoms: they disconnect or fail health checks during the same period.
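
Because the failures are intermittent, a single kubectl call can succeed even while the problem is ongoing. A minimal sketch for sampling API server reachability over time is shown here; the helper name `count_failures` is hypothetical, and the probe command in the usage comment is only one possible choice.

```shell
# Hypothetical helper: runs the given command <attempts> times, waiting
# <delay> seconds between runs, and reports how many attempts failed.
count_failures() {
  cf_attempts="$1"; cf_delay="$2"; shift 2
  cf_failed=0; cf_i=0
  while [ "$cf_i" -lt "$cf_attempts" ]; do
    # Discard output; only the exit status matters here.
    "$@" >/dev/null 2>&1 || cf_failed=$((cf_failed + 1))
    cf_i=$((cf_i + 1))
    sleep "$cf_delay"
  done
  echo "$cf_failed/$cf_attempts attempts failed"
}

# Example against a live cluster (20 probes, 5 seconds apart):
# count_failures 20 5 kubectl get --raw /readyz
```

A non-zero failure count that comes and goes across runs is consistent with the intermittent behavior described above.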


Root Cause:

The DNS errors themselves are infrastructure-related (DNS/networking) and are not directly caused by the Akuity Agent.
However, further investigation revealed that the Kubernetes Descheduler, specifically the RemoveDuplicates policy, can interfere with the agent-server pods. It causes them to be rescheduled frequently, which prevents stable connections to the Akuity control plane.


Reproduction Steps:

  1. Descheduler is enabled with default policies.

  2. The cluster has multiple agent-server pods running.

  3. Descheduler evicts pods it considers duplicates (more than one pod from the same owner running on the same node), even if they are healthy, triggering frequent pod rescheduling.

  4. This leads to intermittent DNS failures and reconnections between the data plane and control plane, causing the cluster to remain in a syncing state.
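
To check whether the conditions above apply, it can help to see whether any node hosts more than one agent pod. The following is a rough sketch only: the helper name `nodes_with_multiple_pods` is hypothetical, the `akuity` namespace is the one used elsewhere in this article, and the count covers every pod in that namespace rather than only pods that share an owner.

```shell
# Hypothetical helper: reads "<pod-name> <node-name>" pairs on stdin and
# prints "<node-name> <count>" for each node that hosts more than one pod.
nodes_with_multiple_pods() {
  awk 'NF >= 2 { count[$2]++ }
       END { for (node in count) if (count[node] > 1) print node, count[node] }' | sort
}

# Example against a live cluster:
# kubectl get pods -n akuity --no-headers \
#   -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName | nodes_with_multiple_pods
```

An empty result means no node currently runs more than one pod from that namespace.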


Resolution Steps:

  1. Verify Descheduler Installation

    kubectl get pods -n kube-system | grep descheduler

     

    Confirm that Descheduler is running.

  2. Check for Descheduler Policy ConfigMap 

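    kubectl get cm descheduler-policy-configmap -n kube-system

    If you get
    _Error from server (NotFound): configmaps "descheduler-policy-configmap" not found_,
    the ConfigMap might be deployed under a different name depending on your installation method. List all ConfigMaps in the namespace to find it:

    kubectl get cm -n kube-system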

  3. Disable the RemoveDuplicates Policy
    Edit the Descheduler ConfigMap (update the name accordingly): 

    kubectl edit cm <descheduler-configmap-name> -n kube-system

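    Comment out or remove the RemoveDuplicates entry, including any arguments nested under it:

    - name: "RemoveDuplicates"

    If the plugin is also listed under the profile's enabled plugins, remove it there as well so that the policy stays valid.

    Then restart the Descheduler pod (adjust the label selector if your installation uses different labels):

    kubectl delete pod -n kube-system -l app=descheduler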
  4. Co-locate Agent Pods (Optional Validation)
    To validate that the issue resolves once Descheduler interference is removed, temporarily schedule both agent-server pods on the same node, then check their placement:

    kubectl get pods -n akuity -o wide

    The NODE column should show the same node for both pods. Keep this placement only for the duration of the test.

  5. Confirm Stability
    Once the Descheduler policy is updated:

    • The data plane cluster should stop resyncing continuously.

    • DNS errors should disappear.

    • Cluster status in Akuity Cloud should change from “syncing” to healthy (💚).
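
As a quick check that the policy edit in step 3 took effect, the ConfigMap contents can be scanned for any remaining uncommented mention of the plugin. This is a sketch under assumptions: the helper name `policy_mentions_plugin` is hypothetical, and it does a plain text match rather than parsing the policy YAML.

```shell
# Hypothetical helper: reads policy text on stdin and succeeds if the given
# plugin name still appears on a line that is not commented out.
policy_mentions_plugin() {
  grep -v '^[[:space:]]*#' | grep -q -- "$1"
}

# Example against a live cluster (replace the placeholder with your ConfigMap name):
# kubectl get cm <descheduler-configmap-name> -n kube-system -o jsonpath='{.data.*}' \
#   | policy_mentions_plugin RemoveDuplicates \
#   && echo "RemoveDuplicates is still referenced" \
#   || echo "RemoveDuplicates is no longer referenced"
```

If the plugin is still referenced, revisit the edit in step 3 before judging whether the cluster has stabilized.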


Additional Notes:

  • This issue is not license-key related, though it became visible immediately after the license update.

  • The RemoveDuplicates policy is enabled by default in many Descheduler deployments and can affect any StatefulSet/Deployment using multiple replicas for redundancy.

  • Future mitigation: for critical workloads like the Akuity agent, consider setting PodAntiAffinity so that replicas land on different nodes, or applying Descheduler exclusion labels.

