Continuous Sync State in Data Plane Cluster After Updating License Key
Identifying and Resolving Continuous Sync Issues Caused by Descheduler Interference After License Update
After updating the license key in the development environment, users may observe that the registered data plane cluster continuously remains in a syncing state within the Akuity Platform. This can block planned license updates to the production environment.
Observed Behavior:
- The data plane cluster repeatedly enters a syncing or reconnecting state.
- kubectl commands intermittently fail with DNS-related errors such as:
      E0708 15:49:05.749784 6663 memcache.go:265] "Unhandled Error" err="couldn't get current server API group list: Get \"https://zuscu-d006-b207-aks-d3-gant1-3fdnz9dy...privatelink.centralus.azmk8s.io:443/api?timeout=32s\": dial tcp: lookup ... no such host"
- Agent pods show similar symptoms during the same period, disconnecting or failing health checks.
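To tell an intermittent lookup failure from a permanent one, it can help to resolve the API server hostname several times in a row and count the misses. The following is a minimal sketch, not part of the original investigation. It assumes `getent` is available on the machine running kubectl; `count_dns_failures` and the `RESOLVER` override are hypothetical names introduced here, and the hostname placeholder must be replaced with the API server host from your kubeconfig.

```shell
# Count how many of N consecutive lookups for a host fail.
# RESOLVER is a hypothetical override (defaults to "getent hosts") so the
# loop can be pointed at a different lookup command if needed.
count_dns_failures() (
  host=$1
  attempts=${2:-10}   # number of lookups to try
  delay=${3:-1}       # seconds to wait between lookups
  failures=0
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if ! ${RESOLVER:-getent hosts} "$host" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$failures"
)

# Example (replace the placeholder with your API server hostname):
#   count_dns_failures <api-server-hostname> 20 2
```

A result between zero and the number of attempts suggests the intermittent pattern described above; a result equal to the number of attempts points to a persistent DNS problem instead.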
Root Cause:
The DNS errors are primarily infrastructure-related (DNS/networking) and do not point to a defect in the Akuity Agent itself.
However, further investigation revealed that the Kubernetes Descheduler, specifically its RemoveDuplicates policy, can evict the agent-server pods. The pods are then rescheduled frequently, which prevents stable connections to the Akuity control plane.
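One way to check whether the Descheduler is evicting agent pods is to search its logs for eviction messages that mention them. The sketch below is only a text filter: it assumes the Descheduler log lines contain the word "evict" together with the pod name, which may differ between Descheduler versions. `count_evictions` is a hypothetical helper, and the `app=descheduler` label is taken from the restart command used later in this article.

```shell
# Read log text on stdin and count the lines that mention an eviction
# of a pod whose name matches the given pattern.
# Assumption: eviction log lines contain "evict" and the pod name.
count_evictions() {
  grep -i 'evict' | grep -c -- "$1"
}

# Example (needs cluster access):
#   kubectl logs -n kube-system -l app=descheduler --tail=500 | count_evictions agent-server
```

A count that keeps growing between runs is consistent with the rescheduling behavior described above, but an empty result does not rule it out if the log format differs.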
Reproduction Steps:
- Descheduler is enabled with default policies.
- The cluster has multiple agent-server pods running.
- Descheduler removes duplicate pods (even if they are healthy), triggering frequent pod rescheduling.
- This leads to intermittent DNS failures and reconnections between the data plane and control plane, causing the cluster to remain in a syncing state.
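The conditions above can be checked by listing which node each agent-server pod runs on and flagging nodes that host more than one. This is a rough sketch: `nodes_with_duplicates` is a hypothetical helper, and it matches pods by name pattern, whereas RemoveDuplicates compares pods by their owner (for example the same ReplicaSet or StatefulSet), so treat the output as a hint only.

```shell
# Read "<pod-name> <node-name>" pairs on stdin and print every node that
# hosts more than one pod whose name matches the given pattern,
# followed by the number of matching pods on that node.
nodes_with_duplicates() {
  awk -v pat="$1" '
    $1 ~ pat { count[$2]++ }
    END { for (n in count) if (count[n] > 1) print n, count[n] }
  '
}

# Example (needs cluster access):
#   kubectl get pods -n akuity --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
#     | nodes_with_duplicates agent-server
```

Any node printed here is a candidate for eviction by RemoveDuplicates while that policy is enabled.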
Resolution Steps:
- Verify Descheduler Installation
      kubectl get pods -n kube-system | grep descheduler
  Confirm that Descheduler is running.
- Check for Descheduler Policy ConfigMap
      kubectl get cm -n kube-system
  If a lookup by the default name fails with _Error from server (NotFound): configmaps "descheduler-policy-configmap" not found_, the ConfigMap might be deployed under a different name depending on your installation method. Look for it in the list returned by the command above.
- Disable the RemoveDuplicates Policy
  Edit the Descheduler ConfigMap (update the name accordingly):
      kubectl edit cm <descheduler-configmap-name> -n kube-system
  Comment out or remove:
      - name: "RemoveDuplicates"
  Then restart the Descheduler pod:
      kubectl delete pod -n kube-system -l app=descheduler
- Co-locate Agent Pods (Optional Validation)
  Schedule both agent-server pods on the same node to validate that the issue resolves when Descheduler interference is removed. Use the NODE column in the output of the following command to check where they run:
      kubectl get pods -n akuity -o wide
  Ensure they are on the same node temporarily for testing.
- Confirm Stability
  Once the Descheduler policy is updated:
  - The data plane cluster should stop resyncing continuously.
  - DNS errors should disappear.
  - Cluster status in Akuity Cloud should change from “syncing” to healthy (💚).
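After editing the ConfigMap, a quick text check can confirm that no active line still references RemoveDuplicates. The sketch below is an assumption-laden shortcut, not an official Descheduler tool: `policy_has_remove_duplicates` is a hypothetical helper that ignores commented-out lines and does not parse the YAML, so a policy that keeps the entry but disables it by other means would still be reported as present.

```shell
# Read policy YAML on stdin. Succeeds (exit 0) if "RemoveDuplicates"
# appears on a line that is not commented out, fails otherwise.
policy_has_remove_duplicates() {
  grep -v '^[[:space:]]*#' | grep -q 'RemoveDuplicates'
}

# Example (needs cluster access; update the ConfigMap name accordingly):
#   if kubectl get cm <descheduler-configmap-name> -n kube-system -o yaml \
#        | policy_has_remove_duplicates; then
#     echo "RemoveDuplicates is still referenced"
#   fi
```

If the check reports that the policy is gone, restart the Descheduler pod as described above so the running process picks up the change.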
Additional Notes:
- This issue is not license-key related, though it became visible immediately after the license update.
- The RemoveDuplicates policy is enabled by default in many Descheduler deployments and can affect any StatefulSet/Deployment using multiple replicas for redundancy.
- Future mitigation: consider setting PodAntiAffinity or Descheduler exclusion labels for critical workloads like the Akuity agent.
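As an illustration of the PodAntiAffinity mitigation, the sketch below writes a patch file that asks the scheduler to prefer placing matching pods on different nodes, so that duplicates on one node become less likely. Everything specific here is an assumption: `write_anti_affinity_patch` is a hypothetical helper, the label key and value must be taken from the real agent-server pod labels, and the workload kind and name are not stated in this article. Manifests managed by the Akuity Platform may also be overwritten, so verify how the agent is deployed before applying anything like this.

```shell
# Write a patch that adds a preferred pod anti-affinity rule.
#   $1: output file
#   $2: label key   (hypothetical, check the real pod labels)
#   $3: label value (hypothetical, check the real pod labels)
write_anti_affinity_patch() {
  cat > "$1" <<EOF
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    $2: $3
EOF
}

# Example (needs cluster access; kind, name, and labels are placeholders):
#   write_anti_affinity_patch patch.yaml <label-key> <label-value>
#   kubectl patch <kind> <name> -n akuity --patch-file patch.yaml
```

The rule uses the preferred form rather than the required form so that pods can still be scheduled when only one node is available.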