Start Wide, Then Go Deep
When an ingress starts returning 503s, teams usually lose time by diving straight into logs.
A faster sequence is:
- Validate ingress -> service -> endpoints first.
- Check deployment readiness next.
- Confirm the pod-level DNS path once routing is clean.
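Assuming the `edge` namespace and the resource names used throughout this runbook, the whole pass fits in one short script:

```shell
#!/usr/bin/env bash
# One-pass triage for ingress 503s: data path, readiness, then DNS.
# Assumes the "edge" namespace and the names used in this runbook.
set -uo pipefail   # no -e: let every check run even if an earlier one fails

NS=edge

kubectl get ingress -n "$NS"                      # step 1: routing objects exist
kubectl get svc,endpoints -n "$NS"                # step 1: endpoints behind the Service
kubectl describe deployment api-gateway -n "$NS"  # step 2: probe configuration
kubectl exec -n "$NS" deploy/api-gateway -- \
  nslookup internal-auth.edge.svc.cluster.local   # step 3: DNS from the workload
```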
1. Validate The Data Path
Run the shortest high-signal checks first:
kubectl get ingress -n edge
kubectl get svc,endpoints -n edge
kubectl describe ingress edge-gateway -n edge
If the endpoints list is empty, the ingress controller may be healthy while the Service selector matches no pods.
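Empty endpoints usually mean a selector mismatch. One way to check, assuming the Service backing the ingress is named `api-gateway`, is to print the Service selector next to the live pod labels:

```shell
# Print the Service's selector, then the labels actually on the pods.
# If no pod carries every selector key/value pair, endpoints stay empty.
NS=edge
SVC=api-gateway   # assumption: the Service backing the ingress

kubectl get svc "$SVC" -n "$NS" -o jsonpath='{.spec.selector}'; echo
kubectl get pods -n "$NS" --show-labels
```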
2. Confirm Readiness Drift
If selectors are fixed but endpoints still churn, inspect probe paths and events:
kubectl describe deployment api-gateway -n edge
kubectl describe pod -n edge <pod-name>
kubectl get events -n edge --sort-by=.metadata.creationTimestamp
Look for probe path mismatches like /readyz versus /healthz.
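To spot a probe path mismatch without scanning the full describe output, you can pull the configured readiness path directly (container index 0 is an assumption; adjust if the pod runs sidecars):

```shell
# Show only the readiness probe path the kubelet actually hits.
NS=edge
kubectl get deployment api-gateway -n "$NS" \
  -o jsonpath='{.spec.template.spec.containers[0].readinessProbe.httpGet.path}'; echo
```

If this prints /healthz while the app serves /readyz (or vice versa), pods flap between Ready and NotReady and the endpoints list churns.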
3. Verify DNS Egress From The Pod
Once endpoints appear healthy, test resolution from workload context:
kubectl exec -n edge deploy/api-gateway -- nslookup internal-auth.edge.svc.cluster.local
If this fails, inspect NetworkPolicy egress rules for kube-dns/coredns access.
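If resolution fails and a NetworkPolicy is in play, DNS egress to kube-dns on port 53 must be explicitly allowed. A minimal sketch of such a policy, assuming the upstream-default `k8s-app: kube-dns` labels in `kube-system` (verify them in your cluster):

```shell
# Apply a minimal egress rule permitting DNS lookups from the edge namespace.
# Label values below are the upstream defaults; confirm them before applying.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: edge
spec:
  podSelector: {}            # every pod in the namespace
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
EOF
```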
4. Lock In Verification
A fix is complete only when the user path recovers:
kubectl get endpoints api-gateway -n edge
curl -s http://localhost:8080/health | jq
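The localhost URL above assumes a port-forward is already running. A complete verification pass, with hypothetical names and a Service port of 80 (an assumption), looks like:

```shell
# Forward the Service locally, then exercise the user-facing path.
NS=edge
kubectl get endpoints api-gateway -n "$NS"               # endpoints populated?
kubectl port-forward -n "$NS" svc/api-gateway 8080:80 &  # assumption: Service port 80
PF_PID=$!
sleep 2                                                  # give the forward time to bind
curl -s http://localhost:8080/health | jq
kill "$PF_PID"
```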
Publish this workflow as your on-call runbook so engineers can execute it under pressure with no guesswork.