The Symptom

At 02:13 UTC, the edge dashboard says requests are flowing.

The backend says requests are not arriving.

No hard errors. No clear drop. Just ghost traffic.

1. Freeze The Narrative

Before changing anything, capture the current state:

kubectl get ingress -n edge
kubectl get svc,endpoints -n edge
kubectl get pods -n edge -o wide
kubectl get events -A --sort-by=.metadata.creationTimestamp | tail -40

The goal is to lock evidence before remediation noise starts.
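
One way to keep that evidence together is to write each snapshot into a timestamped directory. The following is a minimal sketch, not a standard tool: the function name snapshot_state and the directory and file names are illustrative assumptions. Only the kubectl commands come from the list above.

```shell
# snapshot_state [NAMESPACE] [DIR]
# Writes the four captures into DIR (default: incident-<UTC timestamp>).
# The function name and file names are hypothetical, chosen for illustration.
snapshot_state() {
  local ns="${1:-edge}"
  local dir="${2:-incident-$(date -u +%Y%m%dT%H%M%SZ)}"
  mkdir -p "$dir" || return 1

  # A failed capture is reported on stderr but does not stop the others.
  kubectl get ingress -n "$ns" -o yaml > "$dir/ingress.yaml" 2>&1 \
    || echo "ingress capture failed" >&2
  kubectl get svc,endpoints -n "$ns" -o yaml > "$dir/svc-endpoints.yaml" 2>&1 \
    || echo "svc/endpoints capture failed" >&2
  kubectl get pods -n "$ns" -o wide > "$dir/pods.txt" 2>&1 \
    || echo "pods capture failed" >&2
  kubectl get events -A --sort-by=.metadata.creationTimestamp > "$dir/events.txt" 2>&1 \
    || echo "events capture failed" >&2

  # Print the directory so the caller can attach it to the incident record.
  echo "$dir"
}
```

Calling snapshot_state edge before the first change gives a baseline to diff against once remediation starts.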

2. Validate Every Hop

Walk the route in order:

  • edge DNS resolution
  • ingress target backend
  • service selector
  • endpoint subsets
  • pod readiness

If one hop is "probably fine," verify anyway.
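
The last three hops can be checked mechanically. The sketch below prints a Service's selector and counts its ready and not-ready endpoint addresses. It assumes the classic Endpoints API that the commands above already use. The function names hop_report and count_words are hypothetical, and so is any Service name you pass in.

```shell
# count_words STRING
# Hypothetical helper: prints the number of whitespace-separated words.
count_words() {
  # Intentionally unquoted so the shell splits on whitespace.
  set -- $1
  echo "$#"
}

# hop_report NAMESPACE SERVICE
# Prints the selector and endpoint counts; exits non-zero if nothing is ready.
hop_report() {
  local ns="$1" svc="$2" selector ready not_ready ready_count

  selector="$(kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.selector}')"
  ready="$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')"
  not_ready="$(kubectl get endpoints "$svc" -n "$ns" \
    -o jsonpath='{.subsets[*].notReadyAddresses[*].ip}')"
  ready_count="$(count_words "$ready")"

  echo "selector=$selector"
  echo "ready_endpoints=$ready_count"
  echo "not_ready_endpoints=$(count_words "$not_ready")"

  # Zero ready addresses means the Service has nowhere to send traffic.
  [ "$ready_count" -gt 0 ]
}
```

A selector that matches no pods shows up here as ready_endpoints=0, which is one common shape of traffic that leaves the edge and never arrives.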

3. Check Policy Drift

Ghost traffic incidents often hide in policy drift, not code deploys.

kubectl describe networkpolicy -n edge
kubectl exec -n edge deploy/api-gateway -- nslookup internal-auth.edge.svc.cluster.local

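If DNS egress is partially blocked (for example, a policy that allows port 53 over UDP but not TCP, or one that applies to only some pods), you can get intermittent failures that look random.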
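
A single lookup can pass by luck, so repeating it helps surface an intermittent block. This is a minimal sketch that reuses the nslookup command above. The function name probe_dns and the default attempt count are assumptions.

```shell
# probe_dns NAME [ATTEMPTS] [NAMESPACE] [TARGET]
# Repeats the lookup from inside the cluster and reports how many attempts failed.
# probe_dns is a hypothetical helper; the defaults mirror the command above.
probe_dns() {
  local name="$1" attempts="${2:-20}" ns="${3:-edge}" target="${4:-deploy/api-gateway}"
  local i=1 failures=0

  while [ "$i" -le "$attempts" ]; do
    if ! kubectl exec -n "$ns" "$target" -- nslookup "$name" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done

  echo "dns_failures=$failures/$attempts"
  # Non-zero exit if any attempt failed, so the check can gate a script.
  [ "$failures" -eq 0 ]
}
```

For example, probe_dns internal-auth.edge.svc.cluster.local 20. A result that is neither 0/20 nor 20/20 points to a partial block, not a full outage.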

4. Remediate Narrowly

Use the smallest reversible fix first.

Avoid broad redeploys until you can explain the fault model.
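
One way to keep a fix small and reversible is to save the live object and preview the change before applying it. The sketch below assumes the fix is a single manifest file. The function name stage_change and the backup file naming are hypothetical. It relies on the exit codes of kubectl diff: 0 for no differences, 1 for differences found, and anything higher for an error.

```shell
# stage_change NAMESPACE KIND NAME MANIFEST [BACKUP_DIR]
# Saves the live object, then previews the change without applying it.
# stage_change is a hypothetical helper, not part of kubectl.
stage_change() {
  local ns="$1" kind="$2" name="$3" manifest="$4" dir="${5:-.}"
  local backup rc=0
  backup="$dir/$kind-$name-$(date -u +%Y%m%dT%H%M%SZ).yaml"

  # Keep a copy of the current object as the rollback reference.
  kubectl get "$kind" "$name" -n "$ns" -o yaml > "$backup" || return 2
  echo "backup=$backup"

  # Show what would change; nothing is applied here.
  kubectl diff -n "$ns" -f "$manifest" || rc=$?
  case "$rc" in
    0) echo "no changes pending" ;;
    1) echo "changes pending; review the diff before applying" ;;
    *) echo "kubectl diff failed" >&2; return 2 ;;
  esac
}
```

The backup is a reference copy. Restoring it may require removing server-set fields such as resourceVersion first.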

5. Verify From User Path

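Probe the service the way a user reaches it. The localhost URL below assumes a port-forward or local proxy; to test the real user path, point it at the public edge hostname instead.

curl -sS http://localhost:8080/health | jq .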

Then re-check:

  • endpoint population
  • error-rate trend
  • p95 latency trend
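
A single successful probe says little about an intermittent fault, so it helps to repeat the health check and count the results. This is a sketch under assumptions: the function name probe_health is hypothetical, the default URL is the one used above, and a 200 status is taken to mean healthy.

```shell
# probe_health [URL] [ATTEMPTS]
# Requests URL repeatedly and reports how many attempts returned HTTP 200.
# probe_health is a hypothetical helper; adjust the success code to your service.
probe_health() {
  local url="${1:-http://localhost:8080/health}" attempts="${2:-10}"
  local i=1 ok=0 code

  while [ "$i" -le "$attempts" ]; do
    # -o /dev/null drops the body; -w prints only the status code; -m caps each try at 5s.
    code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url")"
    if [ "$code" = "200" ]; then
      ok=$((ok + 1))
    fi
    i=$((i + 1))
  done

  echo "healthy=$ok/$attempts"
  # Succeeds only when every attempt was healthy.
  [ "$ok" -eq "$attempts" ]
}
```

Run it before and after the fix with the same attempt count, so the two results are comparable alongside the telemetry trends.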

A ghost incident is closed only when the user path and the telemetry agree.