Drills Internal training

Stop guessing who's ready for on-call

Your team reads runbooks. But can they execute under pressure? Team Drills takes the same incident engine used for hiring and turns it into a training system for your existing engineers.

Onboarding · Upskilling · Readiness Checks · K8s Adoption
See Chaos Mode
drill session - node pressure
Training
root@node-17:~$ kubectl get nodes
NAME      STATUS                     ROLES    AGE
node-17   Ready,SchedulingDisabled   worker   47d
node-18   Ready                      worker   47d
node-19   Ready                      worker   47d

root@node-17:~$ kubectl describe node node-17 | grep Pressure
  MemoryPressure   True
  DiskPressure     False
  PIDPressure      False

root@node-17:~$ kubectl top pods -A --sort-by=memory | head -5
NAMESPACE   NAME                    CPU    MEMORY
monitoring  prometheus-0            120m   4.2Gi
edge        api-gateway-7d4f8       45m    1.1Gi
logging     fluentd-8k2x9           30m    890Mi

Goal: diagnose, cordon, drain, remediate, verify
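
The goal line above names five steps. A minimal sketch of how they might map onto kubectl commands, assuming the node name `node-17` from the transcript. The `DRY_RUN` switch and the `run` and `drill_plan` helpers are illustrative names, not part of kubectl or Parium. The remediation step is left as a placeholder because it depends on what the diagnosis finds, and the final uncordon is an assumption about how a drill would return the node to service.

```shell
#!/usr/bin/env bash
# Sketch of the diagnose -> cordon -> drain -> remediate -> verify sequence.
# Assumptions: NODE defaults to node-17 (from the transcript above); DRY_RUN,
# run() and drill_plan() are hypothetical helpers for illustration only.
set -e

NODE="${NODE:-node-17}"
DRY_RUN="${DRY_RUN:-1}"   # default: print commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

drill_plan() {
  # 1. Diagnose: inspect node conditions such as MemoryPressure.
  run kubectl describe node "$NODE"
  # 2. Cordon: stop new pods from being scheduled onto the node.
  run kubectl cordon "$NODE"
  # 3. Drain: evict running pods; DaemonSet-managed pods are left in place.
  run kubectl drain "$NODE" --ignore-daemonsets
  # 4. Remediate: scenario-specific, intentionally left as a placeholder.
  echo "# remediate: fix the cause found in step 1"
  # 5. Verify: re-check node conditions, then allow scheduling again.
  run kubectl describe node "$NODE"
  run kubectl uncordon "$NODE"
}

drill_plan
```

With the default `DRY_RUN=1` the script only prints the commands it would run, so the sequence can be reviewed before pointing it at a real cluster with `DRY_RUN=0`.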
Training: competence becomes specific and measurable
Same engine: identical to hiring assessments
Repeatable: track progress over time
Evidence-based: session replay for managers
Solo or team: individual drills + Chaos Mode

The difference between "I read the runbook" and "I've done this before"

Most teams discover gaps during real incidents. Team Drills surfaces those gaps in a controlled environment, before the pager goes off.

Traditional training

  • Slide decks and documentation reviews
  • "Shadow an on-call for a month" and hope they learn
  • No way to measure readiness objectively
  • Gaps only discovered during real outages

With Team Drills

  • Live incidents on real systems, not theory
  • Structured drill programs with tracked results
  • Session replay: see exactly how they debug
  • Readiness becomes measurable, not assumed

Run a drill in three steps

Same incident engine as hiring. Different purpose: build capability instead of filtering candidates.

01

Pick the learning target

Choose from existing scenarios (Kubernetes node pressure, GPU diagnostics, Azure networking, Docker security) or request custom ones that mirror your production stack.

02

Run a live drill

Each engineer enters a real environment and works the incident. No mock data, no multiple choice. They investigate, remediate, and verify on a live system that behaves like production.

03

Review the evidence

Managers and leads coach from actual session data: command history, time to root cause, verification steps, hints used. Evidence replaces post-hoc storytelling.

What your team can drill on

Every scenario available for hiring is available for internal training. Pick the technology or skill gap you want to close.

KUBERNETES · Cluster operations
Pod failures, node pressure, network policies, etcd recovery, cascading drain storms. From L2 basics to L4 control-plane work.

LINUX · System fundamentals
Disk full, runaway processes, service recovery, performance tuning, kernel module debugging. The foundation everything else sits on.

GPU / AI INFRA · Accelerator operations
Xid errors, driver conflicts, PCIe failures, DCGM diagnostics. Critical for teams running ML training or inference at scale.

CLOUD · Azure & networking
Load balancer misconfigs, NSG rule tracing, health probe debugging, bind address issues. Multi-layer cloud networking scenarios.

DOCKER · Container security
Readonly filesystems, container escapes, privilege escalation, network isolation. For teams building container-native platforms.

INCIDENT POSTURE · Response methodology
Beyond technical skill. How engineers approach debugging. Systematic investigation, verification discipline, knowing when to escalate.

Hire with assessments. Train with drills. Evolve into Chaos Mode.

The best results come from using all three. Assessments filter candidates. Drills build internal capability. Chaos Mode tests collaboration. Same scenarios, same engine, three different motions.

  • Start with solo drills for baseline validation
  • Graduate to Chaos Mode when team coordination matters
  • Use CLI for engineers who prefer native terminals
  • Track progress across sessions and scenarios
drill results
Results
  PARIUM / drill summary

  ENGINEER   Alex Chen
  SCENARIO   K8s Node Pressure
  RESULT     ● RESOLVED
  TIME       08:42 / 20:00 limit

  ────────────────────────────────
   Root cause identified    03:12
   Remediation applied      06:45
   Health check verified    08:42
  ────────────────────────────────

  Commands: 18   Hints: 0   LLM risk: Low

  Manager note: Clean investigation path.
  Verified before declaring resolved.
  Ready for on-call rotation.

Four programs your team can run today

01

New engineer onboarding

Week one: run them through your core scenarios. Week four: run the same scenarios again. You'll have data on how fast they're ramping instead of vibes from a 1:1.

02

Technology migration readiness

Moving to Kubernetes? Adopting GPU workloads? Run your team through the relevant scenarios before the migration goes live. Find gaps when the cost of a mistake is zero.

03

On-call readiness validation

Before someone goes on the rotation, they should be able to handle the incidents your team actually sees. Drills give you evidence instead of "they seem ready" from a skip-level.

04

Quarterly team exercises

Run your entire SRE team through a drill each quarter. Track improvement. Identify who needs coaching. Build the kind of incident response culture that makes 3am pages less terrifying.

Common questions about Team Drills

Team Drills and hiring assessments share the same engine but serve different purposes. Hiring assessments filter external candidates. Team Drills develops internal engineers. The runtime is identical (real containers, real scenarios, real scoring) but the goal shifts from "should we hire this person?" to "is this person ready for on-call?"

Team Drills is built for engineering managers, SRE leads, platform team leads, and enablement owners: anyone who needs to know whether their team can actually handle production incidents, and wants evidence instead of assumptions.

Kubernetes adoption is one of the best use cases. Teams learn Kubernetes operations far faster by repeatedly debugging realistic failures (pod crash-loops, node pressure, network policies, etcd issues) than by reading documentation or watching tutorials.

Start with solo drills for individual baseline validation. When the team is ready, graduate into Chaos Mode for collaborative war room practice. Engineers who prefer native terminals can use the CLI for both. It's one platform with three training motions.

Progress can be tracked over time. Each drill session is recorded with full metrics: time to resolution, commands used, hints requested, verification steps. Run the same scenario at different points and compare the results. The data tells you whether training is working.
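
Comparing two runs of the same scenario comes down to simple arithmetic on the recorded times. A minimal sketch, assuming times are recorded as `mm:ss` as in the drill summary on this page. The function names are hypothetical, and of the two inputs only 08:42 appears on this page; 12:30 is a made-up earlier run used purely for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare time-to-resolution between two runs of the same scenario.
# to_seconds() and resolution_delta() are hypothetical helper names; this is
# not Parium's export format or API.

# Convert "mm:ss" to seconds. The 10# prefix forces base 10 so that values
# such as "08" are not rejected as invalid octal.
to_seconds() {
  local minutes="${1%%:*}" seconds="${1##*:}"
  echo $(( 10#$minutes * 60 + 10#$seconds ))
}

# Print the change in seconds from the first run to the second.
# A negative number means the second run was faster.
resolution_delta() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Illustrative earlier run (12:30) versus the 08:42 run from the summary.
resolution_delta "12:30" "08:42"   # prints -228
```

The same subtraction applies to any of the other recorded counts, such as commands used or hints requested.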

Get Started

Build stronger teams before production tests them

Team Drills means Parium isn't just a hiring tool. It's how you build a stronger team.
