Site Reliability Engineers
Test incident response, system debugging, and production troubleshooting skills with real-world scenarios.
See how candidates handle outages before something actually breaks.
Respect your candidates' time - and your engineers' too.
Pick from our ready-made scenarios (GPU debugging, server performance, Kubernetes) or tell us your stack and we'll build custom assessments.
Share a link. Candidates enter their details and drop straight into a live terminal. No downloads, no accounts, no friction.
See exactly how they debug: time to resolution, commands used, thought process. Make confident hiring decisions backed by data.
Everything you need to assess real engineering skills
Full Linux VMs via the browser, or connect through our CLI for chaos room sessions. Not a sandbox — a real system to debug.
Automatic timing from first command to incident resolution. Compare candidates objectively against your team's benchmarks.
Real SOPs, just like the ones your team uses. Track whether candidates follow procedures independently or need guidance — and how much.
Paste-event tracking and behavioural pattern analysis to flag candidates using AI assistance during the assessment.
Every keystroke, every command, every pause — with timestamps. Replay the entire session or export the full log for review.
Azure networking, K8s cascading failures, GPU driver conflicts, and more. Match the scenario to the role you're hiring for.
Each scenario is a carefully crafted incident with production-accurate logs, configs, and system state. Real kernel modules, real error messages, real tools — dmesg, journalctl, kubectl, nvidia-smi — with health check endpoints that validate the fix.
═══════════════════════════════════════════════
INCIDENT ALERT — SEV 1
═══════════════════════════════════════════════
INCIDENT ID: INC-2026-0315-LB503
SEVERITY:    Critical — Production
AFFECTED:    edge-api.parium.internal
IMPACT:      $8,200/hr revenue at risk
───────────────────────────────────────────────
Production Edge API is returning HTTP 503
through the Azure Load Balancer. The VM appears
to be running, but zero backend health probes
succeed.

Active escalations: 3 customer tickets
Executive visibility: Yes — CTO notified
═══════════════════════════════════════════════
YOUR TASK
═══════════════════════════════════════════════
1. Investigate why the LB returns 503
2. Identify all root causes (there may be more than one)
3. Apply fixes using approved remediation tools
4. Verify health check returns 200 OK
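A candidate's opening triage here might look something like this (illustrative only; the unit name, port, and health path are assumptions, not the actual scenario contents):

$ ss -tlnp                                 # is anything listening where the LB probes?
$ systemctl status edge-api                # hypothetical unit name for the API service
$ journalctl -u edge-api --no-pager | tail -50
$ curl -v http://localhost:8080/healthz    # bypass the LB and test the backend locally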
═══════════════════════════════════════════════
INCIDENT ALERT — SEV 1
═══════════════════════════════════════════════
INCIDENT ID: INC-2026-WAR-ROOM
SEVERITY:    Critical — Cascading
CLUSTER:     prod-us-east-1 (18 nodes)
IMPACT:      $15K/hr → escalating
───────────────────────────────────────────────
api-gateway pods are in CrashLoopBackOff.
Customer-facing traffic is failing. SLA budget
is burning. This incident has executive
visibility.

WARNING: This incident will escalate. Each fix
you apply may reveal the next failure.
Prioritise methodically.

SLA budget remaining: 47 minutes
Oncall team: Platform Engineering
War room: Active — you are IC
═══════════════════════════════════════════════
YOUR TASK
═══════════════════════════════════════════════
1. Restore api-gateway service availability
2. Investigate and resolve cascading failures
3. Validate cluster health at each phase
4. Maintain SLA budget — time matters
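For the cascading Kubernetes incident, a first pass might look like this (namespace and pod names are placeholders):

$ kubectl get pods -A | grep -v Running        # what is unhealthy, cluster-wide?
$ kubectl describe pod <api-gateway-pod>       # why is it crash-looping?
$ kubectl logs <api-gateway-pod> --previous    # logs from the last crashed container
$ kubectl get events --sort-by=.lastTimestamp  # recent cluster events in order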
═══════════════════════════════════════════════
INCIDENT ALERT — SEV 2
═══════════════════════════════════════════════
INCIDENT ID: INC-2026-0119-GPU
SEVERITY:    High — Production ML
AFFECTED:    gpu-node-01.neocloud.internal
IMPACT:      $4,200/hr compute waste
───────────────────────────────────────────────
GPU compute jobs are failing on gpu-node-01.
The node has 8x NVIDIA A100 80GB GPUs but only
7 of 8 devices are detected by monitoring.

Queued jobs: 3 LLM fine-tuning runs
Last healthy: 08:00 UTC today
Kernel log: Xid 79 — GPU fallen off bus
═══════════════════════════════════════════════
YOUR TASK
═══════════════════════════════════════════════
1. Investigate why nvidia-smi shows fewer GPUs
2. Identify the root cause (driver vs hardware)
3. Restore GPU functionality if possible
4. Escalate to hardware team if necessary
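And for the GPU scenario, the kernel log and the PCIe bus are the first places to look (a sketch, not the prescribed solution path):

$ nvidia-smi              # how many GPUs actually enumerate?
$ dmesg | grep -i xid     # Xid 79 means the GPU has fallen off the bus
$ lspci | grep -i nvidia  # is the device still visible on the PCIe bus?
$ lsmod | grep nvidia     # are the driver modules loaded?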
Scenarios matched to every role on your team
Assess incident response, system debugging, and production troubleshooting with realistic outage scenarios.
Assess configuration management, CI/CD pipelines, container orchestration, and infrastructure automation.
Evaluate hardware diagnostics, bare-metal troubleshooting, and GPU/accelerator management skills.
Test core Linux skills, including process management and filesystem troubleshooting.
Our most advanced scenario doesn't end when you fix the first problem. Each resolution triggers the next hidden failure — just like production.
Multiple engineers connect to the same live incident simultaneously. Test how candidates collaborate under pressure — or use it for team training exercises.
The Parium CLI connects you directly to war room sessions from your own terminal. No browser, no context switching — just parium open and you're in.
$ npm install -g @parium.ai/cli@preview
$ parium open

█▀█ ▄▀█ █▀█ █ █ █ █▀▄▀█
█▀▀ █▀█ █▀▄ █ █▄█ █ ▀ █

Chaos Terminal Client v0.1.0-alpha.2

Paste handoff token: ••••••••••••
✓ Token validated
✓ Session resolved — k8s-chaos-war-room
⟳ Attaching to terminal...
──────────────────────────────────────
SESSION  K8s Cascading Failure
STATUS   ● LIVE
PHASE    3 of 6 — DNS network policy
IMPACT   $120K/hr
──────────────────────────────────────
candidate@prod-worker-07:~$ █
No unfamiliar IDEs. No artificial puzzles. Just a terminal and a real incident - the environment they work in every day.
Everything you need to know about how Parium works.
Candidates connect to a real, isolated Linux environment - not a browser simulation or multiple-choice sandbox. Each assessment spins up a fresh system with the incident pre-configured. They get full terminal access with real bash, real logs, and real system tools. It's the same experience as SSH'ing into a production server.
Parium is built for any role that requires hands-on Linux troubleshooting: Site Reliability Engineers (SRE), DevOps Engineers, Platform Engineers, Data Center Technicians, Linux System Administrators, Cloud Engineers, and Infrastructure Engineers. Our scenarios range from L1 support tasks (config errors, disk space) to L4 senior-level incidents (GPU driver conflicts, kernel modules, PCIe issues).
We monitor for patterns that suggest external help - things like leaving the terminal for extended periods, large paste events, and unusual command timing. Suspicious activity gets flagged in the hiring manager report with enough context for you to make an informed judgment. We can't catch everything, but the patterns are usually pretty obvious.
When the candidate clicks "Verify Fix," we run a health check against the scenario's success criteria (e.g., curl the API endpoint, check nvidia-smi output). If it passes, we record their time-to-resolution. The hiring manager gets a full report: every command with timestamps, hints used, suspicious activity flags, and an AI-generated analysis of their troubleshooting approach and methodology.
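As a rough sketch, the "Verify Fix" check for the load balancer scenario above might boil down to something like this (the endpoint path and pass condition are illustrative, not our actual criteria):

$ status=$(curl -s -o /dev/null -w '%{http_code}' http://edge-api.parium.internal/healthz)
$ [ "$status" = "200" ] && echo "PASS: recording time-to-resolution" || echo "FAIL: keep digging"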
HackerRank, Codility, and similar platforms test algorithmic coding in sandboxed editors. Parium tests operational skills in real Linux environments. Your SRE candidates don't need to reverse a linked list - they need to figure out why nginx won't start or why the GPU driver isn't loading. We measure how they investigate, not whether they memorised the answer.
Yes. We can build scenarios that mirror your actual production environment - your monitoring tools, your deployment setup, your common failure modes. Whether it's Kubernetes on EKS, GPU clusters with SLURM, or legacy systems with custom daemons, we'll create assessments that test exactly what your team deals with day-to-day. Get in touch to discuss.
Beyond pass/fail, we give you session replay - watch exactly how candidates approached the problem. You'll see every command they ran, when they pasted content (and what they pasted), when they switched tabs, how long they were away, and when they used hints. It's like watching over their shoulder, but asynchronously. You see how they think, not just whether they got the answer.
Every candidate gets the same scenario, the same environment, the same success criteria. No more "it depends on who reviewed it." Structured evaluation that gives every candidate a fair shot.
No variation between candidates. Everyone faces the same incident with the same tools available.
Clear pass/fail based on whether the fix works — not on how well someone writes a README or formats their code.
Time-to-resolution, commands used, hints requested. Compare candidates on the metrics that matter.
Whether you need a custom scenario for your stack, want to discuss enterprise pricing, or just have questions, we'd love to hear from you.
See real incident performance before you hire.