Data Centre Operations

Assess the people who keep the lights on.

Data centre roles need a different kind of assessment. Your technicians work with storage controllers, IPMI, BMC interfaces, RAID arrays, and operational runbooks. They need to follow procedures safely, understand escalation points, and make the right call when a drive fails or a power event occurs. Parium has scenarios built specifically for this environment.

The tools and procedures your DC team uses

Data centre operations is as much about process discipline as technical skill. Our scenarios test both: can the candidate use the right tools and follow the right procedures under pressure?

ipmitool · iDRAC / iLO · BMC management · smartctl · MegaCli / storcli · mdadm · RAID rebuild · lshw / dmidecode · dmesg · freeipmi · sensor monitoring · SEL logs · power management · PDU / UPS · thermal monitoring · operational runbooks · escalation procedures · change management · cable verification · firmware updates

Data centre scenarios available today

Each scenario includes an operational runbook or telemetry view. Junior candidates get guided procedures. Senior candidates get the incident brief and drive the investigation independently.

Read-Only Storage Controller

A storage system has gone read-only. Applications can read data but writes are failing. The candidate investigates the storage controller state, RAID health, drive SMART data, and filesystem mount options to identify the cause and restore write access without data loss.

15-25 min·Senior / Mid / Junior levels
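
As a rough illustration of the mount-options check in this scenario, the sketch below scans text in the `/proc/mounts` format for filesystems mounted read-only. The function name and the sample data are hypothetical and are not taken from a Parium scenario; a real investigation would also cover controller state, RAID health, and SMART data.

```python
def read_only_mounts(mounts_text):
    """Return (device, mountpoint) pairs whose mount options include 'ro'."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, _fstype, options = fields[:4]
        # Compare whole options so 'errors=remount-ro' is not a false match.
        if "ro" in options.split(","):
            result.append((device, mountpoint))
    return result


# Hypothetical sample in /proc/mounts format (device, mountpoint, type, options).
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/md0 /data ext4 ro,relatime,errors=remount-ro 0 0
"""

print(read_only_mounts(SAMPLE))
```

On a live host the same function could be fed the contents of `/proc/mounts`. A read-only mount is usually a symptom rather than the cause, so the next step is typically the kernel log and the controller.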

RAID Degradation and Drive Replacement

A RAID array is running in degraded mode. One drive has failed and another is showing predictive failure indicators. The candidate identifies the failed and failing drives, assesses rebuild risk, follows the replacement procedure, and initiates rebuild while the system is live.

20-30 min·Senior / Mid levels
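
For Linux software RAID, one way to spot a degraded array is to read the `[expected/active]` counter in `/proc/mdstat`. The sketch below is a minimal parser under that assumption; the function name and sample text are hypothetical, and hardware RAID controllers (MegaCli / storcli) report state through their own tools instead.

```python
import re

ARRAY_RE = re.compile(r"^(md\w+)\s*:")
# Matches e.g. "[2/1] [_U]": 2 devices expected, 1 active.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Map array name -> (expected, active) for arrays missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected:
                degraded[current] = (expected, active)
    return degraded


# Hypothetical /proc/mdstat excerpt: md0 has lost one member, md1 is healthy.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0](F)
      976630464 blocks super 1.2 [2/1] [_U]
md1 : active raid1 sdd1[1] sdc1[0]
      976630464 blocks super 1.2 [2/2] [UU]
"""

print(degraded_arrays(SAMPLE))
```

This only shows which arrays are degraded. Judging rebuild risk still needs SMART data for the surviving members, which is the part of the scenario that calls for judgment.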

Thermal Event Response

Environmental sensors have triggered alerts. Inlet temperatures are rising across a row of racks. The candidate reviews IPMI sensor data, identifies the affected zone, checks CRAC unit status, and determines whether to start migrating workloads or wait for facilities to respond.

15-20 min·Mid / Junior levels
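
The sensor review step can be sketched as parsing pipe-delimited temperature rows, in the style printed by `ipmitool sdr type Temperature`, and flagging inlet sensors above a limit. Sensor names vary by vendor, and the sample rows, function names, and the 27 °C limit are assumptions for illustration only; a site's own runbook defines the real thresholds.

```python
def parse_temps(sdr_text):
    """Return {sensor_name: degrees_c} for rows that carry a Celsius reading."""
    temps = {}
    for line in sdr_text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Expect: name | id | status | entity | reading
        if len(cols) < 5 or not cols[4].endswith("degrees C"):
            continue  # skips rows such as "No Reading"
        temps[cols[0]] = float(cols[4].split()[0])
    return temps


def hot_inlets(temps, limit_c):
    """Names of inlet sensors reading above limit_c, sorted."""
    return sorted(
        name for name, value in temps.items()
        if "inlet" in name.lower() and value > limit_c
    )


# Hypothetical sensor output for one server.
SAMPLE = """\
Inlet Temp       | 04h | ok  |  7.1 | 31 degrees C
Exhaust Temp     | 01h | ok  |  7.1 | 45 degrees C
CPU1 Temp        | 0Eh | ns  |  3.1 | No Reading
"""

temps = parse_temps(SAMPLE)
print(hot_inlets(temps, limit_c=27.0))
```

Running this per host and grouping the results by rack position is one way to outline the affected zone before deciding whether to migrate workloads.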

Power Event: PDU Failure

A PDU has lost a phase. Half the servers in a rack have lost redundant power. UPS is holding but runtime is limited. The candidate must assess the impact, identify which servers need immediate attention (single-corded vs dual-corded), and execute the emergency power procedure safely.

15-25 min·Senior / Mid levels
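
The single-corded versus dual-corded triage can be sketched as a lookup against a power inventory. The inventory structure, server names, and feed labels below are hypothetical; in practice this data would come from a DCIM system or rack documentation.

```python
def classify(inventory, failed_feed):
    """Group servers by how the loss of one power feed affects them.

    inventory maps server name -> set of feed labels it is cabled to.
    """
    down, lost_redundancy, unaffected = [], [], []
    for server, feeds in sorted(inventory.items()):
        if failed_feed not in feeds:
            unaffected.append(server)
        elif len(feeds) == 1:
            down.append(server)             # single-corded on the failed feed
        else:
            lost_redundancy.append(server)  # dual-corded, now on one feed
    return {
        "down": down,
        "lost_redundancy": lost_redundancy,
        "unaffected": unaffected,
    }


# Hypothetical rack: feed "A" has failed.
INVENTORY = {
    "web-01": {"A", "B"},
    "db-01": {"A"},
    "cache-01": {"B"},
}

print(classify(INVENTORY, failed_feed="A"))
```

Servers in the `down` group need attention first, while the `lost_redundancy` group keeps running but has no protection against a second failure.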

BMC/IPMI Remote Recovery

A remote server is unresponsive. SSH is down but the BMC is reachable. The candidate uses IPMI tools to check power state, review SEL logs, inspect sensor readings, and determine whether a graceful reboot, a hard reset, or a console session is needed. Tests remote-hands decision-making.

10-15 min·Mid / Junior levels
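
To make the triage order concrete, the sketch below builds the `ipmitool` command lines a technician might run, without executing them. The host name, user name, and helper function are hypothetical; the `-E` flag tells `ipmitool` to read the password from the `IPMI_PASSWORD` environment variable so it stays out of the command line.

```python
import shlex


def ipmi_cmd(host, user, *args):
    """Build an ipmitool argv list for a remote BMC over the lanplus interface."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *args]


# Read-only checks first.
TRIAGE = [
    ("power state", ("chassis", "power", "status")),
    ("system event log", ("sel", "elist")),
    ("sensor readings", ("sdr", "elist")),
]

# Possible actions, least disruptive first; which one applies is the judgment call.
ACTIONS = [
    ("serial-over-LAN console", ("sol", "activate")),
    ("ACPI soft shutdown request", ("chassis", "power", "soft")),
    ("hard reset", ("chassis", "power", "reset")),
]

for label, args in TRIAGE + ACTIONS:
    # Hypothetical BMC address and account, shown only to render the commands.
    print(f"{label}: {shlex.join(ipmi_cmd('bmc.example.net', 'operator', *args))}")
```

Printing rather than running the commands keeps the sketch safe to try anywhere. The ordering reflects a common preference for gathering evidence before changing power state.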

GPU Xid 79 with Escalation Decision

A GPU in a DC rack has reported Xid 79 errors, which the NVIDIA driver logs when a GPU has fallen off the bus. The candidate investigates using the DC runbook, checks dmesg, nvidia-smi, and PCIe state, and must decide whether to attempt a driver reset, drain the node, or escalate for hardware RMA. Tests the boundary between operational fix and hardware escalation.

20-30 min·Senior / Mid / Junior levels
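
The dmesg check in this scenario amounts to finding `NVRM: Xid` lines and reading the PCI address and event code. The sketch below does that with a regular expression; the function name and the sample log line are hypothetical, and the exact message text after the code differs between driver versions.

```python
import re

# Example line: "NVRM: Xid (PCI:0000:3b:00): 79, ..."
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+)")


def find_xids(dmesg_text, code=None):
    """Return (pci_address, xid_code) pairs, optionally filtered by code."""
    events = []
    for line in dmesg_text.splitlines():
        match = XID_RE.search(line)
        if not match:
            continue
        xid = int(match.group(2))
        if code is None or xid == code:
            events.append((match.group(1), xid))
    return events


# Hypothetical kernel log excerpt.
SAMPLE = """\
[12345.678901] NVRM: Xid (PCI:0000:3b:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
[12346.000000] pcieport 0000:3a:00.0: AER: Corrected error received
"""

print(find_xids(SAMPLE, code=79))
```

Knowing which PCI address raised the event lets the candidate line it up with `nvidia-smi` and PCIe link state before choosing between a reset, a drain, or an RMA escalation.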

Custom data centre scenarios

Every data centre has its own procedures, ticketing systems, escalation paths, and hardware configurations. If your team uses specific runbooks, specific hardware vendors (Dell, HPE, Supermicro, Lenovo), or specific monitoring tools (Nagios, Zabbix, Prometheus, Datadog), we can build scenarios that match your operational environment.
