Tabletop exercises for OT incidents are valuable. They surface decision points, clarify roles, and let leadership practice the hardest call — when to isolate. But they don't teach the operators on the floor what to actually do when an HMI starts behaving strangely at 2:30 AM.
For that, we run live drills. Not pentests. Not full-scale incident exercises with the board watching. Targeted, low-risk, plant-floor drills that build operator muscle memory. Here is how we structure them.
Why live drills
In a real ransomware event, the first 90 minutes matter enormously. In those 90 minutes:
- Operators may notice unusual HMI behavior
- Engineering workstations may start behaving strangely
- Somebody will have to decide whether to pull network cables, run production on local control, or shut down cleanly
That somebody is, in practice, usually a shift supervisor or plant maintenance lead, not a member of the security team. If that supervisor has never seen "unusual HMI behavior that is not a normal control glitch," they will hesitate. Hesitation is how 90 minutes turns into 17 hours of dwell time.
How we structure a drill
Phase 1: Pre-brief (30 minutes)
Plant leadership, security, and the drill participants meet. We explain that the drill will simulate suspicious behavior. No systems will be harmed. The drill is not a test of the participants — it is a test of the procedures. We emphasize that anyone can call the drill off at any time for operational reasons.
Phase 2: Inject (2–5 minutes)
We execute a pre-agreed, minimally disruptive simulation. Examples we have used:
- A simulated "ransomware note" text file appearing on an engineering workstation (not encrypted — just a file)
- A brief network-visible anomaly on the plant network — a port scan from a known-safe simulation host
- An HMI display that briefly shows an unexpected message (using vendor-approved test features)
Nothing we inject would affect production. Every injection is pre-approved by plant operations and, where applicable, by the control system vendor.
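To make the first example concrete, here is a minimal Python sketch of how a harmless note file could be staged and later cleaned up. It is an illustration under stated assumptions, not the tooling used in the drills described here: the file name, banner text, and the function names `write_drill_note` and `remove_drill_note` are all hypothetical, and whether the note should openly identify itself as a drill is a design choice to agree in the pre-brief.

```python
from datetime import datetime, timezone
from pathlib import Path

# Assumption: the note labels itself as a drill so it can never be mistaken
# for a real event by someone who was not in the pre-brief.
DRILL_BANNER = "*** DRILL - SIMULATED RANSOM NOTE - NO FILES HAVE BEEN ENCRYPTED ***"


def write_drill_note(target_dir: Path, drill_id: str, contact: str) -> Path:
    """Write a plain text file (nothing is encrypted) and return its path."""
    target_dir.mkdir(parents=True, exist_ok=True)
    note = target_dir / "READ_ME_NOW.txt"  # hypothetical file name
    created = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [
        DRILL_BANNER,
        f"Drill ID: {drill_id}",
        f"Created (UTC): {created}",
        f"Drill controller: {contact}",
        "Respond exactly as you would to a real note.",
    ]
    note.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return note


def remove_drill_note(note: Path) -> bool:
    """Delete the file only if it starts with the drill banner."""
    if note.is_file() and note.read_text(encoding="utf-8").startswith(DRILL_BANNER):
        note.unlink()
        return True
    return False
```

The cleanup function refuses to delete anything that does not begin with the banner, so a mistyped path cannot remove a real engineering file.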
Phase 3: Observe (30–90 minutes)
We watch. We do not coach. We document who notices, who escalates, who calls whom, and how long each step takes. We specifically watch for:
- Does the operator recognize the behavior as suspicious?
- Does the operator know who to call?
- Does the escalation reach a decision-maker?
- Does the decision-maker know they have the authority to isolate?
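The "how long each step takes" part of the observation can be captured with something as simple as the following Python sketch. It assumes the observer notes a timestamp, an actor, and an action for each step; the class name `DrillTimeline` and the example times are hypothetical and are not data from any drill.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class DrillTimeline:
    """Observer's log: when the inject happened and what followed."""
    inject_time: datetime
    events: list[tuple[datetime, str, str]] = field(default_factory=list)

    def record(self, when: datetime, actor: str, action: str) -> None:
        self.events.append((when, actor, action))

    def elapsed(self) -> list[tuple[float, float, str, str]]:
        """Return (minutes since inject, minutes since previous step, actor, action)."""
        rows = []
        previous = self.inject_time
        for when, actor, action in sorted(self.events, key=lambda e: e[0]):
            since_inject = (when - self.inject_time) / timedelta(minutes=1)
            since_previous = (when - previous) / timedelta(minutes=1)
            rows.append((since_inject, since_previous, actor, action))
            previous = when
        return rows


if __name__ == "__main__":
    # Illustrative timestamps only.
    t0 = datetime(2024, 1, 1, 2, 30)
    timeline = DrillTimeline(inject_time=t0)
    timeline.record(t0 + timedelta(minutes=4), "operator", "notices note on workstation")
    timeline.record(t0 + timedelta(minutes=11), "operator", "calls shift supervisor")
    timeline.record(t0 + timedelta(minutes=35), "shift supervisor", "decides to isolate")
    for total, step, actor, action in timeline.elapsed():
        print(f"T+{total:5.1f} min (+{step:4.1f}) {actor}: {action}")
```

The per-step gap is often more useful in the debrief than the total, because it shows where in the chain the time was spent.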
Phase 4: Debrief (60–90 minutes)
We walk through the timeline with everyone who participated. We identify what worked, what didn't, and what procedures need to change. The output is a list of specific improvements to playbooks, training, or architecture.
What we typically learn
Across roughly two dozen live drills in the last two years:
- Operators almost always notice something. They are trained observers.
- Escalation paths are often unclear. Who to call at 2:30 AM is frequently ambiguous.
- Decision authority is often uncertain. Shift supervisors frequently do not know whether they have authority to isolate.
- Playbooks usually exist but are not immediately findable. The PDF is on a corporate SharePoint that the plant network cannot reach.
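The last finding lends itself to a routine check. The following Python sketch assumes playbooks are mirrored to a directory reachable from the plant network; the function name `check_local_playbooks`, the file names, and the 180-day freshness threshold are all hypothetical choices for illustration.

```python
from datetime import datetime, timedelta
from pathlib import Path


def check_local_playbooks(
    local_dir: Path, required: list[str], max_age_days: int = 180
) -> dict[str, str]:
    """Return 'ok', 'missing', or 'stale' for each required playbook file."""
    cutoff = datetime.now() - timedelta(days=max_age_days)
    status = {}
    for name in required:
        path = local_dir / name
        if not path.is_file():
            status[name] = "missing"
        elif datetime.fromtimestamp(path.stat().st_mtime) < cutoff:
            # File exists locally but has not been refreshed recently.
            status[name] = "stale"
        else:
            status[name] = "ok"
    return status
```

Run from a machine on the plant network, a check like this reports whether each playbook is present locally and recently refreshed, which is a different question from whether it exists on the corporate side.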
The cost of not drilling
In a real event, you will discover all of the above, but at much higher stakes. The debrief after a real ransomware event is a much harder conversation than the debrief after a drill.
If you are interested in running a live drill at your facility, let's talk. A typical drill takes a half-day on-site and produces more actionable findings than most tabletop exercises.
This article was written by the Cascadia OT Security practice, which advises Pacific Northwest data centers and manufacturers on industrial cybersecurity. For engagement inquiries, reach our practice team.