Running Ransomware Drills on the Factory Floor

Tabletop exercises for OT incidents are valuable. They surface decision points, clarify roles, and let leadership practice the hardest call — when to isolate. But they don't teach the operators on the floor what to actually do when an HMI starts behaving strangely at 2:30 AM.

For that, we run live drills. Not pentests. Not full-scale incident exercises with the board watching. Targeted, low-risk, plant-floor drills that build operator muscle memory. Here is how we structure them.

Why live drills

In a real ransomware event, the first 90 minutes matter enormously. In those 90 minutes:

Operators may notice unusual HMI behavior
Engineering workstations may start behaving strangely
Somebody will have to decide whether to pull network cables, run production on local control, or shut down cleanly

In practice, that somebody is usually a shift supervisor or plant maintenance lead, not a member of the security team. If that supervisor has never seen "unusual HMI behavior that is not a normal control glitch," they will hesitate. Hesitation is how 90 minutes turns into 17 hours of dwell time.

How we structure a drill

Phase 1: Pre-brief (30 minutes)

Plant leadership, security, and the drill participants meet. We explain that the drill will simulate suspicious behavior. No systems will be harmed. The drill is not a test of the participants — it is a test of the procedures. We emphasize that anyone can call the drill off at any time for operational reasons.

Phase 2: Inject (2–5 minutes)

We execute a pre-agreed, minimally disruptive simulation. Examples we have used:

A simulated "ransomware note" text file appearing on an engineering workstation (not encrypted — just a file)
A brief network-visible anomaly on the plant network — a port scan from a known-safe simulation host
An HMI display that briefly shows an unexpected message (using vendor-approved test features)

Nothing we inject would affect production. Every injection is pre-approved by plant operations and the control system vendor if applicable.
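One way to make the pre-approval rule hard to skip is to record each planned inject alongside its sign-offs and refuse to proceed while any are missing. The following is a minimal sketch of that idea, not a description of our actual tooling. The Inject class, its field names, and the approver labels "plant_operations" and "control_system_vendor" are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Inject:
    """One planned drill inject and the sign-offs collected for it (hypothetical record)."""
    description: str
    vendor_applicable: bool  # True if the inject touches vendor-managed equipment
    approvals: set = field(default_factory=set)

    def required_approvals(self) -> set:
        # Plant operations always signs off; the vendor only when applicable.
        required = {"plant_operations"}
        if self.vendor_applicable:
            required.add("control_system_vendor")
        return required

    def missing_approvals(self) -> set:
        return self.required_approvals() - self.approvals

    def is_cleared(self) -> bool:
        # Cleared only when no required sign-off is missing.
        return not self.missing_approvals()


# Illustrative injects modeled on the examples above.
note = Inject(
    "Simulated ransomware note text file on an engineering workstation",
    vendor_applicable=False,
    approvals={"plant_operations"},
)
hmi = Inject(
    "Unexpected HMI message via vendor-approved test feature",
    vendor_applicable=True,
    approvals={"plant_operations"},
)

for inject in (note, hmi):
    status = "cleared" if inject.is_cleared() else f"blocked, missing {sorted(inject.missing_approvals())}"
    print(f"{inject.description}: {status}")
```

In this sketch the text-file inject is cleared, while the HMI inject stays blocked until the vendor sign-off is recorded.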

Phase 3: Observe (30–90 minutes)

We watch. We do not coach. We document who notices, who escalates, who calls whom, and how long each step takes. We specifically watch for:

Does the operator recognize the behavior as suspicious?
Does the operator know who to call?
Does the escalation reach a decision-maker?
Does the decision-maker know they have the authority to isolate?
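The observation notes lend themselves to a simple timestamped log, from which the time taken by each step can be computed for the debrief. The sketch below assumes a hypothetical Event record and a step_durations helper; the timestamps, roles, and actions are invented for illustration and are not data from any real drill.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    """One observed moment in the drill (hypothetical record format)."""
    time: datetime
    actor: str
    action: str


def step_durations(events):
    """Return (action, time elapsed since the previous event) pairs, in chronological order."""
    ordered = sorted(events, key=lambda e: e.time)
    return [(cur.action, cur.time - prev.time) for prev, cur in zip(ordered, ordered[1:])]


# Illustrative timeline only; not real drill data.
timeline = [
    Event(datetime(2024, 1, 10, 2, 30), "drill team", "inject executed"),
    Event(datetime(2024, 1, 10, 2, 37), "operator", "noticed suspicious behavior"),
    Event(datetime(2024, 1, 10, 2, 52), "operator", "called shift supervisor"),
    Event(datetime(2024, 1, 10, 3, 20), "shift supervisor", "decided to isolate"),
]

durations = step_durations(timeline)
for action, elapsed in durations:
    print(f"{action}: {int(elapsed.total_seconds() // 60)} min after previous step")

# Total time from inject to decision.
total = sum((elapsed for _, elapsed in durations), timedelta())
```

Keeping the log as raw timestamps rather than pre-computed intervals means the same record can answer different debrief questions, such as time to first notice or time from first call to decision.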

Phase 4: Debrief (60–90 minutes)

We walk through the timeline with everyone who participated. We identify what worked, what didn't, and what procedures need to change. The output is a list of specific improvements to playbooks, training, or architecture.

What we typically learn

Across roughly two dozen live drills in the last two years:

Operators almost always notice something. They are trained observers.
Escalation paths are often unclear. Who to call at 2:30 AM is frequently ambiguous.
Decision authority is often uncertain. Shift supervisors frequently do not know whether they have authority to isolate.
Playbooks usually exist but cannot be found quickly. Often the PDF sits on a corporate SharePoint that the plant network cannot reach.

The cost of not drilling

In a real event, you will discover all of the above, but at much higher stakes. The debrief after a real ransomware event is a much harder conversation than the debrief after a drill.

If you are interested in running a live drill at your facility, let's talk. A typical drill takes a half-day on-site and produces more actionable findings than most tabletop exercises.

About the author

This article was written by the Cascadia OT Security practice, which advises Pacific Northwest data centers and manufacturers on industrial cybersecurity. For engagement inquiries, reach our practice team.