Major network upgrades—replacing aging switches, restructuring for segmentation, migrating to new firewall architectures—have historically required production downtime measured in days. Modern manufacturing cannot accept that risk. The challenge is executing complex network changes in parallel with running production, validating the new infrastructure thoroughly before cutting over, and having a rapid rollback plan if anything goes wrong.
The key principle: build the new network alongside the old one, run both in parallel until you are confident the new one is stable, then migrate in the smallest possible increments. Each increment should be reversible without affecting production.
Parallel Infrastructure Phase
Deploy the new network infrastructure—new switches, firewalls, cabling, IP addressing scheme—without touching the existing network. Commission the new infrastructure separately with isolated test traffic. This allows you to validate all the configuration details, failover scenarios, and performance characteristics before any production systems depend on it.
Use this validation phase to catch configuration errors, identify protocol incompatibilities, and train your operations team on the new infrastructure. If something is broken, you discover it while production is running on the old network and you have time to fix it.
Parallel Operation and Verification
- Dual-Path Testing: Once the new infrastructure is commissioned, route production traffic through both the old and new networks at the same time. Capture the output of each path and compare the two; they should be identical. Any divergence points to a configuration problem in the new network.
- Failover Validation: Deliberately fail components of the old network to verify the new network can sustain all production traffic without degradation. Fail a switch, disconnect an uplink, or overload a firewall. Production should continue unaffected on the new network.
- Performance and Timing Validation: For control systems with hard real-time requirements, validate that latency, jitter, and packet loss on the new network meet production specifications. Use actual production traffic and actual control algorithms to measure performance, not synthetic tests.
- Third-Party Integration Testing: If production depends on external integrations (MES, historian, remote support), test all integrations against the new network before full cutover. Remote access systems, data transfers, and third-party vendor connections should all work identically on the new network.
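The dual-path comparison and the timing check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular tool: the function names, the `TimingSpec` fields, and the idea of keying captured messages by a sequence number are all hypothetical, and jitter is taken here as the standard deviation of latency, which is only one of several definitions a production spec might use.

```python
from dataclasses import dataclass
from statistics import pstdev


def diff_paths(old_capture: dict[int, bytes], new_capture: dict[int, bytes]) -> dict[str, list[int]]:
    """Compare messages captured on each path, keyed by a sequence number (assumed)."""
    old_ids, new_ids = set(old_capture), set(new_capture)
    shared = old_ids & new_ids
    return {
        # Seen on the old path but never arrived on the new one.
        "missing_on_new": sorted(old_ids - new_ids),
        # Seen only on the new path (duplicates, stray traffic).
        "extra_on_new": sorted(new_ids - old_ids),
        # Arrived on both paths but with different content.
        "payload_mismatch": sorted(i for i in shared if old_capture[i] != new_capture[i]),
    }


@dataclass(frozen=True)
class TimingSpec:
    """Hypothetical production limits; real values come from the control system spec."""
    max_latency_ms: float
    max_jitter_ms: float
    max_loss_ratio: float


def check_timing(latencies_ms: list[float], sent: int, spec: TimingSpec) -> dict[str, bool]:
    """Check latencies of received packets against the spec.

    Jitter is computed as the population standard deviation of latency
    (an assumption; substitute the definition your spec uses).
    """
    received = len(latencies_ms)
    loss = 1 - received / sent if sent else 1.0
    return {
        "latency_ok": bool(latencies_ms) and max(latencies_ms) <= spec.max_latency_ms,
        "jitter_ok": bool(latencies_ms) and pstdev(latencies_ms) <= spec.max_jitter_ms,
        "loss_ok": loss <= spec.max_loss_ratio,
    }
```

An empty result from every `diff_paths` list together with all three timing flags set to true would be one reasonable gate for moving on to staged migration. The latency samples should come from real production traffic rather than a synthetic generator.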
Staged Migration and Rollback
Begin cutover with the lowest-impact production zone or process. Route that zone's traffic exclusively to the new network. Let it run through at least two complete production cycles (48 hours for many facilities) without incident. If problems arise, fall back to the old network immediately. A problem confined to a single zone is much easier to debug than one spread across an entire facility.
Once the first zone is stable, migrate the next lowest-impact zone. Continue zone by zone until all production is on the new network. If you reach a point where migrating the next zone is too risky, stop and keep both networks running for as long as necessary. Running parallel infrastructure usually costs less than recovering from a cutover that fails partway through.
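The zone-by-zone procedure above can be sketched as a small loop. This is an illustration only: the `Zone` class, its numeric `impact` score, and the three callbacks (`cut_over`, `soak_ok`, `roll_back`) are hypothetical names standing in for whatever your change process actually does, and `soak_ok` is assumed to block until the zone has run its production cycles and then report whether any incident occurred.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Zone:
    name: str
    impact: int  # hypothetical score: lower means less production impact


def migrate_in_stages(
    zones: list[Zone],
    cut_over: Callable[[Zone], None],
    soak_ok: Callable[[Zone], bool],
    roll_back: Callable[[Zone], None],
) -> tuple[list[str], Optional[str]]:
    """Migrate zones lowest-impact first and stop at the first failed soak.

    Returns the names of zones now on the new network, plus the name of the
    zone that was rolled back (or None if every zone migrated cleanly).
    """
    migrated: list[str] = []
    for zone in sorted(zones, key=lambda z: z.impact):
        cut_over(zone)           # route this zone exclusively to the new network
        if not soak_ok(zone):    # incident during the soak period
            roll_back(zone)      # return only this zone to the old network
            return migrated, zone.name
        migrated.append(zone.name)
    return migrated, None
```

The loop stops rather than skipping ahead after a rollback, so zones migrated earlier stay on the new network and everything else stays on the old one until the failure is understood.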
Decommission the old network last, and only after the new network has been stable for at least a month and all rollback paths have been closed. Keep the old equipment on hand for at least six months as spares in case failures surface after cutover.
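The decommissioning conditions can be written down as an explicit gate so that nobody removes the old network early. This is a rough sketch with assumed names: `can_decommission`, `last_incident`, and `open_rollback_paths` are hypothetical, and "at least a month" is approximated as 30 days.

```python
from datetime import date

MIN_STABLE_DAYS = 30  # "at least a month", approximated as 30 days (assumption)


def can_decommission(last_incident: date, open_rollback_paths: int, today: date) -> bool:
    """True only if the new network has been incident-free long enough
    and no rollback path still depends on the old network."""
    stable_days = (today - last_incident).days
    return stable_days >= MIN_STABLE_DAYS and open_rollback_paths == 0
```

For example, `can_decommission(date(2024, 1, 1), 0, date(2024, 1, 31))` passes, while the same call with one rollback path still open does not.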
If you're planning major network infrastructure changes, reach out to discuss migration strategy for your facility.
This article was written by the Cascadia OT Security practice, which advises Pacific Northwest data centers and manufacturers on industrial cybersecurity. For engagement inquiries, reach our practice team.