This course focuses on designing networks that continue to operate under failure, load, and change. Participants explore redundancy models, fault domains, active-active architectures, and the difference between theoretical resilience and real-world availability. The course examines how failures propagate across data centre, transport, and access networks, and how poor design decisions create hidden single points of failure. Emphasis is placed on aligning technical architecture with operational processes, monitoring, and recovery strategies. By the end of the course, delegates will be able to critically assess network designs, identify risk, and design architectures that support predictable uptime and service continuity.

Prerequisites: Working knowledge of IP networking; experience operating or designing production systems is recommended.