A Single Point of Failure (SPOF) is any component whose failure would cause the entire system to fail. Common SPOFs include:
- Hardware: A single server, network switch, or power supply
- Software: A single database instance or application server
- Network: A single internet connection or DNS server
- Human: A single administrator with exclusive knowledge
Elimination strategies:
- Redundancy: Multiple instances of critical components
- Load Balancing: Distribute traffic across multiple nodes
- Clustering: Group servers to act as one logical unit
- Geographic Distribution: Run in multiple data centers so that an outage at one site does not take the whole system down
- Documentation: Ensure knowledge is shared among team members
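
To see why redundancy helps, a rough back-of-the-envelope calculation is useful. The sketch below assumes that instances fail independently and that the group stays up as long as at least one instance is up; real systems often share failure causes (same rack, same bad deploy), so treat the result as an upper bound. The function name `parallel_availability` and the 99% figure are illustrative, not taken from any particular system.

```python
def parallel_availability(single: float, replicas: int) -> float:
    """Availability of a group of redundant instances.

    Assumption: failures are independent, and the group is available
    whenever at least one of the `replicas` instances is available.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The group is down only if every instance is down at the same time.
    return 1.0 - (1.0 - single) ** replicas


if __name__ == "__main__":
    # Hypothetical instance that is up 99% of the time.
    for n in (1, 2, 3):
        print(n, f"{parallel_availability(0.99, n):.6f}")
```

Under these assumptions, each added replica multiplies the probability of a total outage by the single-instance failure rate, which is why going from one instance to two gives the largest practical gain.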
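Example: Instead of running one web server, deploy three servers behind a load balancer that uses health checks to stop routing traffic to a failed server. The load balancer must itself be redundant (for example, an active/passive pair); otherwise it becomes the new SPOF.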
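
The idea of a load balancer with health checks can be sketched in a few lines. This is a minimal in-memory illustration, not a production balancer: the names `Backend` and `RoundRobinBalancer`, the server names, and the `probe` callback are all hypothetical, and a real health check would typically be an HTTP or TCP probe run on a timer.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Backend:
    """One server behind the balancer (hypothetical model)."""
    name: str
    healthy: bool = True


class RoundRobinBalancer:
    """Round-robin selection that skips backends marked unhealthy."""

    def __init__(self, backends: Iterable[Backend],
                 probe: Callable[[Backend], bool]) -> None:
        self._backends = list(backends)
        if not self._backends:
            raise ValueError("at least one backend is required")
        self._probe = probe  # assumed to return True when the backend is up
        self._next = 0       # index where the next search starts

    def run_health_checks(self) -> None:
        """Probe every backend; a probe that raises counts as unhealthy."""
        for backend in self._backends:
            try:
                backend.healthy = bool(self._probe(backend))
            except Exception:
                backend.healthy = False

    def pick(self) -> Backend:
        """Return the next healthy backend in round-robin order."""
        n = len(self._backends)
        for offset in range(n):
            index = (self._next + offset) % n
            candidate = self._backends[index]
            if candidate.healthy:
                self._next = (index + 1) % n
                return candidate
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    down = {"web-2"}  # simulate one failed server
    balancer = RoundRobinBalancer(
        [Backend("web-1"), Backend("web-2"), Backend("web-3")],
        probe=lambda b: b.name not in down,
    )
    balancer.run_health_checks()
    print([balancer.pick().name for _ in range(4)])
    # prints ['web-1', 'web-3', 'web-1', 'web-3']
```

With one of three servers down, traffic continues to flow to the remaining two. Only when every backend fails its health check does `pick` raise, which is the situation redundancy is meant to make unlikely.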