Resilience Products
Hand-picked resilience tools and resources we actually use in production. Each item is tested by developers, for developers.

Designing for Scalability with Erlang/OTP: Implement Robust, Fault-Tolerant Systems
Cesarini and Vinoski's guide to building fault-tolerant, concurrent systems with Erlang/OTP.
Changed how I think about fault tolerance. Even if you never write Erlang, the mental models for supervision trees and process isolation apply to any resilient system.

Release It! Design and Deploy Production-Ready Software (2nd Edition)
Michael Nygard's definitive guide to production hardening, resilience, and real-world failures. Learn how to design systems that survive in production, not just work in development.
The book I read after my first production incident. Nygard's stability patterns and capacity planning framework helped me understand why code that works in dev fails in production.