Summaries of Papers in Distributed Systems - Applications

A long post with summaries from reading technical papers in distributed systems, focusing on application architecture.

Status: Forever In Progress

Designing and Deploying Internet Scale Services

This paper covers best practices for designing, developing, deploying, and operating high-traffic internet services. The practices are grouped into ten sections, ranging from application design and infrastructure planning and management to testing, release, monitoring, and graceful degradation and recovery. Three tenets recur throughout the paper: expect failures, keep it simple, and automate everything. In the author's experience, the majority of operational issues originate in the design and development phase, so the paper recommends that the development, test, and operations teams work closely together to make services more operations-friendly.

The key points from each section that I find worth highlighting are:

  • Application design - expect failures, avoid single points of failure (SPOF)
  • Infrastructure - automate resource provisioning
  • Dependencies - decouple components; depending on small components rarely saves enough to justify the complexity it adds to the system
  • Testing - ship often, test in the full environment, use production data for testing
  • Hardware - use standard SKUs, code to an abstraction of the hardware, do not try to over-optimize for hardware
  • Capacity - make the development team responsible
  • Monitoring - instrument everything
  • Graceful Degradation - support a "big red switch" that sheds non-critical load, have a plan to admit traffic back in gradually
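The graceful degradation point can be made concrete with a small admission controller. This is a minimal sketch under my own assumptions, not code from the paper: the class name `AdmissionController`, its parameters `ramp_seconds` and `min_fraction`, and the linear ramp are all hypothetical. It assumes the big red switch sheds only non-critical requests, and that once the switch is turned off, non-critical traffic is admitted probabilistically with a fraction that climbs from `min_fraction` to 100% instead of jumping back to full load at once.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdmissionController:
    """Hypothetical sketch: big red switch plus gradual (metered) admission."""

    # Seconds over which admission ramps back up to 100% after recovery.
    ramp_seconds: float = 60.0
    # Fraction of non-critical requests admitted at the start of the ramp.
    min_fraction: float = 0.1
    # The "big red switch": when on, all non-critical requests are shed.
    red_switch_on: bool = False
    # Time at which the current ramp started (None means no ramp in progress).
    ramp_start: Optional[float] = None

    def set_red_switch(self, on: bool, now: float) -> None:
        """Flip the switch; turning it off starts a gradual ramp at `now`."""
        self.red_switch_on = on
        self.ramp_start = None if on else now

    def admitted_fraction(self, now: float) -> float:
        """Fraction of non-critical traffic to admit at time `now`."""
        if self.red_switch_on:
            return 0.0
        if self.ramp_start is None:
            return 1.0
        elapsed = now - self.ramp_start
        if elapsed >= self.ramp_seconds:
            return 1.0
        # Linear ramp from min_fraction up to 1.0 over ramp_seconds.
        return self.min_fraction + (1.0 - self.min_fraction) * elapsed / self.ramp_seconds

    def admit(self, critical: bool, now: float, draw: Optional[float] = None) -> bool:
        """Decide whether to admit a request; critical work is never shed."""
        if critical:
            return True
        if draw is None:
            draw = random.random()  # uniform in [0.0, 1.0)
        return draw < self.admitted_fraction(now)


if __name__ == "__main__":
    ctl = AdmissionController(ramp_seconds=60.0, min_fraction=0.1)
    ctl.set_red_switch(True, now=0.0)
    print(ctl.admit(critical=False, now=5.0))  # non-critical load is shed
    ctl.set_red_switch(False, now=100.0)
    print(ctl.admitted_fraction(130.0))        # ~0.55, halfway through the ramp
```

The `draw` parameter exists only so the admission decision can be made deterministic in tests; in a real service the random draw (or a token bucket, or a per-user allow list) would be an implementation choice rather than something the paper prescribes.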