Evaluating a system's design

Engineers often design new systems based on past experience, knowledge, and gut feel. Here are some questions I usually ask myself to make sure I have addressed common concerns.


High-level design questions:

  1. What is the bottleneck of the system? Is it a hard or soft bottleneck?
  2. What components of the system are not horizontally scalable?
  3. Is there a simpler design that meets our needs? Does the proposed system feel overly complex?
  4. What type of hardware is right for such a system?
  5. How much hardware would be required as a function of transaction rates, data sizes, expected growth rate, and any other parameters along which the system needs to scale?
  6. Which system SLAs could be improved if cost weren't a factor?
  7. What other approaches are reasonable for our requirements? What is the trade-off in recommending this design over the others?
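
Question 5 is usually answered with back-of-the-envelope arithmetic. The sketch below is one illustrative way to frame it, not a method from this article: every input (peak QPS, per-machine throughput, replication factor, headroom) is a hypothetical number you would replace with load-test measurements. It takes the larger of the compute-bound and storage-bound estimates and projects them forward under compound monthly growth.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical inputs; replace with measured numbers for a real system."""
    peak_qps: float          # peak transactions per second today
    qps_per_machine: float   # sustainable throughput of one machine (from load tests)
    data_gb: float           # total data stored today, before replication
    gb_per_machine: float    # usable storage per machine
    replication: int = 3     # copies kept of each piece of data (assumed)
    headroom: float = 0.3    # fraction of capacity kept free for spikes/failures (assumed)


def machines_needed(w: Workload, monthly_growth: float = 0.0, months: int = 0) -> int:
    """Machines required after `months` of compound growth at `monthly_growth`.

    Whichever resource (compute or storage) runs out first sets the fleet size,
    so the larger of the two estimates is returned.
    """
    growth = (1 + monthly_growth) ** months
    usable = 1 - w.headroom
    compute = w.peak_qps * growth / (w.qps_per_machine * usable)
    storage = w.data_gb * w.replication * growth / (w.gb_per_machine * usable)
    return math.ceil(max(compute, storage))


if __name__ == "__main__":
    w = Workload(peak_qps=10_000, qps_per_machine=1_000, data_gb=2_000, gb_per_machine=1_000)
    print(machines_needed(w))                                  # fleet size today
    print(machines_needed(w, monthly_growth=0.10, months=12))  # after a year of 10%/month growth
```

Running the same function with different growth rates is a cheap way to see how sensitive the hardware bill is to the growth assumption.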


System expansion questions:

  1. What is the complexity and development cost of adding new algorithms, data sources, workflows, features, or other minor modifications to the system?
  2. What-if scenarios: for example, what if the system became a wild success (say, 10X or 100X the most optimistic estimates)?
  3. What other use cases could be onboarded with minor design changes?
  4. How could the system evolve to reduce costs if cost becomes the driving factor?
  5. What would the phases of system development look like, and which use cases would be delivered in each phase?
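
The 10X/100X what-if and the bottleneck questions above can be explored together with a simple utilization projection. This is a hypothetical sketch: the component names, loads, and capacities are made up, and it assumes load on every component grows linearly with traffic, which real systems often violate. It ranks components by projected utilization and flags the ones that would saturate but cannot simply be given more machines.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    current_load: float          # e.g. requests/sec the component handles today
    capacity: float              # load at which it saturates with today's deployment
    horizontally_scalable: bool  # can we fix saturation by adding machines?


def what_if(components: list[Component], multiplier: float) -> list[tuple[str, float, bool]]:
    """Project utilization if traffic grows by `multiplier` (e.g. 10 or 100).

    Assumes load scales linearly with traffic. Returns
    (name, projected utilization, horizontally_scalable), most saturated first,
    so the first entry is the likely bottleneck.
    """
    projected = [
        (c.name, c.current_load * multiplier / c.capacity, c.horizontally_scalable)
        for c in components
    ]
    return sorted(projected, key=lambda p: p[1], reverse=True)


def blockers(components: list[Component], multiplier: float) -> list[str]:
    """Components that would saturate and cannot be scaled out by adding machines."""
    return [
        name for name, util, scalable in what_if(components, multiplier)
        if util > 1.0 and not scalable
    ]


if __name__ == "__main__":
    system = [
        Component("api", current_load=500, capacity=5_000, horizontally_scalable=True),
        Component("db-primary", current_load=200, capacity=1_000, horizontally_scalable=False),
        Component("cache", current_load=1_000, capacity=50_000, horizontally_scalable=True),
    ]
    for m in (10, 100):
        print(m, what_if(system, m)[0], blockers(system, m))
```

In this made-up example the single database primary is the component that needs a redesign before a 10X success, while the API tier only needs more machines.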


Routine operations questions:

  1. What is the process and cost of doing deployments/code rollouts?
  2. How would the system/operations team monitor system health?
  3. What kind of tools would be required for effective operations?
  4. What is the cost of recovering the system from hardware and software failures (for example, data center outages, key dependencies going down, etc.)?
  5. Does onboarding customers require ops involvement? Does onboarding "special" customers require ops involvement?
  6. How do operational costs change with the number of users, number of machines, etc.?
  7. How far in advance do we need to order additional capacity? Is the hardware commodity or customized?
  8. Can additional capacity be added one machine at a time? Can extra capacity be removed equally easily?
  9. What is the process for installing new hardware capacity? Does it require notifying/involving other teams? Does it have any visible customer impact? Does adding capacity require any manual involvement?
  10. How does the system behave when overloaded (say, during an unexpected traffic spike)?
  11. How does the system guard against accidental abuse by users (unexpected load, bad data, bad requests, etc.)?
  12. Can the system be lifted into the cloud piecemeal if needed?
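
For the overload and accidental-abuse questions, one common answer is per-client rate limiting so that a single misbehaving caller cannot consume the whole system's capacity. The token bucket below is a minimal sketch of that technique, not something this article prescribes; the class names, the `rate`/`capacity` parameters, and the injectable `clock` are illustrative assumptions. Each client may burst up to `capacity` requests and is then held to `rate` requests per second.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows bursts up to `capacity`, then a sustained `rate` requests/sec."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new client can burst immediately
        self.clock = clock      # injectable for testing
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should shed the request, e.g. reply with HTTP 429


class ClientLimiter:
    """Hypothetical per-client wrapper: one independent bucket per client id."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

Rejecting excess requests early keeps latency predictable for well-behaved clients during a spike; a production version would also need to evict idle buckets and, in a multi-node deployment, decide whether limits are enforced per node or globally.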


Thanks

Umesh Kumar