Introduction to Building Blocks for Modern System Design
The bottom-up approach for modern system design
System design problems usually share some similarities, though their specific details are often unique. We have extracted these similarities across design problems as the basic building blocks we’ll be covering. One example of a building block is a load-balancing component, which we’ll probably use in every design problem in one way or another.
The purpose of separating the building blocks is to thoroughly discuss their design just once. This means that later we can use them anywhere without having to go over them in detail again. We can think about building blocks as bricks to construct more effective, capable systems.
Many of the building blocks we discuss are also available for actual use in the public clouds, such as Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP). We can use such constructs to build a system to further cement our understanding. (We won’t construct the system in this course, but we’ve left it as an exercise for interested learners.)
Using building blocks to devise a bottom-up approach for designing systems
We’ll discuss the following building blocks in detail:
- Domain Name System: This building block focuses on how to design hierarchical and distributed naming systems for computers connected to the Internet via different Internet protocols.
- Load Balancers: Here, we’ll understand the design of a load balancer, which is used to fairly distribute incoming client requests among a pool of available servers. It also reduces the load on individual servers and can route requests around failed servers.
- Databases: This building block enables us to store, retrieve, modify, and delete data in connection with different data-processing procedures. Here, we’ll discuss database types, replication, partitioning, and analysis of distributed databases.
- Key-Value Store: It is a non-relational database that stores data in the form of a key-value pair. Here, we’ll explain the design of a key-value store along with important concepts such as achieving scalability, durability, and configurability.
- Content Delivery Network: In this chapter, we’ll design a content delivery network (CDN) that’s used to store and serve popular content such as videos, images, audio, and webpages. It efficiently delivers content to end users while reducing latency and the burden on the data centers.
- Sequencer: In this building block, we’ll focus on the design of a unique ID generator with a major focus on maintaining causality. We’ll also explain three different methods for generating unique IDs.
- Service Monitoring: Monitoring systems are critical in distributed systems because they help analyze the system and alert the stakeholders if a problem occurs. Monitoring also provides early warnings so that system administrators can act before an impending problem becomes a major issue. Here, we’ll build two monitoring systems, one for server-side errors and the other for client-side errors.
- Distributed Caching: In this building block, we’ll design a distributed caching system where multiple cache servers coordinate to store frequently accessed data.
- Distributed Messaging Queue: In this building block, we’ll focus on the design of a queue consisting of multiple servers, which is used between interacting entities called producers and consumers. It helps decouple producers and consumers, results in independent scalability, and enhances reliability.
- Publish-Subscribe System: In this building block, we’ll focus on the design of an asynchronous service-to-service communication method called a pub-sub system. It is popular in serverless and microservices architectures and in data-processing systems.
- Rate Limiter: Here, we’ll design a system that throttles incoming requests for a service based on a predefined limit. It is generally used as a defensive layer that protects services from excessive usage, whether intended or unintended.
- Blob Store: This building block focuses on a storage solution for unstructured data—for example, multimedia files and binary executables.
- Distributed Search: A search system takes a query from a user and returns relevant content in a few seconds or less. This building block focuses on the three integral components: crawl, index, and search.
- Distributed Logging: Logging is an I/O-intensive operation, which makes it slow. Here, we’ll design a system that allows services in a distributed system to log their events efficiently. We’ll make the system scalable and reliable.
- Distributed Task Scheduling: We’ll design a distributed task scheduler system that mediates between tasks and resources. It intelligently allocates resources to tasks to meet task-level and system-level goals. It’s often used to offload background processing to be completed asynchronously.
- Sharded Counters: This building block demonstrates an efficient distributed counting system to deal with millions of concurrent read/write requests, such as likes on a celebrity’s tweet.
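To make the last item concrete, here is a minimal in-memory sketch of the idea behind a sharded counter: one logical counter is split across several shards so that concurrent writers rarely contend for the same lock, and a read sums all shards. The class name `ShardedCounter`, its methods, and the shard count are hypothetical choices for illustration, not the design developed later in the course, and a real system would place shards on separate servers rather than in one process.

```python
import random
import threading


class ShardedCounter:
    """Hypothetical sketch: one logical counter split across several shards."""

    def __init__(self, num_shards: int = 8) -> None:
        self._counts = [0] * num_shards
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def increment(self, amount: int = 1) -> None:
        # Pick a shard at random so concurrent writers spread across locks.
        shard = random.randrange(len(self._counts))
        with self._locks[shard]:
            self._counts[shard] += amount

    def value(self) -> int:
        # A read sums every shard; it is exact once all writers have finished.
        return sum(self._counts)


def like_many_times(counter: ShardedCounter, times: int) -> None:
    for _ in range(times):
        counter.increment()


likes = ShardedCounter(num_shards=4)
workers = [
    threading.Thread(target=like_many_times, args=(likes, 1000))
    for _ in range(10)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(likes.value())  # 10 threads x 1000 increments each
```

The trade-off in this sketch is that writes touch a single shard, while reads become more expensive because they must visit every shard.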
We have topologically ordered the building blocks so that any block that depends on others comes after them.
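As a rough illustration of what a topological order means here, the sketch below sorts a small dependency graph with Python’s standard `graphlib` module (Python 3.9+). The dependencies in `depends_on` are assumptions made up for this example; the lesson does not state which block depends on which.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependencies, for illustration only:
# each key maps to the set of blocks it builds on.
depends_on = {
    "Load Balancers": {"Domain Name System"},
    "Key-Value Store": {"Databases", "Load Balancers"},
    "Content Delivery Network": {"Domain Name System", "Key-Value Store"},
    "Distributed Caching": {"Key-Value Store"},
}

# static_order() yields every block only after all of its prerequisites.
order = list(TopologicalSorter(depends_on).static_order())
position = {block: index for index, block in enumerate(order)}

# Check the defining property: a prerequisite always appears first.
for block, prerequisites in depends_on.items():
    for prerequisite in prerequisites:
        assert position[prerequisite] < position[block]

print(order)
```

Several valid orders can exist for the same graph, so the printed list may differ from the order used in the course even if the dependencies were the same.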
Conventions
We’ll include a “Requirements” section whenever we design a building block (or a design problem). This section highlights the deliverables we expect from the finished design. “Requirements” will have two sub-categories:
- Functional requirements: These represent the features a user of the designed system will be able to use. For example, the system will allow a user to search for content using the search bar.
- Non-functional requirements (NFRs): These are the criteria by which a user of the system judges whether it is usable. NFRs may include requirements like high availability, low latency, scalability, and so on.
Let’s start with our building blocks.