
Put Back-of-the-envelope Numbers in Perspective

Back-of-the-envelope calculations (BOTECs) involve swift, approximate, and simplified estimations or computations typically done on paper or, figuratively, on the back of an envelope. While these calculations are not intended to yield precise results, they function as a quick and preliminary evaluation of crucial parameters and the feasibility of a system.

For example, let’s say we’re in a city and want to estimate the population of a particular neighborhood. We could count the number of houses in a sample area, estimate the average number of people per household, and then extrapolate to the whole neighborhood. Similar calculations can be used to check the validity of census data for some neighborhoods.
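
For instance, with made-up numbers (say, 50 houses counted in the sample area, an average of 4 people per household, and a neighborhood roughly 20 times the size of the sample area), the extrapolation is simply:

$$ 50 \times 4 \times 20 = 4{,}000 \text{ people} $$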

BOTECs in system design

A modern system is a complex web of computational resources connected via a network. Different kinds of nodes, such as load balancers, web servers, application servers, caches, in-memory databases, and storage nodes, collectively serve the clients. Such a system might be architected in different ways, including a monolithic architecture, a modular monolith architecture, or a microservices architecture. Considering all of this richness precisely at the design level (especially in an interview) isn’t advisable, and sometimes it’s impossible. BOTECs help us ignore the nitty-gritty details of the system (at least at the design level) and focus on more important aspects, such as assessing the feasibility of the service in terms of computational resources.

Some examples where we often need BOTECs are the following estimations:

  • The number of concurrent TCP connections a server can support
  • The number of requests per second (RPS) a web, database, or cache server can handle
  • The storage requirements of a service

Using BOTECs, we abstract away the messy details specific to different kinds of servers used in the actual system, the different access latencies of system components, different throughput rates, and the different types of requests. As we move forward, we’ll first look into these different server types, access latencies, throughput numbers, and request types to know the reality of the systems and see how complex they are. Then, abstracting away these details, we’ll learn to estimate the number of RPS a server can handle. Finally, we’ll practice bandwidth, servers, and storage estimation examples.

Types of data center servers

Data centers don’t have a single type of server. Enterprise solutions use commodity hardware to save costs and develop scalable solutions. Below, we discuss the types of servers that are commonly used within a data center to handle different workloads.

An approximation of the resources required on the web, application, and storage layer of the server, where the y-axis is a categorical axis with data points indicating levels of low, medium, and high resource requirements

Web servers

For scalability, web servers are decoupled from application servers. Web servers are the first point of contact after load balancers. Data centers have racks full of web servers that usually handle API calls from clients. Depending on the service that’s offered, the memory and storage resources in web servers can be small to medium. However, such servers require good processing resources. For example, Facebook has used a web server with 32 GB of RAM and 500 GB of storage space in the past.

Application servers

Application servers run the core application software and business logic. The difference between web servers and application servers is somewhat fuzzy. Application servers primarily provide dynamic content, whereas web servers mostly serve static content to the client. Application servers can require extensive computational and storage resources, and their storage can be volatile or nonvolatile. Facebook has used application servers with up to 256 GB of RAM and two types of storage (traditional rotating disks and flash) with a capacity of up to 6.5 TB.

Storage servers

With the explosive growth of Internet users, the amount of data stored by giant services has multiplied. Additionally, various types of data are now being stored in different storage units. For instance, YouTube uses the following data stores:

  1. Blob storage: This is used for its encoded videos.
  2. Temporary processing queue storage: This can hold a few hundred hours of video content uploaded daily to YouTube for processing.
  3. Bigtable: This is a specialized storage used for storing a large number of thumbnails of videos.
  4. Relational database management system (RDBMS): This is for users’ and videos’ metadata (comments, likes, user channels, and so on).

Other data stores are also used, for example, Hadoop’s HDFS for analytics. Storage servers mainly include structured (for example, SQL) and unstructured (NoSQL) data management systems.

Returning to the example of Facebook: they’ve used servers with a storage capacity of up to 120 TB. With the number of servers in use, Facebook is able to house exabytes of storage. (One exabyte is \(10^{18}\) bytes. By convention, we measure storage and network bandwidth in base 10, and not base 2.) However, the RAM of these servers is only 32 GB.
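
As a quick sanity check (the one-exabyte target below is hypothetical, used only to put the 120 TB figure in perspective), housing an exabyte on such servers would take on the order of:

$$ \frac{10^{18} \text{ bytes}}{120 \times 10^{12} \text{ bytes per server}} \approx 8{,}300 \text{ servers} $$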

Note: The servers described above aren’t the only types of servers in a data center. Organizations also require servers for services like configuration, monitoring, load balancing, analytics, accounting, caching, and so on.

We need a reference point to ground our calculations. In the table below, we depict the capabilities of a typical server that can be used in the data centers of today:

Typical Server Specifications:

Component           Specification
Processor           Intel Xeon (Sapphire Rapids 8488C)
Number of cores     64
RAM                 256 GB
Cache (L3)          112.5 MB
Storage capacity    16 TB

Standard numbers to remember

A lot of effort goes into the planning and implementation of a service. But without any basic knowledge of the kinds of workloads machines can handle, this planning isn’t possible. Latencies play an important role in deciding the amount of workload a machine can handle. The table below depicts some of the important numbers system designers should know in order to perform resource estimation.

Important Latencies

Operation                                          Time (nanoseconds)
L1 cache reference                                 0.9
L2 cache reference                                 2.8
L3 cache reference                                 12.9
Main memory reference                              100
Compress 1 KB with Snzip                           3,000 (3 microseconds)
Read 1 MB sequentially from memory                 9,000 (9 microseconds)
Read 1 MB sequentially from SSD                    200,000 (200 microseconds)
Round trip within same datacenter                  500,000 (500 microseconds)
Read 1 MB sequentially from SSD (~1 GB/sec SSD)    1,000,000 (1 millisecond)
Disk seek                                          4,000,000 (4 milliseconds)
Read 1 MB sequentially from disk                   2,000,000 (2 milliseconds)
Send packet SF->NYC                                71,000,000 (71 milliseconds)

Remember that the order-of-magnitude difference between components and operations matters more than the exact numbers. For example, we should know that IO-bound work (for example, reading 1 MB of data sequentially from an SSD) is two orders of magnitude slower than CPU-bound work (for example, compressing 1 KB of data with Snzip). You might be wondering why the data sizes differ in this comparison!

As long as the data to compress is readily available to the processor from the L1, L2, or L3 caches, the time to compress will be relatively consistent. Data up to the size of the server’s L3 cache (normally tens of MBs; 112.5 MB for the typical server specified above) fits entirely within the cache, so compressing data up to this limit takes almost the same time. This is because the processor can quickly access the data from the cache without incurring the additional latency of fetching it from slower levels of memory or storage.
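
Below is a minimal sketch (not part of the lesson) that encodes a few of the latencies from the table and prints how many orders of magnitude each one sits above the CPU-bound baseline:

```python
import math

# A few latencies from the table above, in nanoseconds.
LATENCY_NS = {
    "compress 1 KB with Snzip (CPU-bound)": 3_000,
    "read 1 MB from memory (memory-bound)": 9_000,
    "read 1 MB from SSD (IO-bound)": 200_000,
    "read 1 MB from disk (IO-bound)": 2_000_000,
}

baseline = LATENCY_NS["compress 1 KB with Snzip (CPU-bound)"]
for operation, ns in LATENCY_NS.items():
    magnitude = round(math.log10(ns / baseline))
    print(f"{operation}: {ns:,} ns, ~{magnitude} orders of magnitude above the baseline")
```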

Apart from the latencies listed above, throughput numbers, measured in queries per second (QPS), indicate how many queries a typical single-server datastore can handle.

Important Rates

Datastore            Approximate QPS
MySQL                1,000
Key-value store      10,000
Cache server         100,000–1 M

The numbers above are approximations and vary greatly depending on a number of factors, such as the type of query (point or range), the specification of the machine, the design of the database, the indexing, the load on the server, and so on.
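
As a rough sketch of how these throughput numbers feed into server-count estimates (the peak load, cache hit rate, and headroom factor below are made-up assumptions, not figures from the lesson):

```python
import math

# Approximate per-server throughput from the table above.
MYSQL_QPS = 1_000
CACHE_QPS = 100_000  # lower end of the 100,000–1 M range

# Hypothetical workload, purely for illustration.
peak_qps = 40_000      # expected peak read queries per second
cache_hit_rate = 0.9   # 90% of reads are served from the cache layer
headroom = 0.7         # plan to run each server at ~70% of its limit

cache_servers = math.ceil(peak_qps * cache_hit_rate / (CACHE_QPS * headroom))
db_servers = math.ceil(peak_qps * (1 - cache_hit_rate) / (MYSQL_QPS * headroom))

print(f"Cache servers needed: {cache_servers}")  # 1
print(f"MySQL servers needed: {db_servers}")     # 6
```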

Note: For real projects, initial designs use BOTECs similar to the ones we use in a system design interview. As the design goes through iterations for real products, we might use reference numbers from synthetic workloads that match our request patterns (for example, SPECint for CPU characterization and TPC-C for measuring database transactions per unit time). Initial prototypes are used to validate design-level assumptions. Later on, built-in monitoring of resources and demand is carefully analyzed to find any bottlenecks and for future capacity planning.

  1. With reference to the throughput numbers given above, what would your reply be if an interviewer said that they think that, for a MySQL database, the average count of queries per second handled is 2,000?

2,000 is in the same order of magnitude as 1,000. If, for some use case, we need more precision, we might quote a range instead: say, 250 QPS for complex queries and 1,750 QPS for simpler queries, with 1,000 as the average.
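
The quoted average is simply the midpoint of that assumed range:

$$ \frac{250 + 1{,}750}{2} = 1{,}000 \text{ QPS} $$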

Request types

We’ll see in the next section that, while estimating the number of requests a server can handle, we won’t get into the details of what kinds of requests we’re calculating for. But in reality, all requests are not the same. Workloads (clients’ requests) can be broadly classified into three categories: CPU-bound, memory-bound, and IO-bound.

CPU-bound requests: These primarily depend on the processor of a node. An example of a CPU-bound request is compressing 1 KB of data with Snzip. From the table above, we see that this operation takes 3 microseconds.

Memory-bound requests: These are primarily bottlenecked by the memory subsystem. An example is reading 1 MB of data sequentially from the RAM of a node. From the table above, we see that such an operation takes 9 microseconds (that’s three times slower than a CPU-bound operation!).

IO-bound requests: These are primarily bottlenecked by the IO subsystem (such as disks or the network). An example is reading 1 MB of data sequentially from an SSD. From the table above, we see that such an operation takes 200 microseconds (a whopping 66 times slower than CPU-bound operations!).

Similar to BOTECs, we can say that if a CPU-bound request takes X time units to complete some work on a node, then memory-bound workloads are an order of magnitude slower (10X), and IO-bound workloads are two orders of magnitude slower (100X) than the CPU-bound workload. We do such simplifications to make any further calculations easier.
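
A small sketch of how this simplification can be used in practice; the workload mix below is a made-up assumption, not a number from the lesson:

```python
# Relative cost of one request under the X / 10X / 100X simplification.
CPU_BOUND, MEMORY_BOUND, IO_BOUND = 1, 10, 100

# Hypothetical mix: 60% CPU-bound, 30% memory-bound, 10% IO-bound requests.
mix = {CPU_BOUND: 0.6, MEMORY_BOUND: 0.3, IO_BOUND: 0.1}

avg_cost = sum(cost * fraction for cost, fraction in mix.items())
print(f"Average request cost: {avg_cost:.1f}X the CPU-bound baseline")  # 13.6X
```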

Abstracting away the complexities of real systems

Above, we’ve seen the complexities involved in real systems. You might have realized that considering all such complexities at the design level, especially in a limited time frame such as an interview, is impractical.

BOTECs are a valuable tool for making quick, high-level estimates and decisions in the early stages of system design or when a rapid assessment is needed. So, moving forward, we’ll learn to perform back-of-the-envelope calculations.

A real service is complex, where requests flow through many microservices, as shown on the left side of the image (which is an abstraction of the right side)

Request estimation in system design

This section discusses the number of requests a typical server can handle in a second. A real request will touch many nodes in a data center for different kinds of processing before a reply can be sent back to the client, and we’ll accumulate all such work for our estimations.

The following equation calculates the CPU time to execute a program (request). For simplicity, we assume that each instruction can be executed in one clock cycle; therefore, CPI (clock cycles per instruction) is 1 in the following equation. Let’s assume the average clock rate of our servers’ processor is 3.5 GHz (3,500,000,000 cycles per second). It’s reasonable to assume that a request will consume a few million instructions for full processing. For simplicity, let’s assume that, on average, each request needs 3.5 million instructions.

$$ CPU_{\text{time per program}}=Instruction_{\text{per program}} \times CPI \times CPU_{\text{time per clock cycle}} $$

Let's see that the units match on both sides of the equation. On the right side, we have the following:

  • \(Instruction_{\text{per program}}\): This is a count of the instructions the program (request) consists of, so it has no unit.
  • CPI: This is a count of the clock cycles required to process one instruction, so it has no unit.
  • \(CPU_{\text{time per clock cycle}}\): This is the time the CPU takes to complete one clock cycle, measured in seconds.

From this, we can see that on the right side, we have the result in seconds, which is the time taken by the CPU per program (request).

Now, we’ll put the assumed values into the equation above. But before that, we’ll find the CPU time per clock cycle, given a CPU frequency of 3.5 GHz:

$$ \text{Clock cycles per second for a CPU with a 3.5 GHz clock rate}=3.5\times 10^9 $$

Putting all the values together, we get:

$$ CPU_{\text{time per program}}=(3.5\times10^6) \times 1 \times \frac{1}{3.5 \times 10^9} = 0.001 \text{ seconds} $$

$$ \text{Total requests a single core executes in 1 second}=\frac{1}{10^{-3}}=1{,}000 \text{ requests} $$

$$ \text{Total requests a 64-core server executes in 1 second}=64 \times 1{,}000 = 64{,}000 \text{ requests} $$

Note that by changing the assumptions (for example, the number of instructions in a request), we’ll get different final answers. In the absence of more precise measurements, our estimates are reasonable.
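
The same arithmetic in a short script, so the assumptions are easy to tweak (all of the input values below are the assumptions stated above, not measured numbers):

```python
# Back-of-the-envelope RPS estimate using the assumed values from the text.
CLOCK_RATE_HZ = 3.5e9             # 3.5 GHz processor
CPI = 1                           # assumed clock cycles per instruction
INSTRUCTIONS_PER_REQUEST = 3.5e6  # assumed instructions to fully process one request
CORES = 64                        # cores in the typical server described earlier

cpu_time_per_request = INSTRUCTIONS_PER_REQUEST * CPI / CLOCK_RATE_HZ  # seconds
rps_per_core = 1 / cpu_time_per_request
rps_per_server = rps_per_core * CORES

print(f"CPU time per request: {cpu_time_per_request * 1e3:.1f} ms")  # 1.0 ms
print(f"RPS per core:         {rps_per_core:,.0f}")                  # 1,000
print(f"RPS per server:       {rps_per_server:,.0f}")                # 64,000
```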

Note how we avoided the complexities related to CPU-, memory-, or IO-bound requests and system architecture. Doing so is the hallmark of BOTECs.

In the next lesson, we’ll use RPS numbers for server estimation with other resources, such as storage and network bandwidth.
