Maintainability
What is maintainability?
Building a system is only the first step. Afterward, one of the main tasks is keeping the system up and running: finding and fixing bugs, adding new functionality, keeping the system’s platform updated, and ensuring smooth operations. Maintainability is the salient feature of a good system design that captures these requirements. We can further divide the concept of maintainability into three underlying aspects:
- Operability: This is the ease with which we can keep the system running smoothly under normal circumstances and restore normal conditions after a fault.
- Lucidity: This refers to the simplicity of the code. The simpler the code base, the easier it is to understand and maintain it, and vice versa.
- Modifiability: This is the capability of the system to integrate modified, new, and unforeseen features without any hassle.
Measuring maintainability
Maintainability, $M$, is the probability that the service will restore its functions within a specified time of fault occurrence. $M$ measures how conveniently and swiftly the service regains its normal operating conditions.
For example, suppose a component has a defined maintainability value of 95% for half an hour. Then the probability of restoring the component to its fully active state within half an hour is 0.95.
Note: Maintainability gives us insight into the system’s capability to undergo repairs and modifications while it’s operational.
We use mean time to repair (MTTR) as the metric to measure $M$.
$$
\text{MTTR} = \frac{\text{Total maintenance time}}{\text{Total number of repairs}}
$$
In other words, MTTR is the average amount of time required to repair and restore a failed component. Our goal is to have as low a value of MTTR as possible.
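As a minimal sketch of the MTTR formula, the following Python snippet computes MTTR from a list of repair durations. The durations and function names are hypothetical and only for illustration. The second function additionally assumes that repair times are exponentially distributed, which the text does not state; under that assumption, the probability of restoring service within time $t$ is $1 - e^{-t/\text{MTTR}}$.

```python
import math


def mean_time_to_repair(repair_minutes):
    """MTTR = total maintenance time / total number of repairs."""
    if not repair_minutes:
        raise ValueError("at least one repair is required")
    return sum(repair_minutes) / len(repair_minutes)


def maintainability(t_minutes, mttr_minutes):
    """Probability that a repair completes within t_minutes.

    Assumption (not from the text): repair times follow an
    exponential distribution whose mean is mttr_minutes.
    """
    return 1.0 - math.exp(-t_minutes / mttr_minutes)


# Hypothetical repair durations, in minutes, for one component.
repairs = [12, 8, 15, 5]
mttr = mean_time_to_repair(repairs)  # (12 + 8 + 15 + 5) / 4 = 10.0
print(f"MTTR = {mttr} minutes")
print(f"M(30 min) = {maintainability(30, mttr):.3f}")
```

With these made-up numbers, an MTTR of 10 minutes corresponds to a maintainability of roughly 95% for half an hour under the exponential assumption. A different repair-time distribution would give a different value for the same MTTR.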
Maintainability and reliability
Maintainability can be defined more clearly in close relation to reliability. The only difference between them is the variable of interest. Maintainability refers to time-to-repair, whereas reliability refers to both time-to-repair and time-to-failure. Combining maintainability and reliability analysis can help us gain insights into availability, downtime, and uptime.
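One common way to combine the two quantities is the steady-state availability formula, availability = MTTF / (MTTF + MTTR), where MTTF is the mean time to failure. The text does not spell this formula out, so treat the sketch below as an illustration under that assumption; the function name and the sample values are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(mttf_hours, mttr_hours):
    """Steady-state availability = MTTF / (MTTF + MTTR).

    Assumption: the component alternates between running
    (mean duration MTTF) and being repaired (mean duration MTTR).
    """
    if mttf_hours < 0 or mttr_hours < 0 or mttf_hours + mttr_hours == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical component: fails about every 999 hours, takes 1 hour to repair.
a = availability(mttf_hours=999.0, mttr_hours=1.0)
downtime_hours = (1.0 - a) * HOURS_PER_YEAR
uptime_hours = a * HOURS_PER_YEAR

print(f"availability = {a:.3f}")
print(f"expected downtime per year = {downtime_hours:.2f} hours")
print(f"expected uptime per year = {uptime_hours:.2f} hours")
```

The sketch makes the trade-off visible: for a fixed time-to-failure, lowering MTTR raises availability and reduces expected downtime.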