Skip to content

What is a Database

A database is an effective technology for data management, comprising an organized collection of data stored in structured data tables. These tables are interconnected, reflecting the essential relationships between objective entities. Databases help organizations or enterprises manage various information resources scientifically. In essence, our data can be managed by a database, which provides efficient access performance.

数据库是数据管理的有效技术,是由一批数据构成的有序集合,这些数据被存放在结构化的数据表里。数据表之间相互关联,反映客观事物间的本质联系。数据库能有效地帮助一个组织或企业科学地管理各类信息资源。简而言之,我们的数据可以交给数据库来帮助我们进行管理,同时数据库能够为我们提供高效的访问性能。

During the JavaSE learning phase, we learned how to use file I/O to save data locally, allowing persistent data storage. This means data can be reloaded even after the program is reopened. However, this method becomes inconvenient when dealing with a large amount of data. Moreover, if we need to find a specific piece of data among many, we must load it into memory and search, which is cumbersome.

在JavaSE学习阶段中,我们学习了如何使用文件I/O来将数据保存到本地,这样就可以将一个数据持久地存储在本地,即使程序重新打开,我们也能加载回上一次的数据,但是当我们的数据变得非常多的时候,这样的方式就显得不太方便了。同时我们如果需要查找众多数据的中的某一个,就只能加载到内存再进行查找,这样显然是很难受的!

Databases are designed for this purpose. They allow us to quickly find desired data, easily insert, modify, and delete data. Besides these functions, databases offer more features for managing and operating data efficiently!

而数据库就是专门做这事的,我们可以快速查找想要的数据,便捷地插入、修改和删除数据,并且数据库不仅能做这些事,还能提供更多便于管理数据和操作数据的功能!

Common Databases

There are many types of commonly used databases, including but not limited to:

常见的数据库有很多种,包括但不限于:

  • MySQL - It's free, the most widely used, and an open-source database, suitable for small to medium-sized applications.
  • Microsoft SQL Server - This is a commercial database that requires payment but offers technical support. It is suitable for Windows Server environments.
  • Oracle - Also a commercial database, Oracle is used for large-scale database systems.
  • MySQL - 免费,用的最多的,开源数据库,适用于中小型
  • Microsoft SQL Server - 收钱的,但是提供技术支持,适用于Windows Server
  • Oracle - 收钱的,大型数据库系统

We will focus on learning MySQL. However, it's worth noting that most SQL statements are universally applicable across different databases, with only minor syntax differences. Therefore, learning one database system equips you with the skills to work with others.

而我们要学习的是MySQL数据,其实无论学习哪种数据库,SQL语句大部分都是通用的,只有少许语法是不通用的,因此我们只需要学习一种数据库其他的也就差不多都会了。

Install MySql

https://flaviocopes.com/mysql-how-to-install/

brew install mysql

Screenshot 2023-06-11 at 20.18.51

brew services start mysql

Screenshot 2023-06-11 at 20.33.03

root

Eve123456

Screenshot 2024-01-07 at 23.36.57

Screenshot 2024-01-07 at 23.37.42

Screenshot 2024-01-07 at 23.39.23

Screenshot 2024-01-07 at 23.39.40

In IDEA

Screenshot 2024-01-07 at 23.58.39

Screenshot 2024-01-07 at 23.59.15

Screenshot 2024-01-07 at 23.59.44

Data Models

Data models, like models in the real world, are abstractions of the characteristics of data in the real world. In fact, the classes we have previously learned about in programming are an abstraction of real-world data. For example, the characteristics of a student include name, age, grade, student ID, major, etc. These characteristics are also known as attributes of an entity. Attributes have the following features:

数据模型与现实世界中的模型一样,是对现实世界数据特征的一种抽象。实际上,我们之前学习的类就是对现实世界数据的一种抽象,比如一个学生的特征包括姓名,年龄,年级,学号,专业等,这些特征也称为实体的一种属性,属性具有以下特点:

  • Attributes are indivisible.
  • An entity can have multiple attributes.
  • The attribute used to uniquely distinguish different entities is called a Key, such as the unique student ID for each student.
  • The value of an attribute can have certain constraints, such as gender being limited to male or female.
  • 属性不可再分
  • 一个实体的属性可以有很多个
  • 用于唯一区分不同实体的的属性,称为Key,比如每个同学的学号都是不一样的
  • 属性取值可以有一定的约束,比如性别只能是男或是女

Entities or attributes can have certain relationships between them. For instance, a teacher can teach many students, and the students, in relation to the teacher, are in a taught relationship. Similarly, each student has a unique student ID, establishing a relationship between the student number and the student. A relationship where one teacher teaches many students is a one-to-many relationship (1:n), while the unique correspondence of a student number is a one-to-one relationship (1:1). Furthermore, each teacher can teach many students, and each student can have multiple teachers, which is a many-to-many relationship (n:m).

实体或是属性之间可以具有一定的联系,比如一个老师可以教很多个学生,而学生相对于老师就是被教授的关系;又比如每个同学都有一个学号与其唯一对应,因此学号和学生之间也有一种联系。而像一个老师教多个学生的联系就是一种一对多的联系(1:n),而学号唯一对应,就是一种一对一的联系(1:1);每一个老师不仅可以教多个学生,每一个学生也可以有多个教师,这就是一种多对多的联系(n:m)

MySQL is a type of relational database. By using a relational database, we can effectively store data that has certain relationships.

MySQL就是一种关系型数据库,通过使用关系型数据库,我们就可以很好地存储这样带有一定联系的数据。

p

Constructing an Entity-Relationship (ER) diagram is an excellent way to clearly understand the relationships and characteristics of different data models. An ER diagram is a visual representation of the data model, detailing how entities (like objects or concepts relevant to a database) relate to each other within a system.

通过构建一个ER图,我们就能很好地理清不同数据模型之间的关系和特点。

Database Creation

Now that we understand attributes and relationships, let's try to create a database and add tables to store data. Each table represents the data of a specific entity. First, we need to clarify what kind of tables we need to create:

Screenshot 2024-01-08 at 00.33.25

Screenshot 2024-01-08 at 00.35.12

既然了解了属性和联系,那么我们就来尝试创建一个数据库,并在数据库中添加用于存放数据的表,每一张表都代表一种实体的数据。首先我们要明确,我们需要创建什么样子的表:

  • Student Table: To store all student data. Student (Student ID (underline), Name, Gender)
  • Teacher Table: To store all teacher data. Teacher (Teacher ID (underline), Name)
  • Course Table: To store the teaching information between teachers and students. Course (Student ID, Teacher ID)
  • 学生表:用于存放所有学生的数据,学生(学号,姓名,性别)
  • 教师表:用于存放所有教师的数据,教师(教师号,姓名)
  • 授课表:用于存放教师与学生的授课信息,授课(学号,教师号)

Screenshot 2024-01-08 at 00.41.20

Screenshot 2024-01-08 at 00.43.14

Screenshot 2024-01-08 at 00.47.06

Screenshot 2024-01-08 at 00.51.28

Screenshot 2024-01-08 at 00.52.33

The underlined attributes are designated as Keys, serving as unique identifiers to distinguish each entity's data.

其中,标注下划线的属性,作为Key,用于区别于其他实体数据的唯一标记。

To make it easier to understand, let's start with graphical interface operations before moving to SQL statements. Don't worry, we'll cover everything step by step. We can use database clients like Navicat or the built-in database client in IDEA to create a database and the three tables mentioned above.

为了理解起来更加轻松,我们从图形界面操作再讲到SQL语句,请不要着急。我们现在通过Navicat或idea自带的数据库客户端来创建一个数据库和上述三个表。

Screenshot 2024-01-08 at 01.05.44

Screenshot 2024-01-08 at 01.07.43

Screenshot 2024-01-08 at 10.18.24

Screenshot 2024-01-08 at 10.19.01

Database Normalization

When designing a table to store an entity's data, it's essential to understand database normalization. The goal is to minimize the existence of "bad" relations and to design an optimal relational model. In short, we need to learn how each table should be designed.

Database normalization involves organizing a database in a way that reduces redundancy and dependency. It's a process of structuring a relational database to minimize duplication and to ensure data integrity. The normalization process involves dividing a database into two or more tables and defining relationships between the tables. This approach aims to:

  • Ensure that data is stored logically.
  • Reduce data redundancy (the same data stored in multiple places).
  • Protect data integrity (ensuring data is accurate and consistent).

By following these principles, we can create a database that is efficient, easy to manage, and scalable. In the following sections, we'll delve into the practical steps of creating these tables and applying normalization principles.

要去设计存放一个实体的表,我们就需要了解数据库的关系规范化,尽可能减少“不好”的关系存在,如何设计一个优良的关系模型是最关键的内容!简而言之,我们要学习一下每一个表该如何去设计。

First Normal Form (1NF)

The First Normal Form dictates that every column in a database must be indivisible basic data items. However, the following example demonstrates a divisible case:

第一范式是指数据库的每一列都是不可分割的基本数据项,而下面这样的就存在可分割的情况:

  • Student (Name, Phone Number)

The phone number actually includes Landline Phone and Mobile Phone, so it can be split into:

电话号码实际上包括了家用座机电话移动电话,因此它可以被拆分为:

  • Student (Name, Landline Number, Mobile Number)

Meeting the First Normal Form is the most basic requirement for relational databases!

满足第一范式是关系型数据库最基本的要求!

Second Normal Form (2NF)

The Second Normal Form requires that there must be a primary key in the table, and all other attributes must fully depend on the primary key. For example:

第二范式要求表中必须存在主键,且其他的属性必须完全依赖于主键,比如:

  • Student (Student ID, Name, Gender)

The student ID is a unique identifier for each student, and each student has a different student ID. Therefore, this table has a primary key, and all attributes of each student depend on the student ID. Changes in the student ID imply changes in the student, hence both name and gender would change accordingly. Therefore, this table meets the Second Normal Form.

学号是每个学生的唯一标识,每个学生都有着不同的学号,因此此表中存在一个主键,并且每个学生的所有属性都依赖于学号,学号发生改变就代表学生发生改变,姓名和性别都会因此发生改变,所有此表满足第二范式。

Third Normal Form (3NF)

While meeting the Second Normal Form, if all attributes are not transitively dependent on the primary key, the table satisfies the Third Normal Form.

在满足第二范式的情况下,所有的属性都不传递依赖于主键,满足第三范式。

  • Student Book Borrowing Situation (Borrowing ID, Student ID, Book ID, Book Name, Book Author)

Actually, the book ID depends on the borrowing ID, and the book name and author depend on the book ID, creating a transitive dependency. We can separate the book information into another table:

实际上书籍编号依赖于借阅编号,而书籍名称和书籍作者依赖于书籍编号,因此存在传递依赖的情况,我们可以将书籍信息进行单独拆分为另一张表:

  • Student Book Borrowing Situation (Borrowing ID, Student ID, Book ID)
  • Book (Book ID, Book Name, Book Author)

This separation eliminates the transitive dependency, thereby meeting the Third Normal Form.

这样就消除了传递依赖,从而满足第三范式。

Boyce-Codd Normal Form (BCNF)

BCNF, as a refinement of the Third Normal Form, assumes a warehouse management relationship table, StorehouseManage (Warehouse ID, Stored Item ID, Manager ID, Quantity). In this scenario, one manager works at only one warehouse; a warehouse can store multiple items. This database table has the following determining relationships:

BCNF作为第三范式的补充,假设仓库管理关系表为StorehouseManage(仓库ID, 存储物品ID, 管理员ID, 数量),且有一个管理员只在一个仓库工作;一个仓库可以存储多种物品。这个数据库表中存在如下决定关系:

(Warehouse ID, Stored Item ID) → (Manager ID, Quantity)

(Manager ID, Stored Item ID) → (Warehouse ID, Quantity)

Therefore, (Warehouse ID, Stored Item ID) and (Manager ID, Stored Item ID) are both candidate keys for StorehouseManage, with the only non-key attribute being Quantity, which complies with the Third Normal Form. However, due to the following determining relationships:

所以,(仓库ID, 存储物品ID)和(管理员ID, 存储物品ID)都是StorehouseManage的候选关键字,表中的唯一非关键字段为数量,它是符合第三范式的。但是,由于存在如下决定关系:

(Warehouse ID) → (Manager ID)

(Manager ID) → (Warehouse ID)

There is a situation where a key attribute determines another key attribute. If the Manager ID is modified, then each corresponding entry must be individually updated, indicating non-compliance with the BCNF.

即存在关键字段决定关键字段的情况,如果修改管理员ID,那么就必须逐一进行修改,所以其不符合BCNF范式。