Lab 3 page tables
In this lab you will explore page tables and modify them to speed up certain system calls and to detect which pages have been accessed.
Before you start coding, read Chapter 3 of the xv6 book, and related files:
kernel/memlayout.h
, which captures the layout of memory.kernel/vm.c
, which contains most virtual memory (VM) code.kernel/kalloc.c
, which contains code for allocating and freeing physical memory.
It may also help to consult the RISC-V privileged architecture manual.
To start the lab, switch to the pgtbl branch:
Speed up system calls (easy)
Some operating systems (e.g., Linux) speed up certain system calls by sharing data in a read-only region between userspace and the kernel. This eliminates the need for kernel crossings when performing these system calls. To help you learn how to insert mappings into a page table, your first task is to implement this optimization for the getpid()
system call in xv6.
When each process is created, map one read-only page at USYSCALL (a virtual address defined in memlayout.h
). At the start of this page, store a struct usyscall
(also defined in memlayout.h
), and initialize it to store the PID of the current process. For this lab, ugetpid()
has been provided on the userspace side and will automatically use the USYSCALL mapping. You will receive full credit for this part of the lab if the ugetpid
test case passes when running pgtbltest
.
Some hints:
- Choose permission bits that allow userspace to only read the page.
- There are a few things that need to be done over the lifecycle of a new page. For inspiration, understand the trapframe handling in
kernel/proc.c
.
Which other xv6 system call(s) could be made faster using this shared page? Explain how.
Solution:
在进程结构体里添加一个 struct usyscall*
类型的变量,用来保存页表地址
// kernel/proc.h
struct proc {
struct spinlock lock;
···
// these are private to the process, so p->lock need not be held.
···
char name[16]; // Process name (debugging)
struct usyscall *usyscall;
// 首先在进程结构体中添加一个struct usyscall*变量,修改kernel/proc.h 目的是方便后续物理内存的分配与回收
};
给usyscall
分配页表;
需要在该函数下的proc_pagetable()
中执行映射,完成映射关系;
在释放进程的过程中要将分配给usyscall
的page释放掉,将页表插入空闲页链表中.
解除映射关系
// kernel/proc.c
...
// Look in the process table for an UNUSED proc.
// If found, initialize state required to run in the kernel,
// and return with p->lock held.
// If there are no free procs, or a memory allocation fails, return 0.
static struct proc*
allocproc(void)
{
...
// 在kernel/proc.c的allocproc函数中,模仿trapframe page的分配方式。
// 同时将进程pid写入该内存中
if((p->usyscall = (struct usyscall *)kalloc()) == 0){
freeproc(p);
release(&p->lock);
return 0;
}
p->usyscall->pid = p->pid;
...
// An empty user page table.
p->pagetable = proc_pagetable(p);
}
// free a proc structure and the data hanging from it,
// including user pages.
// p->lock must be held.
static void
freeproc(struct proc *p)
{
if(p->trapframe)
kfree((void*)p->trapframe);
p->trapframe = 0;
if(p->pagetable)
proc_freepagetable(p->pagetable, p->sz);
// 回收该物理页,在kernel/proc.c的freeproc函数中,模仿trapframe page的回收方式
if(p->usyscall){
kfree((void*)p->usyscall);
}
p->usyscall = 0;
p->pagetable = 0;
p->sz = 0;
p->pid = 0;
p->parent = 0;
p->name[0] = 0;
p->chan = 0;
p->killed = 0;
p->xstate = 0;
p->state = UNUSED;
}
// Create a user page table for a given process, with no user memory,
// but with trampoline and trapframe pages.
pagetable_t
proc_pagetable(struct proc *p)
{
...
// 将虚拟地址USYSCALL通过页表映射到该物理页,在kernel/proc.c的proc_pagetable函数中,
// 模仿trapframe page的映射方式。 需要注意的是这里不能只设置PTE_R位,因为用户需要访问该物理页,所以还需要设置PTE_U位
if(mappages(pagetable, USYSCALL, PGSIZE,
(uint64)(p->usyscall), PTE_R | PTE_U) < 0){
uvmunmap(pagetable, USYSCALL, 1, 0);
uvmunmap(pagetable, TRAMPOLINE, 1, 0);
uvmfree(pagetable, 0);
return 0;
}
return pagetable;
}
// Free a process's page table, and free the
// physical memory it refers to.
void
proc_freepagetable(pagetable_t pagetable, uint64 sz)
{
// 取消掉USYSCALL的映射,在在kernel/proc.c的proc_freepagetable函数中
uvmunmap(pagetable, USYSCALL, 1, 0);
uvmunmap(pagetable, TRAMPOLINE, 1, 0);
uvmunmap(pagetable, TRAPFRAME, 1, 0);
uvmfree(pagetable, sz);
}
Notice that the error message can be found in the freewalk
function of kernel/vm.c
. In particular, the kernel panics due to the presence of PTE_V = 1
, which means the page is still valid!
// kernel/vm.c
// Recursively free page-table pages.
// All leaf mappings must already have been removed.
void
freewalk(pagetable_t pagetable)
{
// there are 2^9 = 512 PTEs in a page table.
for(int i = 0; i < 512; i++){
pte_t pte = pagetable[i];
if((pte & PTE_V) && (pte & (PTE_R|PTE_W|PTE_X)) == 0){
// this PTE points to a lower-level page table.
uint64 child = PTE2PA(pte);
freewalk((pagetable_t)child);
pagetable[i] = 0;
} else if(pte & PTE_V){
panic("freewalk: leaf");
}
}
kfree((void*)pagetable);
}
One window
Another window:
Apparently the problem arises from proc_freepagetable
in kernel/proc.c
. Let’s see what this function actually does. The problem is that the mapping we added earlier are not being freed. A simple tweak will do the trick:
// kernel/proc.c
...
// Free a process's page table, and free the
// physical memory it refers to.
void
proc_freepagetable(pagetable_t pagetable, uint64 sz)
{
uvmunmap(pagetable, TRAMPOLINE, 1, 0);
uvmunmap(pagetable, TRAPFRAME, 1, 0);
uvmunmap(pagetable, USYSCALL, 1, 0);
// 取消掉USYSCALL的映射,在在kernel/proc.c的proc_freepagetable函数中
uvmfree(pagetable, sz);
}
...
Which other xv6 system call(s) could be made faster using this shared page? Explain how.
Solution:
Any system call that directly or indirectly invokes the copyout
fuction will be accelerated, as it saves time on copying data. Additionally, system calls used purely for information retrieval, such as getpid
in this section, will also be faster. This is because the operation of trapping into the operating system is no longer necessary, and the corresponding data can be read in usermode instead.
Print a page table (easy)
To help you visualize RISC-V page tables, and perhaps to aid future debugging, your second task is to write a function that prints the contents of a page table.
Define a function called vmprint()
. It should take a pagetable_t
argument, and print that pagetable in the format described below. Insert if(p->pid==1) vmprint(p->pagetable)
in exec.c
just before the return argc
, to print the first process's page table. You receive full credit for this part of the lab if you pass the pte printout
test of make grade
.
Now when you start xv6 it should print output like this, describing the page table of the first process at the point when it has just finished exec()
ing init
:
page table 0x0000000087f6b000
..0: pte 0x0000000021fd9c01 pa 0x0000000087f67000
.. ..0: pte 0x0000000021fd9801 pa 0x0000000087f66000
.. .. ..0: pte 0x0000000021fda01b pa 0x0000000087f68000
.. .. ..1: pte 0x0000000021fd9417 pa 0x0000000087f65000
.. .. ..2: pte 0x0000000021fd9007 pa 0x0000000087f64000
.. .. ..3: pte 0x0000000021fd8c17 pa 0x0000000087f63000
..255: pte 0x0000000021fda801 pa 0x0000000087f6a000
.. ..511: pte 0x0000000021fda401 pa 0x0000000087f69000
.. .. ..509: pte 0x0000000021fdcc13 pa 0x0000000087f73000
.. .. ..510: pte 0x0000000021fdd007 pa 0x0000000087f74000
.. .. ..511: pte 0x0000000020001c0b pa 0x0000000080007000
init: starting sh
The first line displays the argument to vmprint
. After that there is a line for each PTE, including PTEs that refer to page-table pages deeper in the tree. Each PTE line is indented by a number of " .."
that indicates its depth in the tree. Each PTE line shows the PTE index in its page-table page, the pte bits, and the physical address extracted from the PTE. Don't print PTEs that are not valid. In the above example, the top-level page-table page has mappings for entries 0 and 255. The next level down for entry 0 has only index 0 mapped, and the bottom-level for that index 0 has entries 0, 1, and 2 mapped.
Your code might emit different physical addresses than those shown above. The number of entries and the virtual addresses should be the same.
Some hints:
- You can put
vmprint()
inkernel/vm.c
. - Use the macros at the end of the file kernel/riscv.h.
- The function
freewalk
may be inspirational. - Define the prototype for
vmprint
in kernel/defs.h so that you can call it from exec.c. - Use
%p
in your printf calls to print out full 64-bit hex PTEs and addresses as shown in the example.
For every leaf page in the vmprint
output, explain what it logically contains and what its permission bits are. Figure 3.4 in the xv6 book might be helpful, although note that the figure might have a slightly different set of pages than the init
process that's being inspected here.
// kernel/exec.c
...
int
exec(char *path, char **argv)
{
...
proc_freepagetable(oldpagetable, oldsz);
// Insert `if(p->pid==1) vmprint(p->pagetable)` in `exec.c` just before the `return argc`,
// to print the first process's page table.
if(p->pid==1){
vmprint(p->pagetable);
}
return argc; // this ends up in a0, the first argument to main(argc, argv)
bad:
...
}
...
Recall that the freewalk
function that we have seen in the previous section utilizes a recursive approach to free all page tables, which means some minor modifications for that function would suffice for our purposes:
// Recursively free page-table pages.
// All leaf mappings must already have been removed.
void
freewalk(pagetable_t pagetable)
{
// there are 2^9 = 512 PTEs in a page table.
for(int i = 0; i < 512; i++){
pte_t pte = pagetable[i];
if((pte & PTE_V) && (pte & (PTE_R|PTE_W|PTE_X)) == 0){
// this PTE points to a lower-level page table.
uint64 child = PTE2PA(pte);
freewalk((pagetable_t)child);
pagetable[i] = 0;
} else if(pte & PTE_V){
panic("freewalk: leaf");
}
}
kfree((void*)pagetable);
}
void vmprintRecursive(pagetable_t pagetable, int level)
{
if (level > 3)
{
return;
}
for (int i = 0; i < 512; i++)
{
pte_t pte = pagetable[i];
if (pte & PTE_V)
{
for (int k = 0; k < level; k++)
{
printf(" ..");
}
uint64 pa = PTE2PA(pte);
printf("%d: pte %p pa %p\n", i, pte, pa);
vmprintRecursive((pagetable_t)pa, level+1);
}
}
}
void vmprint(pagetable_t pagetable)
{
printf("page table %p\n", pagetable);
vmprintRecursive(pagetable, 1);
}
Remember to define the prototype for vmprint
in kernel/defs.h
:
// kernel defs.h
...
// vm.c
...
int copyinstr(pagetable_t, char *, uint64, uint64);
void vmprint(pagetable_t pagetable);
...
Detect which pages have been accessed (hard)
Some garbage collectors (a form of automatic memory management) can benefit from information about which pages have been accessed (read or write). In this part of the lab, you will add a new feature to xv6 that detects and reports this information to userspace by inspecting the access bits in the RISC-V page table. The RISC-V hardware page walker marks these bits in the PTE whenever it resolves a TLB miss.
Your job is to implement pgaccess()
, a system call that reports which pages have been accessed. The system call takes three arguments. First, it takes the starting virtual address of the first user page to check. Second, it takes the number of pages to check. Finally, it takes a user address to a buffer to store the results into a bitmask (a datastructure that uses one bit per page and where the first page corresponds to the least significant bit). You will receive full credit for this part of the lab if the pgaccess
test case passes when running pgtbltest
.
Some hints:
- Read
pgaccess_test()
inuser/pgtbltest.c
to see howpgaccess
is used.
// user/pgtbltest.c
void
pgaccess_test()
{
char *buf;
unsigned int abits;
printf("pgaccess_test starting\n");
testname = "pgaccess_test";
buf = malloc(32 * PGSIZE);
if (pgaccess(buf, 32, &abits) < 0)
err("pgaccess failed");
buf[PGSIZE * 1] += 1;
buf[PGSIZE * 2] += 1;
buf[PGSIZE * 30] += 1;
if (pgaccess(buf, 32, &abits) < 0)
err("pgaccess failed");
if (abits != ((1 << 1) | (1 << 2) | (1 << 30)))
err("incorrect access bits set");
free(buf);
printf("pgaccess_test: OK\n");
}
-
Start by implementing
sys_pgaccess()
inkernel/sysproc.c
. -
You'll need to parse arguments using
argaddr()
andargint()
. -
For the output bitmask, it's easier to store a temporary buffer in the kernel and copy it to the user (via
copyout()
) after filling it with the right bits. -
It's okay to set an upper limit on the number of pages that can be scanned.
-
walk()
inkernel/vm.c
is very useful for finding the right PTEs. -
You'll need to define
PTE_A
, the access bit, inkernel/riscv.h
. Consult the RISC-V privileged architecture manual to determine its value.
-
Be sure to clear
PTE_A
after checking if it is set. Otherwise, it won't be possible to determine if the page was accessed since the last timepgaccess()
was called (i.e., the bit will be set forever). -
vmprint()
may come in handy to debug page tables.
//kernel/sysproc.c
...
#ifdef LAB_PGTBL
int
sys_pgaccess(void)
{
// lab pgtbl: your code here.
uint64 startaddr;
int npage;
uint64 useraddr;
argaddr(0, &startaddr);
argint(1, &npage);
argaddr(2, &useraddr);
uint64 bitmask = 0;
uint64 complement = ~PTE_A;
struct proc *p = myproc();
for (int i = 0; i < npage; ++i) {
pte_t *pte = walk(p->pagetable, startaddr+i*PGSIZE, 0);
if (*pte & PTE_A) {
bitmask |= (1 << i);
*pte &= complement;
}
}
copyout(p->pagetable, useraddr, (char *)&bitmask, sizeof(bitmask));
return 0;
}
#endif
...
Submit the lab
Time spent
Create a new file, time.txt
, and put in a single integer, the number of hours you spent on the lab. git add and git commit the file.
Answers
If this lab had questions, write up your answers in answers-*.txt
. git add and git commit these files.
Submit
Assignment submissions are handled by Gradescope. You will need an MIT gradescope account. See Piazza for the entry code to join the class. Use this link if you need more help joining.
When you're ready to submit, run make zipball, which will generate lab.zip
. Upload this zip file to the corresponding Gradescope assignment.
If you run make zipball and you have either uncomitted changes or untracked files, you will see output similar to the following:
Inspect the above lines and make sure all files that your lab solution needs are tracked, i.e., not listed in a line that begins with ??
. You can cause git
to track a new file that you create using git add {filename}.
- Please run make grade to ensure that your code passes all of the tests. The Gradescope autograder will use the same grading program to assign your submission a grade.
- Commit any modified source code before running make zipball.
- You can inspect the status of your submission and download the submitted code at Gradescope. The Gradescope lab grade is your final lab grade.
Optional challenge exercises
- Use super-pages to reduce the number of PTEs in page tables.
- Unmap the first page of a user process so that dereferencing a null pointer will result in a fault. You will have to change
user.ld
to start the user text segment at, for example, 4096, instead of 0. - Add a system call that reports dirty pages (modified pages) using
PTE_D
.
https://blog.csdn.net/weixin_46322986/article/details/131397933