Mit6.828 Lab 2: Memory Management的做题思路，以及答案。

参考资料

准备工作

切换到lab2的分支，然后将lab1的代码合入到lab2中

$git checkout -b lab2 origin/lab2
$git merge lab1

Part 1: Physical Page Management

Exercise1

第一部分只有一个Exercise，即写一个physical page allocator。 It keeps track of which pages are free with a linked list of struct PageInfo objects, each corresponding to a physical page.这句话告诉我们这个allocator的基本形式是一个链表，链表的节点是struct PageInfo objects。

boot_alloc

在内核刚初始化的时候，我们只能通过移动指针来分配内存。boot_alloc将指针从end开始，增加到end + n，表示分配了n个字节的内存。

static void *
boot_alloc(uint32_t n)
{
	static char *nextfree;	// virtual address of next byte of free memory
	char *result;

	// Initialize nextfree if this is the first time.
	// 'end' is a magic symbol automatically generated by the linker,
	// which points to the end of the kernel's bss segment:
	// the first virtual address that the linker did *not* assign
	// to any kernel code or global variables.
	if (!nextfree) {
		extern char end[];
		nextfree = ROUNDUP((char *) end, PGSIZE);
	}

	// Allocate a chunk large enough to hold 'n' bytes, then update
	// nextfree.  Make sure nextfree is kept aligned
	// to a multiple of PGSIZE.
	//
	// LAB 2: Your code here.
+++	result = nextfree;
+++ nextfree = ROUNDUP((char *) nextfree + n, PGSIZE); 

	return result;
}

mem_init

pages全局变量保存着所有页表的信息。在我的环境中，Physical memory: 131072K available, base = 640K, extended = 130432K，因此总共有32768个页表。

//////////////////////////////////////////////////////////////////////
"// Allocate an array of npages 'struct PageInfo's and store it in 'pages'."
"// The kernel uses this array to keep track of physical pages: for"
"// each physical page, there is a corresponding struct PageInfo in this"
"// array.  'npages' is the number of physical pages in memory.  Use memset"
"// to initialize all fields of each struct PageInfo to 0."
"// Your code goes here:"
+++ pages = (struct PageInfo *)boot_alloc(npages * sizeof(struct PageInfo));
+++ memset(pages, 0, sizeof(npages * sizeof(struct PageInfo)));

page_init

由于没有仔细理解struct PageInfo中pp_ref的意义，因此下面的代码都忽略了对pp_ref的赋值。(第二次提交之后，修改了这部分代码，因此对每一个被使用的页表，pp_ref都赋值为1)。pp_ref is the count of pointers to this page, for pages allocated using page_alloc.，即在分配的时候，需要将pp_ref置1，表示page_alloc时，有一个指针指向该页表。

void
page_init(void)
{
	// The example code here marks all physical pages as free.
	// However this is not truly the case.  What memory is free?
	//  1) Mark physical page 0 as in use.
	//     This way we preserve the real-mode IDT and BIOS structures
	//     in case we ever need them.  (Currently we don't, but...)
	//  2) The rest of base memory, [PGSIZE, npages_basemem * PGSIZE)
	//     is free.
	//  3) Then comes the IO hole [IOPHYSMEM, EXTPHYSMEM), which must
	//     never be allocated.
	//  4) Then extended memory [EXTPHYSMEM, ...).
	//     Some of it is in use, some is free. Where is the kernel
	//     in physical memory?  Which pages are already in use for
	//     page tables and other data structures?
	//
	// Change the code to reflect this.
	// NB: DO NOT actually touch the physical memory corresponding to
	// free pages!
	size_t i, first_free_page;
	size_t first_free_byte;

    /*
	for (i = 0; i < npages; i++) {
		pages[i].pp_ref = 0;
		pages[i].pp_link = page_free_list;
		page_free_list = &pages[i];
	}*/

    // Mark page 0 as in use
    pages[0].pp_ref = 1;

    // Mark base memory as free
	for (i = 1; i < npages_basemem; i++)
	{
	    pages[i].pp_ref = 0;
	    pages[i].pp_link = page_free_list;
	    page_free_list = &pages[i];
	}
	
	// IOPHYSMEM/PGSIZE == npages_basemem
	// Mark IO hole
	for (i = IOPHYSMEM/PGSIZE; i < EXTPHYSMEM/PGSIZE; i++)
	{
	    pages[i].pp_ref = 1;
	}
    // kernel is loaded in physical memory 0x100000, the beginning of extended memory
    // page directory entry, and npages of PageInfo structure ares allocated by 
    // boot_alloc in mem_init(). next free byte is 


    first_free_byte = PADDR(boot_alloc(0));
    first_free_page = first_free_byte/PGSIZE;

    // mark kernel and page directory, PageInfo list as in use
    for (i = EXTPHYSMEM/PGSIZE; i < first_free_page; i++)
    {
        pages[i].pp_ref = 1;
    }
    // mark others as free
    for (i = first_free_page; i < npages; i++)
    {
        pages[i].pp_ref = 0;
        pages[i].pp_link = page_free_list;
	    page_free_list = &pages[i];
    }	
		
}

page_alloc

很简单，从page_free_list链表中，取出第一个节点，即分配的空间。

struct PageInfo *
page_alloc(int alloc_flags)
{
	// Fill this function in
	if (page_free_list == NULL) {
		return NULL;
	}
	
	struct PageInfo * res = page_free_list;
	page_free_list = page_free_list->pp_link;
	res->pp_link = NULL;
	
	if (alloc_flags & ALLOC_ZERO) {
		physaddr_t page_addr = page2kva(res);
		memset((void *)page_addr, 0, PGSIZE);
	}
	
	return res;
}

page_free

很简单，将需要被释放的空间添加到链表头page_free_list。

//
// Return a page to the free list.
// (This function should only be called when pp->pp_ref reaches 0.)
//
void
page_free(struct PageInfo *pp)
{
	// Fill this function in
	// Hint: You may want to panic if pp->pp_ref is nonzero or
	// pp->pp_link is not NULL.
	if (pp->pp_ref != 0 || pp != NULL) {
		panic("page_free: arg struct PageInfo error\n");
	}
	pp->pp_link = page_free_list;
	page_free_list = pp;
}

启动结果

Part 2: Virtual Memory

Exercise 2.

Look at chapters 5 and 6 of the Intel 80386 Reference Manual, if you haven’t done so already. 主要是看5.2 and 6.4这两个章节，学习page translation和 page-based 。

Page Translation

线性地址转化为物理地址的图例如下：首先将线性地址分为3个部分，最高位的10个比特表示该地址在页表目录的位置，次高的10个比特表示该地址在页表的位置，最后从页表指向的物理地址处，偏移12个字节，就可以找到最终的物理地址了。

Page-Level Protection

基于页表的保护，主要是页表的第1和2比特。第1比特限定了程序的读写权限 1. Read-only access (R/W=0) 2. Read/write access (R/W=1) When the processor is executing at supervisor level, all pages are both readable and writable. When the processor is executing at user level, only pages that belong to user level and are marked for read/write access are writable; pages that belong to supervisor level are neither readable nor writable from user level. 第2比特将页表的权限分为两个级别Supervisor Level和User Level。 1. Supervisor level (U/S=0) -- for the operating system and other systems software and related data. 2. User level (U/S=1) -- for applications procedures and data. The current level (U or S) is related to CPL. If CPL is 0, 1, or 2, the processor is executing at supervisor level. If CPL is 3, the processor is executing at user level.

Exercise 3.

Use the xp command in the QEMU monitor and the x command in GDB to inspect memory at corresponding physical and virtual addresses and make sure you see the same data. //TODO

Question

Assuming that the following JOS kernel code is correct, what type should variable x have, uintptr_t or physaddr_t?

        mystery_t x;
        char* value = return_a_pointer();
        *value = 10;
        x = (mystery_t) value;

解答：应该是虚拟地址，对于操作系统来说，并不能对物理地址进行解引用，所有的地址都是虚拟地址。

Reference counting

当多个虚拟地址同时映射到同一个物理页面时，需要用struct PageInfo 中的pp_ref 来标记，有多少个虚拟地址同时指向这个物理页面。当pp_ref == 0时，表明这个页面可以被释放了。（通过系统调用的内存地址都是在UTOP 之下，UTOP之上的地址是在启动过程中通过boot_alloc来分配的，不必也不需要被释放）。Be careful when using page_alloc. The page it returns will always have a reference count of 0, so pp_ref should be incremented as soon as you’ve done something with the returned page (like inserting it into a page table).

Page Table Management

Exercise 4.

In the file kern/pmap.c, you must implement code for the following functions.

        pgdir_walk()
        boot_map_region()
        page_lookup()
        page_remove()
        page_insert()

check_page(), called from mem_init(), tests your page table management routines. You should make sure it reports success before proceeding.

pte_t * pgdir_walk(pde_t * pgdir, const void * va, int create)

这个函数的作用是给定一个线性地址va，页表目录pgdir，返回va在页表PTE中的地址。首先是，页表目录的地址是哪个？kern_pgdir，这个是函数pgdir_walk的第一个实参。其次是其返回值，是一个页表的地址。

pte_t *
pgdir_walk(pde_t *pgdir, const void *va, int create)
{
    //  页表中存放的是物理地址，但是页表每一项的地址却是虚拟地址
    uint32_t pdx = PDX(va);     // 页目录项索引
    uint32_t ptx = PTX(va);     // 页表项索引
    pte_t   *pde = NULL;                // 页目录项指针
    pte_t   *pte = NULL;                // 页表项指针
    struct PageInfo *pp = NULL;
    pde = &pgdir[pdx];        //获取页表的地址
    //如果页表存在，则将这个页表返回
    if (*pde && PTE_P) {
        pte = (KADDR(PTE_ADDR(*pde)));
    } else {
        if (!create) {
            return NULL;
        }
        if (!(pp = page_alloc(ALLOC_ZERO)))
        {
            return NULL;
        }
        pte = (pte_t *)page2kva(pp);
        pp->pp_ref++;
        *pde = PADDR(pte) | PTE_P | PTE_W | PTE_U;
    }
   
    return &pte[ptx];  
}

struct PageInfo * page_lookup(pde_t * pgdir, void * va, pte_t ** pte_store)

struct PageInfo *
page_lookup(pde_t *pgdir, void *va, pte_t **pte_store)
{
    pde_t * pte = pgdir_walk(pgdir, va, false);
    if (!pte) {
        return NULL;
    }
    if (pte_store) {
        *pte_store = pte;
    }
    return pa2page(PTE_ADDR(*pte));
}

这个函数查找虚拟地址va对应的实际物理地址的页表。如图所示，查找va = 0x1000应该返回pp2对应的页表。

void page_remove(pde_t * pgdir, void * va)

注意一点，那就是对*pte_store的赋值应该在tlb_invalidate之前，否则会导致取得非法地址。

void
page_remove(pde_t *pgdir, void *va)
{
	pte_t * pte_store = NULL;
	struct PageInfo * pg_info = page_lookup(pgdir, va, &pte_store);
	if (pg_info) {
		//   - The ref count on the physical page should decrement.
		//   - The physical page should be freed if the refcount reaches 0.
		page_decref(pg_info);

		//   - The pg table entry corresponding to 'va' should be set to 0.
		//     (if such a PTE exists)
		*pte_store = 0;

		//   - The TLB must be invalidated if you remove an entry from
		//     the page table.
		tlb_invalidate(pgdir, va);
	}
}

int page_insert(pde_t * pgdir, struct PageInfo * pp, void * va, int perm)

插入的时候，首先是需要查找va对应的页表地址，然后获取地址中的值，这就是之前va对应的实际物理地址所在的页表。然后进行判断，如果该页表存在，则删除之，然后将va映射到新的页表pp上。

int
page_insert(pde_t *pgdir, struct PageInfo *pp, void *va, int perm)
{
    //   - If there is already a page mapped at 'va', it should be page_remove()d.
    pde_t * pte = pgdir_walk(pgdir, va, true);
    if (!pte) {
        return -E_NO_MEM;
    }
    //   - If necessary, on demand, a page table should be allocated and inserted
    //     into 'pgdir'.
    if (*pte & PTE_P) {
        // reinsert same page to same va
        if (PTE_ADDR(*pte) == page2pa(pp)) {
            *pte = page2pa(pp) | (perm | PTE_P);
            return 0;
        } else {
            page_remove(pgdir, va);
        }
    }
    *pte = page2pa(pp) | perm | PTE_P;
    pp->pp_ref++;
    return 0;
}

Part 3: Kernel Address Space

JOS将32位的线性地址空间分为2部分，用户空间和内核空间，用户空间占据低地址，内核空间占据高地址。两者之间的分界线是inc/memlayout.h的ULIM，预留了256M的虚拟地址空间给内核。

Permissions and Fault Isolation

The user environment will have no permission to any of the memory above ULIM, while the kernel will be able to read and write this memory.
For the address range [UTOP,ULIM), both the kernel and the user environment have the same permission: they can read but not write this address range. This range of address is used to expose certain kernel data structures read-only to the user environment.
Lastly, the address space below UTOP is for the user environment to use; the user environment will set permissions for accessing this memory.

Initializing the Kernel Address Space

Exercise 5.

Fill in the missing code in mem_init() after the call to check_page().Your code should now pass the check_kern_pgdir() and check_page_installed_pgdir() checks.

Map ‘pages’ read-only by the user at linear address UPAGES。 UPAGES映射到哪个物理地址上呢？看提示，映射到数组pages。

//////////////////////////////////////////////////////////////////////
// Map 'pages' read-only by the user at linear address UPAGES
// Permissions:
//    - the new image at UPAGES -- kernel R, user R
//      (ie. perm = PTE_U | PTE_P)
//    - pages itself -- kernel RW, user NONE
// Your code goes here:
boot_map_region(kern_pgdir, UPAGES, PTSIZE, PADDR(pages), PTE_U);

内核栈的映射。并不需要映射全部的虚拟内存到物理内存。

//////////////////////////////////////////////////////////////////////
// Use the physical memory that 'bootstack' refers to as the kernel
// stack.  The kernel stack grows down from virtual address KSTACKTOP.
// We consider the entire range from [KSTACKTOP-PTSIZE, KSTACKTOP)
// to be the kernel stack, but break this into two pieces:
//     * [KSTACKTOP-KSTKSIZE, KSTACKTOP) -- backed by physical memory
//     * [KSTACKTOP-PTSIZE, KSTACKTOP-KSTKSIZE) -- not backed; so if
//       the kernel overflows its stack, it will fault rather than
//       overwrite memory.  Known as a "guard page".
//     Permissions: kernel RW, user NONE
// Your code goes here:
//boot_map_region(kern_pgdir, KSTACKTOP-PTSIZE, PTSIZE-KSTKSIZE, PADDR(bootstack), PTE_W);
boot_map_region(kern_pgdir, KSTACKTOP-KSTKSIZE, KSTKSIZE, PADDR(bootstack), PTE_W);

映射基地址

boot_map_region(kern_pgdir, KERNBASE, 0xffffffff - KERNBASE, 0, PTE_W);

Question

What entries (rows) in the page directory have been filled in at this point? What addresses do they map and where do they point? In other words, fill out this table as much as possible: 我这边取了个巧，直接通过qemu的info pages命令显示了所有的页目录和页表。

We have placed the kernel and user environment in the same address space. Why will user programs not be able to read or write the kernel’s memory? What specific mechanisms protect the kernel memory? 因为使用了页保护机制，PTE中的第1比特为读写权限，第2比特为特权级权限。因为将第2比特置为1，因此该页用户不可访问，而内核可访问。
What is the maximum amount of physical memory that this operating system can support? Why? 因为 UPAGES 最大为4M, 保存了每个页表的信息struct PageInfo，而sizeof(struct PageInfo))=8Byte, 所以 UPAGES 中最大可以存放512K页表, 每个页表 4KB, 因此最多有4MB/8B*4KB)=2GB 物理内存。
How much space overhead is there for managing memory, if we actually had the maximum amount of physical memory? How is this overhead broken down? 管理内存空间的开销在页表和页表目录。如果将页表全部使用的话，共有512K页表，每个页表需要用一个struct PageInfo和一个页表地址维护，再加上1个页表目录，总共512K * 4 + 512K * 8 + 1 * 4KB = 6MB。
Revisit the page table setup in kern/entry.S and kern/entrypgdir.c. Immediately after we turn on paging, EIP is still a low number (a little over 1MB). At what point do we transition to running at an EIP above KERNBASE? What makes it possible for us to continue executing at a low EIP between when we enable paging and when we begin running at an EIP above KERNBASE? Why is this transition necessary?（这道题的意思是：当我们在还未开启分页模式时，程序计数器EIP中存储的还是老的地址；一旦我们开启了分页模式，EIP是如何快速转换到新的地址呢？）查看kern/entry.S的代码，通过jmp *%eax来实现的。之所以在开启页表目录之后，我们还能继续执行下面的mov和jmp指令，这是因为，我们同样影射了虚拟地址[0, 4MB)到物理地址[0, 4MB)，因此这两条语句从在eip中的地址是低于4MB的，取值的时候还是映射到原来的地址。jmp *%eax是必要的，因为之后，我们会映射kern_pgdir，此时虚拟地址[0, 4MB)会失效。

lab2到此为止。后面的challenge也没有时间做了。目前发现一个问题： boot_map_region(kern_pgdir, KERNBASE, 0xffffffff - KERNBASE, 0, PTE_W);会映射到一个不存在的物理地址上，是我理解错误了吗？

总结

本次实验所有的工作都是为了完善mem_init函数，实现虚拟地址物理地址的映射。为了实现这个映射，需要搭配一些功能。首先是内存分配，boot_alloc，page_init，page_alloc，page_free，每次分配4096字节的数据，空闲链表page_free_list进行分配。其次，进行虚拟地址到物理地址的映射，pgdir_walk，page_insert，page_lookup等函数，最后虚拟地址批量映射到物理地址boot_map_region。

Tags: Mit6.828