7. Paging
- page table - can be very large, why?
  - e.g. 32-bit address space (2^32 bytes), page size of 4 KB (2^12 bytes)
  - the number of pages is 2^32 / 2^12 = 2^20
  - assume a page table entry (PTE) is 4 bytes
  - the size of a page table is 2^20 x 4 bytes = 4 MB per process
  - hundreds of processes -> hundreds of MB for page tables
- could we increase the page size to reduce the size of the page table?
  - 32-bit address, increase the page size from 4 KB to 16 KB
  - we will have 2^18 entries in a page table when a PTE is 4 bytes
  - what is the size of the page table? 2^18 x 4 bytes = 1 MB
  - what is the problem when we increase the page size? (internal fragmentation: more wasted space within each page)
  - what are the advantages of a large page size?
    - may be more efficient for disk access (matches the block size of the disk)
    - TLB entries capture more addresses per entry -> fewer misses (each entry covers a larger range of the address space)
- multi-level page tables
  - turn the page table into a tree structure
  - only allocate page-table space that is actually in use
  - compact, and supports sparse address spaces
  - what is inside a multi-level page table? (see slide 8)
    - chop the page table into page-sized units
    - a page directory tells where each page of the page table is
    - the directory holds a number of page directory entries (PDEs), each with a page frame number (PFN) and a valid bit
  - important formulas for the page table
    - virtual address space size = 2^n bytes
    - number of pages = (virtual address space size) / (page size)
    - size of a page table = number of PTEs x size of a PTE
  - question: the physical memory is 8 GB, the page size is 8 KB, virtual addresses are 46 bits, and a PTE is 4 bytes; how many levels of page tables would be required?
  - answer:
    - page size: 8 KB = 2^13 bytes
    - virtual address space size: 2^46 bytes
    - PTE: 4 bytes = 2^2 bytes
    - number of pages: 2^46 / 2^13 = 2^33
    - size of the flat page table: 2^33 x 2^2 bytes = 2^35 bytes
    - keep adding levels while the page table at the current level is larger than one page
    - 1st level: 2^35 bytes > 2^13 bytes -> 2^35 / 2^13 = 2^22 pages are needed to hold it
    - 2nd level: 2^22 x 2^2 bytes = 2^24 bytes > 2^13 bytes -> 2^24 / 2^13 = 2^11 pages
    - 3rd level: 2^11 x 2^2 bytes = 2^13 bytes = exactly one page -> done, 3 levels
- demand paging
  - reduces the loading of unnecessary pages
  - pages are loaded from disk to RAM only when needed
  - how does demand paging work?
    - uses the present bit in the process page table
    - the present bit indicates whether the page is in RAM or not
    - on access to a non-present page, the OS loads the page into RAM and sets the present bit to 1
    - evicts another page from RAM if no frames are free
  - which pages need to be loaded from disk?
    - a page fault is slow if we always load data from disk to RAM at the time of use
    - pre-paging -> predict which pages will be used and swap them into RAM ahead of time
    - needs lots of locality -> otherwise the process runs at disk speed
    - if most accesses are to data already in DRAM -> good
- page fault
  - steps of handling a page fault (see slide 18)
  - effective access time (EAT)
    - example: L1 cache: 2 cycles, L2 cache: 10 cycles, main memory: 150 cycles, disk: 30,000,000 cycles on a 3.0 GHz processor; 98% of accesses are handled by the L1 cache, 1% by the L2 cache, 0.99% by DRAM, and 0.01% cause a page fault
    - what is the average access latency?
      - 0.98 x 2 + 0.01 x 10 + 0.0099 x 150 + 0.0001 x 30,000,000 = 1.96 + 0.1 + 1.485 + 3,000 = about 3,000 cycles per access
    - need LOW page fault rates to sustain performance
- page selection policy
  - when do we need to load a page?
  - prefetch pages in advance of access
    - hard to predict accurately
    - mispredictions can cause useful pages to be replaced
- Belady's anomaly
  - more frames in main memory can lead to more page faults
  - e.g. FIFO replacement with reference string A B C D A B E A B C D E
    - 9 faults with 3 frames
    - 10 faults with 4 frames
  - adding more memory might not reduce page faults under some replacement algorithms
- thrashing
  - if all working sets do not fit in memory, one hot page always replaces another, increasing the number of page faults
  - how to resolve thrashing?
    - swap out entire processes
    - invoked when the page fault rate exceeds some bound
    - Linux invokes the out-of-memory (OOM) killer
- inverted page tables
  - aim to reduce the memory overhead of (multi-level) page tables
  - observation: the physical memory is much smaller than the virtual address space
  - the number of page table entries (PTEs) = the number of physical frames
  - each PTE records which process and virtual page currently occupy that frame
  - lookup must scan through the entire table to find a match -> slow
  - hashed inverted page table
    - accelerates the lookup using a hash function (see slide 32)
- sharing memory
  - mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset)
    - used to save memory: two processes read shared read-only data from the same pages in memory
    - bypasses expensive read, write, or ioctl calls
    - pass MAP_SHARED in the flags
  - shmget(key, size, flags) - create a shared memory segment
  - shmat(shmid, addr, flags) - attach the shared memory segment identified by shmid to the address space of the calling process
  - shmdt(addr) - detach the shared memory segment attached at addr