8. Memory Allocation
- What is the goal of memory allocation?
  - allocate memory space for processes
  - static (compile-time) allocation
    - Why is static allocation not always a good choice?
  - dynamic allocation
    - stack allocation
    - heap allocation
- stack allocation
  - allocation happens on contiguous blocks of memory
  - temporary memory allocation:
    - memory is allocated and de-allocated automatically as soon as the corresponding method finishes executing
    - data stored there can only be accessed by the owner thread
  - When will we use stack allocation?
    - when allocation and free are partially predictable
    - examples: recursion, procedure call frames
  - pros:
    - keeps all the free space contiguous
  - cons:
    - not appropriate for all data structures
- heap allocation
  - memory is allocated during execution by instructions the programmer writes
  - called a heap because it is a pile of memory space available to programmers to allocate and de-allocate
  - data stored in the heap is accessible/visible to all threads
  - memory leaks can occur
    - the programmer allocates memory on the heap and forgets to delete it
    - this reduces performance by reducing the amount of available memory
  - no automatic de-allocation
    - need a garbage collector to remove old, unused objects
  - not as thread-safe as stack memory (why?)
  - resizing is possible
  - the main issue in heap memory is fragmentation
- fragmentation
  - internal fragmentation
    - space wasted when you round an allocation up
  - external fragmentation
    - you end up with small chunks of free memory that are too small to be useful
    - memory is full of little holes of free space
    - no single contiguous space can satisfy the request
    - compaction is expensive
  - when does external fragmentation occur?
    - when the free space consists of variable-sized units
- segregated list
  - a particular application has one or a few popular-sized requests
  - keep a separate list to manage objects of that size
  - all other requests are forwarded to a more general memory allocator
  - since each list is dedicated to one request size, fragmentation is much less of a concern
  - no complex search of a list -> fast allocation and free
- buddy allocator
  - fast, simple allocation for blocks that are 2^n bytes
  - allocation restriction
    - block size: 2^n
  - allocation strategy (see the first sketch after this outline)
    - round the allocation request up to the nearest 2^n
    - search the free list for the appropriate size
    - recursively divide larger blocks until a block of the correct size is reached
  - memory free
    - recursively coalesce a block with its buddy if the buddy is free
  - memory fragmentation
    - still leads to a few partially used reserved pages
    - does it have internal fragmentation? why? when?
- malloc() issues
  - how to implement malloc() or new?
    - call sbrk() to request more contiguous memory from the OS
    - keep a separate free list for each popular size (see the second sketch after this outline)
      - allocation is fast
      - when will this be inefficient?
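To make the rounding step of the buddy strategy concrete, here is a small, self-contained C sketch (the request sizes are arbitrary examples, not from the slides) that rounds each request up to the nearest power of two and reports the resulting internal fragmentation:
------------------------------------------------------------------
#include <stdio.h>
#include <stddef.h>

/* Round a request up to the nearest power of two, as a buddy
 * allocator does before searching its per-size free lists. */
static size_t round_up_pow2(size_t n)
{
    size_t block = 1;
    while (block < n)
        block <<= 1;
    return block;
}

int main(void)
{
    /* Arbitrary example request sizes, in bytes. */
    size_t requests[] = { 24, 100, 1000, 4097 };

    for (size_t i = 0; i < sizeof(requests) / sizeof(requests[0]); i++) {
        size_t block = round_up_pow2(requests[i]);
        /* The difference between block and request is internal
         * fragmentation: space reserved but never used. */
        printf("request %5zu -> block %5zu, internal fragmentation %5zu\n",
               requests[i], block, block - requests[i]);
    }
    return 0;
}
------------------------------------------------------------------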
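The "separate free list for each popular size" idea can be sketched in user space. The sketch below is illustrative only: it keeps a single free list for one assumed popular size (64 bytes), refills the list from the OS with sbrk(), and never returns memory to the OS; the names my_alloc()/my_free() and the batch size are made up for the example.
------------------------------------------------------------------
#include <stdio.h>
#include <unistd.h>   /* sbrk() */

#define OBJ_SIZE   64          /* one "popular" request size (assumed) */
#define BATCH      32          /* objects fetched from the OS at a time */

/* Each free object stores a pointer to the next free object in its
 * first bytes, so the free list itself costs no extra memory. */
struct free_node {
    struct free_node *next;
};

static struct free_node *free_list = NULL;

/* Refill the free list with BATCH objects obtained from sbrk(). */
static int refill(void)
{
    char *chunk = sbrk(OBJ_SIZE * BATCH);
    if (chunk == (void *)-1)
        return -1;                       /* out of memory */
    for (int i = 0; i < BATCH; i++) {
        struct free_node *n = (struct free_node *)(chunk + i * OBJ_SIZE);
        n->next = free_list;
        free_list = n;
    }
    return 0;
}

/* Allocation: pop the head of the free list -- no searching. */
static void *my_alloc(void)
{
    struct free_node *n;
    if (free_list == NULL && refill() != 0)
        return NULL;
    n = free_list;
    free_list = n->next;
    return n;
}

/* Free: push the object back on the list; the memory is reused
 * later but never handed back to the OS in this sketch. */
static void my_free(void *p)
{
    struct free_node *n = p;
    n->next = free_list;
    free_list = n;
}

int main(void)
{
    void *a = my_alloc();
    void *b = my_alloc();
    my_free(a);
    void *c = my_alloc();     /* reuses the block just freed */
    printf("a=%p b=%p c=%p (c should equal a)\n", a, b, c);
    my_free(b);
    my_free(c);
    return 0;
}
------------------------------------------------------------------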
- reclaim free memory
  - when can dynamically-allocated memory be freed?
    - when the program explicitly calls free()
    - it can't be recycled until all sharers are finished with it
  - two possible problems
    - dangling pointers
      - storage is recycled while it is still being used
      - when does a dangling pointer happen?
        - a pointer becomes dangling when it points to an object that has been deleted (de-allocated)
        - for example:
------------------------------------------------------------------
#include <stdlib.h>

int main(void)
{
    int *ptr = (int *)malloc(sizeof(int));
    *ptr = 560;
    free(ptr);    /* the memory is de-allocated here */
    /* ptr still holds the old address, so it is now dangling */
    ptr = NULL;   /* after this, ptr no longer points to the deleted memory */
    return 0;
}
------------------------------------------------------------------
        - after calling free(ptr), 'ptr' becomes dangling because it still points to the de-allocated memory
        - if we assign NULL to 'ptr', then 'ptr' no longer points to the deleted memory
          - assigning NULL to a pointer means the pointer does not point to any memory location
        - another example:
------------------------------------------------------------------
#include <stdio.h>

int *fun(void)
{
    int y = 10;
    return &y;    /* returns the address of a local variable */
}

int main(void)
{
    int *p = fun();
    printf("%d\n", *p);
    return 0;
}
------------------------------------------------------------------
        - What is the output of the above code? (the behavior is undefined)
        - the pointer p is a dangling pointer, why?
          - when control comes back to main(), the variable y is no longer available: its stack frame has already been popped
    - memory leak
      - forgetting to free storage even though it can never be used again (see the short example after this outline)
- Linux kernel allocator (see slide 27)
  - page allocator
    - allocates contiguous areas of physical pages (4 KB, 8 KB, ...)
    - buddy allocator strategy
      - only allocations of power-of-two numbers of pages
    - the allocated area is contiguous in the kernel virtual address space
      - and maps to physically contiguous pages
  - slab allocator
    - the Linux kernel has temporary kernel objects such as mm_struct, inode, ...
      - these temporary objects come in very small and very large sizes
      - they are allocated and freed often
    - the allocation of small memory blocks
      - two caches of small memory buffers
      - kmalloc() allocates objects from these small cache buffers
    - the caching of commonly used objects
    - what is a slab? (see slide 31)
      - a chunk of one or more contiguous pages, divided into equal-sized objects
      - objects are allocated from the slabs associated with a cache
    - cache chain
      - a variable number of caches linked on a doubly linked circular list
      - each cache manages the objects of one type/size
    - when a cache is created, a slab is allocated and divided into free objects
    - if a slab is full of used objects, the next object comes from a new, empty slab
  - vmalloc() allocator
    - used to obtain memory zones that are contiguous in the virtual address space
    - these areas cannot be used for DMA, since DMA usually requires physically contiguous buffers
  - vmalloc vs kmalloc (see the kernel-module sketch after this outline)
    - kmalloc uses the slab allocator or the buddy system to ask for physically contiguous memory
    - vmalloc uses alloc_page to obtain order-0 pages, then maps them into a contiguous virtual address range; the underlying physical addresses are non-contiguous
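As a rough illustration of how kernel code uses the allocators above, here is a minimal kernel-module sketch. It assumes a Linux kernel build environment; the object type my_obj and the cache name "my_obj_cache" are invented for the example, while kmalloc(), vmalloc(), and the kmem_cache_* calls are the standard interfaces named in the notes.
------------------------------------------------------------------
#include <linux/module.h>
#include <linux/slab.h>      /* kmalloc, kfree, kmem_cache_*      */
#include <linux/vmalloc.h>   /* vmalloc, vfree                    */

struct my_obj {               /* hypothetical small kernel object */
    int id;
    char name[32];
};

static struct kmem_cache *my_cache;

static int __init alloc_demo_init(void)
{
    char *buf;
    char *big;
    struct my_obj *obj = NULL;

    /* kmalloc: small, physically contiguous memory. */
    buf = kmalloc(128, GFP_KERNEL);

    /* vmalloc: virtually contiguous, possibly physically
     * scattered; not suitable for DMA buffers. */
    big = vmalloc(1024 * 1024);

    /* slab cache: a dedicated cache for one object size, like the
     * kernel's own caches for mm_struct, inode, ... */
    my_cache = kmem_cache_create("my_obj_cache",
                                 sizeof(struct my_obj), 0, 0, NULL);
    if (my_cache)
        obj = kmem_cache_alloc(my_cache, GFP_KERNEL);

    if (obj)
        kmem_cache_free(my_cache, obj);
    if (my_cache)
        kmem_cache_destroy(my_cache);
    vfree(big);               /* vfree() and kfree() accept NULL */
    kfree(buf);
    return 0;
}

static void __exit alloc_demo_exit(void) { }

module_init(alloc_demo_init);
module_exit(alloc_demo_exit);
MODULE_LICENSE("GPL");
------------------------------------------------------------------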
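And a short user-space illustration of the memory-leak case mentioned under "reclaim free memory" (the loop bound and allocation size are arbitrary): the program keeps allocating blocks and losing the only pointer to them, so the memory can never be freed.
------------------------------------------------------------------
#include <stdlib.h>

int main(void)
{
    for (int i = 0; i < 1000; i++) {
        int *p = malloc(1024 * sizeof(int));   /* ~4 KB per iteration */
        if (p)
            p[0] = i;
        /* no free(p): when p goes out of scope the address is lost,
         * so the block can never be freed -- a memory leak that
         * gradually reduces the memory available to the process */
    }
    return 0;
}
------------------------------------------------------------------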
- I/O hardware
  - bus, port, controller
  - I/O access can use polling or interrupts
  - devices usually provide registers for data and for control
  - memory-mapped I/O
    - data and command registers are mapped into the processor's address space
    - the I/O device registers are accessed with normal load/store instructions
  - isolated I/O (see slide 10)
    - separate address spaces for memory and for I/O devices
    - special instructions access the I/O space
    - e.g., on x86 (8086-family) machines:
      - regular instructions like MOV reference the 32-bit memory address space (RAM)
      - the special instructions IN and OUT access a separate 64 KB I/O address space
  - memory-mapped vs isolated I/O (see the sketch at the end of this outline)
    - memory-mapped -> the same instructions that access memory can also access I/O devices
    - isolated -> special instructions are used to access devices -> less flexible for programming
  - programmed I/O
    - the CPU makes a request and waits for the device to be ready
    - for a large data transfer, the CPU repeatedly writes words to main memory
    - a lot of CPU time is needed
    - if the device is slow, the CPU might have to wait for a long time
  - polling
    - continually checking whether a device is ready
    - not an efficient use of the CPU
    - the CPU cannot do much else while waiting
  - interrupt-driven I/O
    - the device interrupts the processor when the data is ready
  - direct memory access (DMA)
    - copies data directly between devices and RAM, bypassing the CPU
    - the OS issues commands to the DMA controller
      - a pointer to the command is written into the command register
      - when done, the device interrupts the CPU to signal completion
    - the DMA controller is a simple processor
      - the CPU asks the DMA controller to transfer data between a device and main memory
      - after that, the CPU can continue with other tasks
      - the DMA controller issues requests to the right I/O device, waits, and manages the transfer between the device and main memory
      - once the transfer is complete, the DMA controller interrupts the CPU
    - constraints of DMA
      - DMA deals with physical addresses
      - it needs contiguous physical memory space
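To tie isolated I/O, programmed I/O, and polling together, here is a sketch using the user-space IN/OUT wrappers Linux provides on x86 (inb()/outb() from <sys/io.h>). It assumes an x86 Linux machine, root privileges for ioperm(), and a legacy i8042 keyboard controller at ports 0x60/0x64; it busy-waits on the status register until the data register has a byte to read, which is exactly the CPU-wasting wait described above.
------------------------------------------------------------------
#include <stdio.h>
#include <sys/io.h>    /* ioperm(), inb(), outb() -- x86 Linux only */

#define KBD_DATA    0x60   /* i8042 keyboard controller data port   */
#define KBD_STATUS  0x64   /* i8042 keyboard controller status port */

int main(void)
{
    /* Ask the kernel for permission to touch ports 0x60..0x64.
     * Requires root (or CAP_SYS_RAWIO). */
    if (ioperm(KBD_DATA, 5, 1) != 0) {
        perror("ioperm");
        return 1;
    }

    printf("polling the keyboard controller; press a key...\n");

    /* Programmed I/O with polling: spin on the status register
     * (isolated I/O, read with the IN instruction) until bit 0
     * reports that the output buffer holds data. The CPU does
     * nothing useful while it waits. */
    while ((inb(KBD_STATUS) & 0x01) == 0)
        ;   /* busy-wait */

    unsigned char scancode = inb(KBD_DATA);
    printf("got scancode 0x%02x\n", scancode);

    ioperm(KBD_DATA, 5, 0);   /* drop port access again */
    return 0;
}
------------------------------------------------------------------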