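ARM7 and ARMv7 are different things. The former is a microarchitecture (also called a family); the latter is an instruction set architecture (ISA, also called an architecture). Cortex is a line of microarchitectures; Cortex-A cores such as the A8, A9, and A15 implement the ARMv7-A instruction set. The Cortex-A15 adds hardware virtualization support.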

  * [[wp>ARM7#ARM7TDMI|ARM7TDMI]]
    * ARM7 + 16-bit Thumb (T) + JTAG Debug (D) + fast Multiplier (M) + enhanced ICE (I)
    * Implements the ARMv4T instruction set. Widely used in mobile phones.

In ARM terminology, a byte, a halfword, and a word are 8, 16, and 32 bits wide, respectively.

  * [[http://www.valleytalk.org/wp-content/uploads/2010/12/armx86.pdf|ARM 与 x86]]
    * [[http://www.valleytalk.org/2010/11/27/arm%E4%B8%8Ex86-wintel%E5%B8%9D%E5%9B%BD/|ARM与x86–Wintel帝国]]
    * [[http://www.valleytalk.org/2011/12/03/arm%E7%9A%84%E4%B8%89%E5%A4%A7%E5%AE%B6%E6%97%8F%E9%98%B6%E7%BA%A7%E6%88%90%E5%88%86%E5%92%8C%E7%9B%B8%E5%BA%94%E7%9A%84%E5%AF%8C%EF%BC%88%E7%A9%B7%EF%BC%89n%E4%BB%A3-%E5%88%86%E7%B1%BB%E3%80%82/|ARM的三大家族阶级成分和相应的富（穷）N代 分类]]
    * [[http://www.botskool.com/user-pages/tutorials/electronics/arm-7-tutorial-part-1|ARM 7 Tutorial - Part 1]]
    * [[http://www.informit.com/articles/article.aspx?p=1620207|Understanding ARM Architectures]]

  * [[http://itee.uq.edu.au/~esg/about/public/arm-intro.ppt|The ARM Architecture]]
  * [[http://simplemachines.it/doc/arm_inst.pdf|The ARM Instruction Set]]
  * [[http://kezeodsnx.pixnet.net/blog/post/27989054-arm-architecture|arm architecture]]

  * [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka4127.html|8 Byte Stack Alignment]]

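  * [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/Cihidceh.html|QADD, QSUB, QDADD, and QDSUB]]: Saturating arithmetic never wraps around; the result stops at the largest or smallest representable value. When a saturating instruction actually saturates, or when certain DSP multiply-accumulate instructions overflow, the [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0473c/Ciheihge.html|Q flag]] is set to 1. These instructions can only set the Q flag and never clear it, which is why the Q flag is called a sticky flag; software has to clear it explicitly.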
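
The sticky behavior of the Q flag can be sketched in portable C. This is only a model under stated assumptions: ''q_flag'', ''qadd'', and ''clear_q'' are hypothetical names made up for illustration, not an ARM or compiler API, and the real flag lives in the status register.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical software model of the sticky Q flag (not a real ARM API). */
static bool q_flag = false;

/* Saturating signed 32-bit add, modelling what QADD computes. */
int32_t qadd(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a + (int64_t)b;  /* cannot overflow in 64 bits */
    if (wide > INT32_MAX) {
        q_flag = true;                        /* saturation sets Q */
        return INT32_MAX;
    }
    if (wide < INT32_MIN) {
        q_flag = true;
        return INT32_MIN;
    }
    return (int32_t)wide;                     /* a non-saturating result leaves Q alone */
}

/* Q is only ever cleared by an explicit request from software. */
void clear_q(void)
{
    q_flag = false;
}
```

A later ''qadd'' that does not saturate leaves ''q_flag'' set, so a routine can run a whole loop of saturating operations and test the flag once at the end.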

    * [[http://www.cs.umass.edu/~trekp/cs201/lecture_notes/lecture11.txt|Lecture 11: Instruction encoding]]

  * Thumb [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044132.html|[LLVMdev] LLC ARM Backend maintainer]]

  * [[http://www.cis.nctu.edu.tw/~wuuyang/papers/Odes7.pdf|On Static Binary Translation and Optimization for ARM based Applications]]
    * [[http://www.cs.princeton.edu/~thhung/pubs/odes08.slides.pdf|On Static Binary Translation and Optimization for ARM based Applications (slides)]]

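Self-modifying code needs extra work on ARM[(http://people.cs.nctu.edu.tw/~chenwj/log/QEMU/friggle-2012-01-11.txt)]: the instruction and data caches are not kept coherent automatically, so newly written code has to be cleaned out of the data cache and the instruction cache invalidated before the code is executed.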
  * [[http://blog.csdn.net/arriod/article/details/2826959|ARM使用术语清除（flush)和清理（clean）表示对cache的两种基本操作]]
  * [[https://mail.mozilla.org/pipermail/tamarin-devel/2008-July/000826.html|ARM/Thumb cache flushing problem under Linux]]

  * Architecture
  * Instructions
  * Interrupt/exception handling
    * User stack, kernel stack, and interrupt stack.
    * [[http://www.ic.unicamp.br/~celio/mc404-2013/arm-manuals/ARM_exception_slides.pdf|Exception and Interrupt Handling in ARM]]
    * [[http://www.iti.uni-stuttgart.de/~radetzki/Seminar06/08_report.pdf|Exception and Interrupt Handling in ARM]] 
  * Memory system
  * Buses
  * Debug support

====== Processor ======
===== Registers =====
From the application-level view, ARM has 16 registers: R0 - R12, SP (Stack Pointer), LR (Link Register), and PC (Program Counter). These 16 names are mapped onto a larger set of physical registers: 31 if the Security Extensions are not implemented, 33 if they are. Which physical register a name refers to depends on the current processor mode. For some registers ARM provides extra copies; registers that have such copies are called banked (shadow/private) registers. The sentence "Register 8 to register 12 are general purpose registers, but they have shadow registers which come into use when you switch to FIQ mode." in [[http://www.heyrick.co.uk/assembler/regs.html|Registers and Processor Modes]] means that in FIQ (Fast interrupt request) mode, R8 - R12 refer to their shadow copies, so that an FIQ handler does not clobber the user-mode values of R8 - R12.
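
Register banking can be pictured as an extra level of indexing by processor mode. The sketch below is a deliberately reduced model: it covers only User and FIQ mode and only R0 - R12, and ''struct regfile'', ''enum mode'', and ''reg'' are hypothetical names chosen for illustration.

```c
#include <stdint.h>

/* Reduced model: only two modes, only r0-r12 (sp, lr, pc are left out). */
enum mode { MODE_USR, MODE_FIQ, MODE_COUNT };

struct regfile {
    uint32_t shared[8];              /* r0-r7: one physical copy for every mode */
    uint32_t banked[MODE_COUNT][5];  /* r8-r12: one copy for usr, one for fiq   */
};

/* Return the physical register behind the name rN (0 <= n <= 12) in mode m. */
uint32_t *reg(struct regfile *rf, enum mode m, int n)
{
    if (n < 8)
        return &rf->shared[n];       /* same storage regardless of mode */
    return &rf->banked[m][n - 8];    /* mode selects the copy */
}
```

Writing ''r8'' in FIQ mode touches ''banked[MODE_FIQ][0]'' and leaves the user-mode ''r8'' intact, which is why an FIQ handler does not have to save those registers first.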

  * [[wp>ARM_architecture#Registers|ARM Register]]
    * [[http://stenlyho.blogspot.com/2008/08/arm-register.html|ARM Register]]
  * [[http://stackoverflow.com/questions/13432297/what-does-banking-a-register-mean|What does 'bank'ing a register mean?]]
    * [[http://electronics.stackexchange.com/questions/102742/what-does-banking-mean-when-applied-to-registers|What does banking mean when applied to registers?]]
    * [[http://jc.is-programmer.com/posts/28905.html|ARM 寄存器（综述）]]
    * Banked registers
  * CPSR (Current Program Status Register): the status register.
  * SPSR (Saved Program Status Register): when an exception is taken, the CPSR is saved here automatically.
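===== System Control Coprocessor =====
[[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0290g/DDI0290G_arm1156t2fs_r0p4_trm.pdf|CP15]], also called the system control coprocessor, contains registers c0 to c15, whose bits control various settings. CP15 registers are read and written with the ''mrc'' and ''mcr'' instructions: ''mrc'' reads a CP15 register (''c'') into a general-purpose register (''r''); ''mcr'' goes the other way. Some CP15 register numbers actually cover several physical registers, and ''opcode_2'' selects which one is accessed. For example, CP15:c0 includes MIDR (Main ID Register), CTR (Cache Type Register), TCMTR (TCM Type Register), and others.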
<blockquote>
The system control coprocessor appears as a set of 32-bit registers that you can write to 
and read from.
</blockquote>

''mcr'' writes a general-purpose register (''r'') to a CP15 register (''c''). Its instruction format is:<code>
MCR<c> <coproc>,<opc1>,<Rt>,<CRn>,<CRm>{,<opc2>}
</code>
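  * ''<c>'': the condition under which the instruction executes, such as EQ, LE, or GT.
  * ''<coproc>'': the coprocessor to access.
  * ''<opc1>'': the primary opcode.
  * ''<Rt>'': the source general-purpose register.
  * ''<CRn>'': the coprocessor register to write.
  * ''<CRm>'': an additional coprocessor register operand.
  * ''<opc2>'': an optional additional opcode.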
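
The fields listed above map onto fixed bit positions of the 32-bit ARM-state instruction word. The following is a minimal sketch of that packing; ''encode_cp_transfer'' is a hypothetical helper name, not part of any toolchain. Its field positions agree with the decoding in QEMU's ''set_cp15'' helper quoted later on this page (''opc1'' at bits 23:21, ''CRn'' at 19:16, ''opc2'' at 7:5, ''CRm'' at 3:0).

```c
#include <stdint.h>

/* Pack the fields of an ARM-state MCR/MRC instruction.
 * is_read = 0 encodes MCR (general-purpose register -> coprocessor),
 * is_read = 1 encodes MRC (coprocessor -> general-purpose register). */
uint32_t encode_cp_transfer(unsigned cond, int is_read, unsigned coproc,
                            unsigned opc1, unsigned rt,
                            unsigned crn, unsigned crm, unsigned opc2)
{
    return ((uint32_t)(cond & 0xf) << 28) |        /* <c>      */
           (0xeu << 24) |                          /* fixed 1110 */
           ((uint32_t)(opc1 & 7) << 21) |          /* <opc1>   */
           ((uint32_t)(is_read ? 1 : 0) << 20) |   /* MRC/MCR  */
           ((uint32_t)(crn & 0xf) << 16) |         /* <CRn>    */
           ((uint32_t)(rt & 0xf) << 12) |          /* <Rt>     */
           ((uint32_t)(coproc & 0xf) << 8) |       /* <coproc> */
           ((uint32_t)(opc2 & 7) << 5) |           /* <opc2>   */
           (1u << 4) |                             /* fixed 1  */
           (crm & 0xf);                            /* <CRm>    */
}
```

With condition AL (''0xe''), coprocessor 15, ''Rt'' = 12 (''ip''), ''CRn'' = 1, and everything else 0, the helper yields ''0xee01cf10'', the same word as the ''mcr 15, 0, ip, cr1, cr0, {0}'' line in the system call trace on this page.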

  * [[http://blog.csdn.net/gooogleman/article/details/3595294|ARM协处理器CP15（设置MMU，cache等）学习 ]]
  * [[http://www.upemb.com/online/ARM/1876.html|12、ARM协处理器CP15配置原理]]
  * [[http://itansuo.com/arm-cp15-mcr-mrc|关于ARM协处理器CP15及MCR和MRC指令]]
  * [[http://blog.csdn.net/zhou1232006/article/details/6150198|关于ARM9协处理器CP15及MCR和MRC指令]]
  * [[http://blog.163.com/jiangh_1982/blog/static/121950520105176025420/|超级无敌转帖(1)--《ARM的CP15协处理器的寄存器》]]
  * [[http://blog.163.com/jiangh_1982/blog/static/121950520105176134256/|超级无敌转帖(2)--《ARM的CP15协处理器的寄存器》]]
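===== Processor Modes =====
  * [[wp>Fast interrupt request|FIQ (fiq)]] is an ARM processor mode intended to run interrupt handlers quickly. Its private (banked) registers reduce or avoid the need to save the registers used by user mode (pushing them onto the stack), which speeds up the context switch.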
    * [[http://stackoverflow.com/questions/973933/what-is-the-difference-between-fiq-and-irq-interrupt-system|What is the difference between FIQ and IRQ interrupt system?]]
  * User (usr)
  * IRQ (irq)
  * Supervisor (svc)
  * Abort mode (abt)
  * System (sys)
  * Undefined (und)
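
The current mode is held in the low five bits of the CPSR. A small sketch that decodes those bits into the abbreviations used in the list above; ''mode_name'' is a hypothetical helper, and only the seven modes listed here are covered.

```c
#include <stdint.h>

/* Map CPSR[4:0] to the mode abbreviation; other encodings are reported as unknown. */
const char *mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1f) {
    case 0x10: return "usr";
    case 0x11: return "fiq";
    case 0x12: return "irq";
    case 0x13: return "svc";
    case 0x17: return "abt";
    case 0x1b: return "und";
    case 0x1f: return "sys";
    default:   return "unknown";
    }
}
```

For example, writing ''0x13'' (decimal 19) to the control field of the CPSR, as the ''msr CPSR_c, #19'' instruction in the system call trace on this page does, selects Supervisor mode.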

  * [[http://wiki.osdev.org/ARM_Overview|ARM Overview]]
===== Floating-Point and Vector Instructions =====
<code c>
#include <stdio.h>
#include "arm_neon.h"

void print_uint8 (uint8x16_t data, char* name) {
  int i;
  static uint8_t p[16];

  vst1q_u8 (p, data);
  printf ("%s = ", name);
  for (i = 0; i < 16; i++) {
    printf ("%02d ", p[i]);
  }
  printf ("\n");
}

int main() {
  const uint8_t uint8_data1[] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}; // 8 x 16 = 128 bit
  const uint8_t uint8_data2[] = {16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1};
  
  uint8x16_t data_a; // uint8x16_t is a NEON vector type: 16 lanes of 8 bits each, held in one vector register.
  data_a = vld1q_u8(uint8_data1); // vld1q_u8 loads the prepared data into a vector register; the q means the register is 128 bits wide.

  uint8x16_t data_b;
  data_b = vld1q_u8(uint8_data2);

  uint8x16_t data_c;
  print_uint8(data_a, "data_a");
  print_uint8(data_b, "data_b");
  
  data_c = vaddq_u8(data_a, data_b); // Add the two 16 x 8-bit vectors lane by lane.
  
  print_uint8(data_c, "data_c");

  return 0;
}
</code>
<code bash>
# http://www.mentor.com/embedded-software/sourcery-tools/sourcery-codebench/lite-edition
$ arm-none-linux-gnueabi-gcc \
  -mfloat-abi=softfp -mfpu=neon \
  -mcpu=cortex-a8 -ftree-vectorize \
  -ffast-math -static \
  neon.c -o neon 
</code>
  * [[http://www.arm.com/products/processors/technologies/vector-floating-point.php|Floating Point]]: VFPv3 can be implemented with either thirty-two or sixteen double word registers. The terms VFPv3-D32 and VFPv3-D16 are used to distinguish between these two implementation options.
  * [[http://www.arm.com/products/processors/technologies/neon.php|NEON]]
    * Using NEON Support
  * [[http://www.cs.nctu.edu.tw/~chenwj/slide/ARM/ARM%20NEON%20-%20Poki.pptx|ARM Architecture & NEON]]
  * [[http://www.armadeus.com/wiki/index.php?title=NEON_HelloWorld|NEON HelloWorld]]
  * [[http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html|6.54.3 ARM NEON Intrinsics]]
  * [[http://adt.cs.upb.de/quf/quf11/quf2011_12.pdf|QEmu TCG Enhancements for Speeding-up the Emulation of SIMD instructions]]

  * [[http://software.intel.com/en-us/blogs/2012/12/12/from-arm-neon-to-intel-mmxsse-automatic-porting-solution-tips-and-tricks|From ARM NEON to Intel MMX&SSE- the automatic porting solution, tips and tricks]]
  * [[http://maciku.blogspot.tw/2010/07/ios-arm.html|iOS 開發者應該知道的 ARM 結構]]
    * [[http://wanderingcoder.net/2010/07/19/ought-arm/|A few things iOS developers ought to know about the ARM architecture]]
  * [[http://wanderingcoder.net/2010/06/02/intro-neon/|Introduction to NEON on iPhone]]

    - Programmer Model
      - The register file holds sixteen 128-bit Q (Quadword) vector registers, named q0 to q15.
      - The same register file can also be viewed as thirty-two 64-bit D (Doubleword) vector registers, named d0 to d31.
    - Syntax
      - All NEON instructions, even loads and stores, begin by "V".
      - Instructions can have one (or more) letter just after the V which acts as a modifier.
        - Q means the instruction saturates
        - R that it rounds
        - H that it halves
      - all instructions need to take a suffix telling the individual size and type of the elements being operated on
        - from .u8 (unsigned byte) to .f32 (single-precision floating-point)
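
The two views of the register file in the programmer model above overlap: each Q register occupies the same storage as a pair of D registers. A rough model of that aliasing, treating the register file as a flat byte array; ''struct neon_regs'', ''d_reg'', and ''q_reg'' are hypothetical names, and byte order within a register is not modelled.

```c
#include <stdint.h>

/* The NEON register file as raw storage: 16 Q registers of 16 bytes each. */
struct neon_regs {
    uint8_t bytes[16 * 16];
};

/* d(n) starts at byte 8*n (n = 0..31). */
uint8_t *d_reg(struct neon_regs *r, int n)
{
    return &r->bytes[8 * n];
}

/* q(n) starts at byte 16*n (n = 0..15), so it covers d(2n) and d(2n+1). */
uint8_t *q_reg(struct neon_regs *r, int n)
{
    return &r->bytes[16 * n];
}
```

In this model ''q1'' begins where ''d2'' begins and its upper half is ''d3'', so writing a D register changes half of the corresponding Q register.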

  * DDI0406B 
    * A2.6 Advanced SIMD and VFP extensions

  * [[http://infocenter.arm.com/help/topic/com.arm.doc.dht0002a/DHT0002A_introducing_neon.pdf]]
  * [[http://blogs.arm.com/software-enablement/161-coding-for-neon-part-1-load-and-stores/]]
===== big.LITTLE =====
  * [[wp>ARM big.LITTLE]]
<blockquote>
The other way to support big.LITTLE systems is to have all CPUs, both big and LITTLE, visible in a multiprocessor configuration. This approach offers greater flexibility, but also poses special challenges for the Linux kernel. For example, the scheduler assumes that CPUs are interchangeable, which is anything but the case on big.LITTLE systems. 
</blockquote>
  * [[http://lwn.net/Articles/481055/|Linux support for ARM big.LITTLE]]
  * [[http://lwn.net/Articles/482344/|The Linaro Connect scheduler minisummit]]
  * [[http://lwn.net/Articles/501501/|A big.LITTLE scheduler update]]
  * [[http://loda.hala01.com/2012/08/linux-kernel-heterogeneous-multi-processor-scheduling%e7%ad%86%e8%a8%98/|Linux Kernel Heterogeneous Multi-Processor Scheduling筆記]]
  * [[http://www.valleytalk.org/2011/12/03/big-little-computing-%E3%80%82-cortex-a7-%E3%80%82cortex-a15/|big.LITTLE Computing]]
  * [[http://www.valleytalk.org/2011/12/04/nvidia-tegra3kal-el-5-cores-vsmp/|NVIDIA Tegra3(Kal-El)]]
===== Atomic Instructions =====
  * [[http://www.jonmasters.org/blog/2012/11/13/arm-atomic-operations/|ARM atomic operations]]
  * [[http://stackoverflow.com/questions/15491751/real-life-use-cases-of-barriers-dsb-dmb-isb-in-arm|Real-life use cases of barriers (DSB, DMB, ISB) in ARM]]
    * [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/CIHGHHIE.html|DMB, DSB, and ISB]]
===== Miscellaneous =====
  * [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0211h/Chdhbjjb.html|Tightly-coupled memory]]
    * [[https://www.kernel.org/doc/Documentation/arm/tcm.|ARM TCM (Tightly-Coupled Memory) handling in Linux]]
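    * Provides a cache that is controlled by software. In practice, a dedicated block of memory is used as the cache.
    * When both a hardware cache and a software cache (TCM) are present: if the processor looks them up by physical address, the physical address decides whether the hardware cache or the software cache is searched; if it uses virtual addresses, part of the virtual address space has to be reserved so that accesses to the software cache can be told apart from accesses to the hardware cache.
  * [[wp>Barrel shifter]]: The second operand of the ARM ALU first passes through the barrel shifter, which can shift or rotate it, before it enters the ALU. Classic ARM has no dedicated shift instructions; a shift is written as a ''MOV'' with a shifted operand. Some instructions cannot use the barrel shifter, for example MUL, CLZ, and QADD.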
    * [[http://www.davespace.co.uk/arm/introduction-to-arm/barrel-shifter.html|Barrel Shifter]] 
    * [[http://computing.unn.ac.uk/staff/cgmb3/teaching/CM506/ARM_Assembler/AssemblerSummary/BARREL.html|The ARM Barrel Shifter]]
    * [[http://stackoverflow.com/questions/7605636/why-some-arm-instructions-do-not-use-barrel-shifter|Why some ARM instructions do not use barrel shifter?]]
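
What the barrel shifter does to the second operand can be sketched in C. This is an illustrative model only: ''barrel_shift'' and ''add_shifted'' are hypothetical names, and the shift amount is assumed to be in the range 1 to 31 so that the special encodings for 0 and 32 do not come into play.

```c
#include <stdint.h>

enum shift_op { LSL, LSR, ASR, ROR };

/* Shift or rotate operand 2 before it reaches the ALU (amount assumed 1..31). */
uint32_t barrel_shift(uint32_t value, enum shift_op op, unsigned amount)
{
    switch (op) {
    case LSL: return value << amount;
    case LSR: return value >> amount;
    case ASR: /* copy the sign bit into the vacated high bits */
        return (value >> amount) |
               ((value & 0x80000000u) ? ~(0xffffffffu >> amount) : 0);
    case ROR: return (value >> amount) | (value << (32 - amount));
    }
    return value;
}

/* Models "add rd, rn, rm, <op> #amount": the shift costs no extra instruction. */
uint32_t add_shifted(uint32_t rn, uint32_t rm, enum shift_op op, unsigned amount)
{
    return rn + barrel_shift(rm, op, amount);
}
```

A typical use is array indexing: ''add r0, r1, r2, lsl #2'' computes the address of element ''r2'' in an array of 4-byte items starting at ''r1'' in a single instruction.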


====== Software ======
  * [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0203j/BCGDIHEC.html|Interworking examples]]
    * [[http://twins.ee.nctu.edu.tw/courses/ip_core_01/lab_hw_pdf/lab_1.pdf|ARM/Thumb Interworking]]
    * [[http://www.davespace.co.uk/arm/introduction-to-arm/interworking.html|Introduction to ARM: Interworking]]
    * [[http://www.go-gddq.com/down/2012-03/12031620106201.pdf|ARM 处理器中 ARM 和 Thumb 状态的切换（Interworking）]]
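    * Interworking between ARM and Thumb instructions. Parts of a program that are not performance-critical can be written in Thumb.
    * On ARM7TDMI-class cores, Thumb instructions are ultimately decoded into ARM instructions and then executed.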
    * [[https://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html|3.17.4 ARM Options]] 
      * ''-mthumb-interwork'': when the linker detects that a switch between ARM and Thumb state is needed, it inserts the necessary code (an interworking veneer) to perform the switch. <q>Generate code that supports calling between the ARM and Thumb instruction sets. Without this option, on pre-v5 architectures, the two instruction sets cannot be reliably used inside one program. The default is -mno-thumb-interwork, since slightly larger code is generated when -mthumb-interwork is specified. In AAPCS configurations this option is meaningless. </q> 
      * Switching between ARM and Thumb state by hand is done with the BX/BLX instructions.
  * Software interrupt (SWI)
  * Conditional execution
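
The state switch performed by ''BX'' hinges on bit 0 of the branch target. A minimal sketch of that rule; ''struct cpu'' and ''bx'' are hypothetical names for illustration, and everything else a real branch does is ignored.

```c
#include <stdint.h>
#include <stdbool.h>

struct cpu {
    uint32_t pc;
    bool thumb;   /* true: Thumb state, false: ARM state */
};

/* Models "bx rm": bit 0 of the target selects the instruction set
 * and is not part of the address that is branched to. */
void bx(struct cpu *cpu, uint32_t target)
{
    cpu->thumb = (target & 1) != 0;
    cpu->pc = target & ~1u;
}
```

This is why the address of a Thumb function has its lowest bit set when it is taken as a function pointer: the bit carries the state to switch to.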
====== Examples ======
  * Literal pool
    * [[wp>Literal pool]][(http://www.cs.nctu.edu.tw/~chenwj/log/zakk-literal-pool-2011-10-05.txt)]
    * [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0473c/Bgbccbdi.html|Literal pools]]
    * [[http://benno.id.au/blog/2009/01/02/literal-pools|The trouble with literal pools]]<code bash>
$ cat hello.c
#include <stdio.h>

int main() {
  printf("hello!\n");
}
$ gcc -static hello.c -o hello
$ objdump -d hello
00008260 <main>:
    8260:       e1a0c00d        mov     ip, sp
    8264:       e92dd800        push    {fp, ip, lr, pc}
    8268:       e24cb004        sub     fp, ip, #4
    826c:       e24dd008        sub     sp, sp, #8
    8270:       e59f0008        ldr     r0, [pc, #8]    ; 8280 <main+0x20>
    8274:       fa000263        blx     8c08 <_IO_puts>
    8278:       e24bd00c        sub     sp, fp, #12
    827c:       e89da800        ldm     sp, {fp, sp, pc}
    8280:       0004c354        .word   0x0004c354
# After strip, objdump no longer shows the literal as .word
$ strip hello
</code>
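
The ''ldr r0, [pc, #8]'' in the listing reaches the literal at ''0x8280'' because, in ARM state, reading ''pc'' yields the address of the current instruction plus 8. A small sketch of that calculation; ''literal_address'' is a hypothetical helper name.

```c
#include <stdint.h>

/* Address accessed by "ldr rd, [pc, #offset]" in ARM state,
 * where the PC reads as the instruction's own address plus 8. */
uint32_t literal_address(uint32_t instr_addr, int32_t offset)
{
    return instr_addr + 8 + (uint32_t)offset;
}
```

For the instruction at ''0x8270'' with offset 8 this gives ''0x8280'', which matches the address objdump prints in the comment.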
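====== System Calls ======
ARM treats a system call as an exception. ''swi'' and ''svc'' are the old and new mnemonics for the same instruction. In the old-style system call the system call number is the operand of ''swi''; in the new-style system call the operand of ''svc'' is 0 and the system call number is passed in register ''r7''[(http://people.cs.nctu.edu.tw/~chenwj/log/QEMU/pm215-2012-07-06.txt)].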
<code>
IN:
0x40010820:  e1a0c007      mov  ip, r7
0x40010824:  e3a0707a      mov  r7, #122        ; 0x7a
0x40010828:  ef000000      svc  0x00000000      ; The system call number is 0x7a, passed in r7.

OP:
 ---- 0x40010820
 mov_i32 tmp5,r7
 mov_i32 r12,tmp5

 ---- 0x40010824
 movi_i32 tmp5,$0x7a
 mov_i32 r7,tmp5

 ---- 0x40010828
 movi_i32 pc,$0x4001082c
 movi_i32 tmp5,$0x2
 movi_i64 tmp6,$exception
 call tmp6,$0x0,$0,tmp5

----------------
IN:
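0xffff0008:  e59ff410      ldr  pc, [pc, #1040] ; 0xffffffffffff0420. The swi/svc vector is at offset 0x08; when high vectors are enabled it is located at 0xffff0008 instead of 0x00000008. The load fetches the handler address from a table and jumps to it.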

----------------
IN:
0xc0022e60:  e24dd048      sub  sp, sp, #72     ; 0x48
0xc0022e64:  e88d1fff      stm  sp, {r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, sl, fp, ip}
0xc0022e68:  e28d803c      add  r8, sp, #60     ; 0x3c
0xc0022e6c:  e9486000      stmdb        r8, {sp, lr}^    ; Store the user-mode sp and lr (selected by ^) just below the address in r8.
0xc0022e70:  e14f8000      mrs  r8, SPSR                 ; Copy the SPSR into a register. Direction: r (register) <- s (SPSR).
0xc0022e74:  e58de03c      str  lr, [sp, #60]
0xc0022e78:  e58d8040      str  r8, [sp, #64]
0xc0022e7c:  e58d0044      str  r0, [sp, #68]
0xc0022e80:  e3a0b000      mov  fp, #0  ; 0x0
0xc0022e84:  e3180020      tst  r8, #32 ; 0x20
0xc0022e88:  13a0a000      movne        sl, #0  ; 0x0
0xc0022e8c:  051ea004      ldreq        sl, [lr, #-4]
0xc0022e90:  e59fc0a8      ldr  ip, [pc, #168]  ; 0xffffffffc0022f40
0xc0022e94:  e59cc000      ldr  ip, [ip]
0xc0022e98:  ee01cf10      mcr  15, 0, ip, cr1, cr0, {0} ; Write a register value to the coprocessor. Direction: c (coprocessor) <- r (register).

----------------
IN:
0xc0022e9c:  e321f013      msr  CPSR_c, #19     ; 0x13. Set the control field of the CPSR to 0x13 (Supervisor mode, IRQ and FIQ unmasked). Direction: s (status register) <- immediate.
</code>
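
The two conventions for passing the system call number can be told apart by looking at the immediate field of the trapping instruction. The sketch below is a simplified model with an assumed rule (a non-zero immediate means old style); ''syscall_number'' is a hypothetical helper, and any ABI-specific base that a real kernel may subtract from the immediate is not modelled.

```c
#include <stdint.h>

/* insn: the 32-bit ARM-state swi/svc instruction word.
 * r7:   the value of r7 when the instruction trapped. */
uint32_t syscall_number(uint32_t insn, uint32_t r7)
{
    uint32_t imm24 = insn & 0x00ffffffu;  /* operand of swi/svc */
    if (imm24 != 0)
        return imm24;                      /* old style: number is in the instruction */
    return r7;                             /* new style: number is in r7 */
}
```

For the trace above, the instruction word is ''0xef000000'' and ''r7'' holds ''0x7a'', so the result is ''0x7a''. The kernel entry code in the same trace does something comparable when it loads the trapping instruction with ''ldreq sl, [lr, #-4]''.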

  * [[http://stackoverflow.com/questions/8459279/are-arm-instructuons-swi-and-svc-exactly-same-thing|Are ARM instructuons SWI and SVC exactly same thing?]]


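====== Memory ======
ARM divides memory into the following types ([[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0363e/Babcddgd.html|7.2. Memory types]]):
  * Normal: memory that holds code and data.
  * Device: i.e. MMIO; this memory range is used to communicate with peripherals.
  * Strongly Ordered: accesses to this memory must happen one after another in program order and must not be reordered. This memory is necessarily shared.

Normal and Device memory can further be Shared or Non-shared, depending on whether the memory can be accessed by multiple processors.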


  * [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka4158.html|Why have memory types been defined in the ARM V6 ISA ?]]
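===== Address Translation =====
Address translation on ARM involves the following three kinds of address:
  * Virtual Address (VA): the address issued by the CPU.
  * Modified Virtual Address (MVA): the address seen by the cache and the MMU.
  * Physical Address (PA): the address seen by memory.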

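When the MMU is disabled, the VA equals the PA and memory is accessed with that address directly, without translation. When the MMU is enabled, the result depends on whether the VA lies below 32 MB. QEMU's ''get_phys_addr'' expresses the rule as: <code c>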
    if (address < 0x02000000)
        address += env->cp15.c13_fcse;
</code>
  * [[http://www.sectop.com/post/83.html|嵌入式Linux学习笔记（四）-内存管理单元 mmu]]

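The MVA exists in the translation path to speed up context switches. See [[http://blog.csdn.net/dayong1001/article/details/6894414|ARM 学习笔记（四） 快速上下文切换（FCSE）技术]]. Normally a context switch has to point the page table pointer at the page table of the new process. This changes the mapping from virtual to physical addresses, so the contents of the TLB and the cache have to be flushed, and the mappings and data of the new process are then loaded into the TLB and the cache. The Fast Context Switch Extension (FCSE) avoids this overhead. It works as follows:

Normally, if the virtual address spaces of two processes overlap, the system has to remap virtual addresses to physical addresses when it switches between them. Remapping means rebuilding the page tables used by the MMU and invalidating the contents of the cache and the TLB (by setting the relevant bits in coprocessor registers). These operations are very costly: rebuilding the page tables and invalidating the cache and the TLB is expensive, and so is refilling the cache and the TLB afterwards.

<color red>FCSE avoids this overhead. It sits between the CPU and the MMU.</color> If two processes use the same virtual address range, the CPU still sees both of them using that same range. The FCSE translates the virtual addresses of each process, so that every part of the system other than the CPU sees the translated virtual address (MVA). It maps the virtual address space of each process to a different range of MVAs, so switching between processes no longer requires remapping virtual addresses to physical addresses, <color red>because the cache and the TLB see different MVAs.</color>

On ARM, the 4 GB virtual address space is divided into 128 process slots of 32 MB each. Each slot can hold one process, which uses the virtual addresses 0x0 to 0x01FFFFFF; this is the virtual address space of the process as seen by the CPU. The 128 slots are numbered 0 to 127. The process in slot X actually occupies the range (X * 0x02000000) to (X * 0x02000000 + 0x01FFFFFF); this is the range seen by every part of the system other than the CPU, i.e. the MVA.

<color red>Every process in the system uses the virtual address range 0x0 to 0x01FFFFFF.</color> When a process accesses its own instructions and data, the top 7 bits of the virtual address (VA) it generates are 0. The FCSE replaces these top 7 bits with the process identifier held in CP15:c13, which gives the MVA; this MVA lies inside the slot that belongs to the process.

When the top 7 bits of the VA are not all 0, MVA = VA. Such a VA is used by the current process to access data and instructions that belong to another process; note that the identifier of the process being accessed must not be 0 in this case.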
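
The VA to MVA rule described above fits in a few lines of C. This is a sketch, assuming the process identifier is given as a plain 7-bit number; ''fcse_mva'' is a hypothetical helper name. QEMU's version quoted earlier adds ''c13_fcse'' directly, which works because that register already holds the identifier in its top 7 bits.

```c
#include <stdint.h>

/* va:  virtual address issued by the CPU.
 * pid: FCSE process identifier, 0..127 (the slot number X). */
uint32_t fcse_mva(uint32_t va, uint32_t pid)
{
    if (va < 0x02000000u)                    /* top 7 bits of the VA are 0 */
        return va | ((pid & 0x7fu) << 25);   /* place the VA inside slot X */
    return va;                               /* outside the 32 MB window: MVA = VA */
}
```

For a process in slot 3, the VA ''0x00001000'' becomes the MVA ''0x06001000'', which is 3 * 0x02000000 + 0x1000.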
===== Access Control =====
  * Memory Protection Unit (MPU)
    * [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0337e/BIHJJABA.html|Chapter 9. Memory Protection Unit]]  

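ARM platforms that have an MMU can enable virtual memory. ARM uses a two-level page table. Memory access permission is decided first by the domain and then by the access permission (AP) bits. The MMU is enabled and disabled through CP15:c1. CP15:c2 holds the TTB (Translation Table Base), i.e. the address of the page table; it contains TTBR0 and TTBR1, which are typically used for the user-space and kernel-space page table bases respectively (physical addresses). ARM can divide memory into up to 16 domains; the permission of each domain is set through CP15:c3, with 2 bits per domain. CP15:c5 holds the cause of a fault, and CP15:c6 holds the virtual address that caused it. Section B3.12.1, Organization of the CP15 registers in a VMSA implementation, of the ARM ARM shows the CP15 registers related to virtual memory and their meaning in a diagram. Besides pages, ARM can also use the larger section as the unit of memory allocation. Access permissions can also be set for the subpages within a page.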

<blockquote>
The translation properties associated with each translation table entry include:

Memory access permission control

This controls whether a program has access to a memory area. The possible settings are no 
access, read-only access, or read/write access. In addition, there is control of whether code 
can be executed from the memory area.
If a processor attempts an access that is not permitted, a memory abort is signaled to the 
processor.
The permitted level of access can be affected by:

  * whether the program is running in User mode or a privileged mode
  * the use of domains.

Memory region attributes

These describe the properties of a memory region. The top-level attribute, the Memory type, 
is one of Strongly-ordered, Device, or Normal. Device and Normal memory regions have 
additional attributes, see Summary of ARMv7 memory attributes on page A3-25.

Virtual-to-physical address mapping

The VMSA regards the address of an explicit data access or an instruction fetch as a Virtual 
Address (VA). The MMU maps this address onto the required Physical Address (PA).
VA to PA address mapping can be used to manage the allocation of physical memory in 
many ways. For example:

  * to allocate memory to different processes with potentially conflicting address maps
  * to enable an application with a sparse address map to use a contiguous region of physical memory.
</blockquote>

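The article [[http://blog.21ic.com/user1/7903/archives/2011/84661.html|ARM MMU工作原理剖析]] describes the MMU quite clearly, with examples. The following elements take part in the permission check:
  * CP15:c3 - page 511 of [[http://www.amazon.com/ARM-System-Developers-Guide-Architecture/dp/1558608745|ARM System Developer's Guide: Designing and Optimizing System Software]] lists the values that each domain field in CP15:c3 can hold and what they mean.
    * 00: at the current level, the memory region must not be accessed; any access causes a domain fault.
    * 01: at the current level, accesses to the memory region are checked against the AP bits in the descriptor of that region.
    * 10: reserved (best not to use this value, to avoid unpredictable problems).
    * 11: at the current level, accesses to the memory region are not checked against permissions at all.
  * AP and Domain bits in the PTE - page 512 of [[http://www.amazon.com/ARM-System-Developers-Guide-Architecture/dp/1558608745|ARM System Developer's Guide: Designing and Optimizing System Software]] shows how the AP bits combine with the S and R bits in CP15:c1 to give the access permission.
    * The Domain bits select one of the 16 domains in CP15:c3, and the setting of that domain decides whether the AP bits are consulted. The AP bits are combined with the S and R bits in CP15:c1, and with whether the processor is in Supervisor mode or User mode, to give the final access permission.
  * The S and R bits in CP15:c1
  * CP15:c5 holds the cause of the fault
  * CP15:c6 holds the virtual address that caused the fault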
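
The 2-bit domain fields of CP15:c3 can be read and updated with simple shifts. A minimal sketch operating on a plain 32-bit value (reading and writing the real register needs ''mrc''/''mcr'' in a privileged mode); the enum and the two helper names are hypothetical, and the names Client and Manager follow the terminology of the quoted paper further down.

```c
#include <stdint.h>

/* The four possible values of a domain field in CP15:c3. */
enum domain_access {
    DOMAIN_NO_ACCESS = 0,  /* 00: every access faults               */
    DOMAIN_CLIENT    = 1,  /* 01: checked against the AP bits       */
    DOMAIN_RESERVED  = 2,  /* 10: reserved                          */
    DOMAIN_MANAGER   = 3   /* 11: no permission check               */
};

/* Read the field for one of the 16 domains. */
enum domain_access get_domain_access(uint32_t dacr, unsigned domain)
{
    return (enum domain_access)((dacr >> ((domain & 0xf) * 2)) & 3);
}

/* Return a copy of dacr with the field for one domain replaced. */
uint32_t set_domain_access(uint32_t dacr, unsigned domain, enum domain_access a)
{
    unsigned shift = (domain & 0xf) * 2;
    return (dacr & ~(3u << shift)) | ((uint32_t)a << shift);
}
```

The read uses the same expression as QEMU's ''get_phys_addr_v6'' on this page, ''(env->cp15.c3 >> (domain * 2)) & 3''.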

Domains ([[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0360e/BABJDFFH.html|domain]]) can be used to speed up context switches; FCSE is another way. [[http://www.cse.unsw.edu.au/~cs9244/06/seminars/08-leonidr.pdf|The ARM Architecture]] mentions:
<blockquote>
One major concern associated with memory protection is the cost of address
space switching. On ARM a context switch requires switching page tables. The
complete cost of page table switch includes the cost of flushing page tables,
purging TLBs and caches and then refilling them. Two mechanisms were introduced
to enable operating system designers eliminate this cost in some cases.
The first mechanism is protection domains. Every virtual memory page or section
belongs to one of sixteen protection domains. At any point in time, the
running process can be either a manager of a domain, which means that it can
access all pages belonging to this domain bypassing access permissions, a client
of the domain, which means that is can access pages belonging to the domain
according to their page table access permission bits, or can have no access to
the domain at all. <color red>In some situations, it is possible to do context switch by
simply changing domain access permissions, which means simply writing a new
value to the domain access register of coprocessor 15.</color>
</blockquote>

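A page table entry can have the following control bits ([[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0211k/Caceaije.html|6.5.2. Access permissions]] and [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0333h/Babifihd.html|6.6.1. C and B bit, and type extension field encodings]]):
  * AP bits: control read/write permission.
  * Access Permissions eXtension (APX) bit: on platforms that support APX, the AP and APX bits together with the S and R bits in CP15:c1 decide the access permission.
  * Not-Global (nG) bit: decides whether the mapping is private to a process. If it is, the mapping is tagged with the ASID (Address Space IDentifier) of the process when it is loaded into the TLB.
  * Shared (S) bit: marks whether the page is Shared or Non-shared.
  * Execute-Never (XN) bit: decides whether code may be executed from the page.
  * TEX (Type Extension) bits: encoded together with the C and B bits.
  * Cacheable (C) bit: decides whether accesses to the page go through the cache, and whether write through or write back is used.
  * Bufferable (B) bit: decides whether the write buffer is used.

Note: QEMU is an ARM implementation without caches, so the C and B bits do not need to be considered [(http://people.cs.nctu.edu.tw/~chenwj/log/QEMU/pm215-2012-08-03.txt)].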

  - ''disas_cp15_insn'' (''target-arm/translate.c''):<code c>
/* Disassemble system coprocessor (cp15) instruction.  Return nonzero if
   instruction is not defined.  */
static int disas_cp15_insn(CPUARMState *env, DisasContext *s, uint32_t insn)
{
    ... omitted ...

    tmp2 = tcg_const_i32(insn);
    if (insn & ARM_CP_RW_BIT) {
        tmp = tcg_temp_new_i32();
        gen_helper_get_cp15(tmp, cpu_env, tmp2);
        /* If the destination register is r15 then sets condition codes.  */
        if (rd != 15)
            store_reg(s, rd, tmp);
        else
            tcg_temp_free_i32(tmp);
    } else {
        tmp = load_reg(s, rd);
        gen_helper_set_cp15(cpu_env, tmp2, tmp);
        tcg_temp_free_i32(tmp);
        /* Normally we would always end the TB here, but Linux
         * arch/arm/mach-pxa/sleep.S expects two instructions following
         * an MMU enable to execute from cache.  Imitate this behaviour.  */
        if (!arm_feature(env, ARM_FEATURE_XSCALE) ||
                (insn & 0x0fff0fff) != 0x0e010f10)
            gen_lookup_tb(s);
    }
    tcg_temp_free_i32(tmp2);
    return 0;
}
</code>
  - ''helper_set_cp15'' (''target-arm/helper.c'') sets CP15:c2 to point to the page table.<code c>
void HELPER(set_cp15)(CPUARMState *env, uint32_t insn, uint32_t val)
{
    int op1;
    int op2;
    int crm;

    op1 = (insn >> 21) & 7;
    op2 = (insn >> 5) & 7;
    crm = insn & 0xf;
    switch ((insn >> 16) & 0xf) {

    ... omitted ...

    case 2: /* MMU Page table control / MPU cache control.  */
        if (arm_feature(env, ARM_FEATURE_MPU)) {
            switch (op2) {
            case 0:
                env->cp15.c2_data = val;
                break;
            case 1:
                env->cp15.c2_insn = val;
                break;
            default:
                goto bad_reg;
            }
        } else {
            switch (op2) {
            case 0:
                env->cp15.c2_base0 = val;
                break;
            case 1:
                env->cp15.c2_base1 = val;
                break;
            case 2:
                val &= 7;
                env->cp15.c2_control = val;
                env->cp15.c2_mask = ~(((uint32_t)0xffffffffu) >> val);
                env->cp15.c2_base_mask = ~((uint32_t)0x3fffu >> val);
                break;
            default:
                goto bad_reg;
            }
        }
        break;

        ... omitted ...

    }
    return;
bad_reg:
    /* ??? For debugging only.  Should raise illegal instruction exception.  */
    cpu_abort(env, "Unimplemented cp15 register write (c%d, c%d, {%d, %d})\n",
              (insn >> 16) & 0xf, crm, op1, op2);
}        
</code>
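
The ''case 2'' branch above derives two masks from the 3-bit control value N. How such masks can then be used to pick between the two table bases is sketched below. The selection rule (TTBR0 when the top N bits of the address are zero, TTBR1 otherwise) follows the architecture; the struct and function names are hypothetical and this is not a copy of QEMU's own lookup code.

```c
#include <stdint.h>

struct ttb_state {
    uint32_t base0;      /* TTBR0 */
    uint32_t base1;      /* TTBR1 */
    uint32_t mask;       /* address bits that select TTBR1 */
    uint32_t base_mask;  /* valid bits of the TTBR0 base   */
};

/* Same mask computation as the c2_control case in the helper above. */
void set_ttb_control(struct ttb_state *s, uint32_t n)
{
    n &= 7;
    s->mask = ~(0xffffffffu >> n);
    s->base_mask = ~(0x3fffu >> n);
}

/* Pick the first-level table base for a given address. */
uint32_t select_table_base(const struct ttb_state *s, uint32_t va)
{
    if (va & s->mask)
        return s->base1 & 0xffffc000u;  /* any of the top N bits set: TTBR1 */
    return s->base0 & s->base_mask;     /* top N bits all zero: TTBR0 */
}
```

With N = 0 the mask is 0 and every address goes through TTBR0. With N = 1 the address space is split in half at ''0x80000000''.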

  * [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0333h/I1029222.html|Chapter 6. Memory Management Unit]]
  * [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344i/DDI0344I_cortex_a8_r3p1_trm.pdf|Cortex-A8 Technical Reference Manual]]
  * [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0001a/CHDEFAGB.html|Architectures, Processors, and Devices Development Article]]
  * [[http://wenku.baidu.com/view/dff657d776a20029bd642d79.html|第六章 ARM存储系统]]
  * [[http://www.embedded-bits.co.uk/2011/mmutheory/|Turning on an ARM MMU and Living to tell the Tale: Some Theory]]
  * [[http://www.embedded-bits.co.uk/2011/mmucode/|Turning on an ARM MMU and Living to tell the tale: The code]]
  * [[http://www.cs.rutgers.edu/~pxk/416/notes/09-memory.html|Memory Management]]
  * [[http://www.cs.rutgers.edu/~pxk/416/notes/09a-paging.html|Memory Management: Paging]]
  * [[http://loda.hala01.com/2011/06/arm%E8%88%87cortex%E7%AD%86%E8%A8%98-arm-mpcore-multi-processor-core-%E6%9E%B6%E6%A7%8B%E8%A7%A3%E6%9E%90/|ARM與Cortex筆記-ARM MPCore (Multi-Processor Core) 多核心架構解析]]
  * [[http://pankaj-techstuff.blogspot.tw/2007/12/initialization-of-arm-mmu-in-linux.html|Arm MMU in linux: page table Initialization and tweaks for integeration with Memory management code]]

  - ''cpu_arm_handle_mmu_fault'' (''target-arm/helper.c''):<code c>
int cpu_arm_handle_mmu_fault (CPUARMState *env, target_ulong address,
                              int access_type, int mmu_idx)
{
    uint32_t phys_addr;
    target_ulong page_size;
    int prot;
    int ret, is_user;

    is_user = mmu_idx == MMU_USER_IDX;
    ret = get_phys_addr(env, address, access_type, is_user, &phys_addr, &prot,
                        &page_size);
    if (ret == 0) {
        /* Map a single [sub]page.  */
        phys_addr &= ~(uint32_t)0x3ff;
        address &= ~(uint32_t)0x3ff;
        tlb_set_page (env, address, phys_addr, prot, mmu_idx, page_size);
        return 0;
    }

    if (access_type == 2) {
        env->cp15.c5_insn = ret;
        env->cp15.c6_insn = address;
        env->exception_index = EXCP_PREFETCH_ABORT;
    } else {
        env->cp15.c5_data = ret;
        if (access_type == 1 && arm_feature(env, ARM_FEATURE_V6))
            env->cp15.c5_data |= (1 << 11);
        env->cp15.c6_data = address;
        env->exception_index = EXCP_DATA_ABORT;
    }
    return 1;
}
</code>
  - ''get_phys_addr'' (''target-arm/helper.c''):<code c>
static inline int get_phys_addr(CPUARMState *env, uint32_t address,
                                int access_type, int is_user,
                                uint32_t *phys_ptr, int *prot,
                                target_ulong *page_size)
{
    /* Fast Context Switch Extension.  */
    if (address < 0x02000000)
        address += env->cp15.c13_fcse;

    if ((env->cp15.c1_sys & 1) == 0) {
        /* MMU/MPU disabled.  */
        *phys_ptr = address;
        *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
        *page_size = TARGET_PAGE_SIZE;
        return 0;
    } else if (arm_feature(env, ARM_FEATURE_MPU)) {
        *page_size = TARGET_PAGE_SIZE;
        return get_phys_addr_mpu(env, address, access_type, is_user, phys_ptr,
                                 prot);
    } else if (env->cp15.c1_sys & (1 << 23)) {
        return get_phys_addr_v6(env, address, access_type, is_user, phys_ptr,
                                prot, page_size);
    } else {
        return get_phys_addr_v5(env, address, access_type, is_user, phys_ptr,
                                prot, page_size);
    }
}
</code>
  - ''get_phys_addr_v6'' (''target-arm/helper.c''). Section 6.11.2, ARMv6 page table translation subpage AP bits disabled, of the [[http://infocenter.arm.com/help/topic/com.arm.doc.ddi0333h/DDI0333H_arm1176jzs_r0p7_trm.pdf|ARM1176JZ-S]] manual describes the formats of the first-level and second-level page table entries.<code c>
static int get_phys_addr_v6(CPUARMState *env, uint32_t address, int access_type,
                            int is_user, uint32_t *phys_ptr, int *prot,
                            target_ulong *page_size)
{
    int code;
    uint32_t table;
    uint32_t desc;
    uint32_t xn;
    int type;
    int ap;
    int domain;
    int domain_prot;
    uint32_t phys_addr;

    /* Pagetable walk.  */
    /* Lookup l1 descriptor.  */
    table = get_level1_table_address(env, address);
    desc = ldl_phys(table);
    type = (desc & 3);
    if (type == 0) {
        /* Section translation fault.  */
        code = 5;
        domain = 0;
        goto do_fault;
    } else if (type == 2 && (desc & (1 << 18))) {
        /* Supersection.  */
        domain = 0;
    } else {
        /* Section or page.  */
        domain = (desc >> 5) & 0x0f; /* Extract bits [8:5].  */
    }
    domain_prot = (env->cp15.c3 >> (domain * 2)) & 3;
    if (domain_prot == 0 || domain_prot == 2) {
        if (type == 2)
            code = 9; /* Section domain fault.  */
        else
            code = 11; /* Page domain fault.  */
        goto do_fault;
    }
    if (type == 2) {
        if (desc & (1 << 18)) {
            /* Supersection.  */
            phys_addr = (desc & 0xff000000) | (address & 0x00ffffff);
            *page_size = 0x1000000;
        } else {
            /* Section.  */
            phys_addr = (desc & 0xfff00000) | (address & 0x000fffff);
            *page_size = 0x100000;
        }
        ap = ((desc >> 10) & 3) | ((desc >> 13) & 4);
        xn = desc & (1 << 4);
        code = 13;
    } else {
        /* Lookup l2 entry.  */
        table = (desc & 0xfffffc00) | ((address >> 10) & 0x3fc);
        desc = ldl_phys(table);
        ap = ((desc >> 4) & 3) | ((desc >> 7) & 4);
        switch (desc & 3) {
        case 0: /* Page translation fault.  */
            code = 7;
            goto do_fault;
        case 1: /* 64k page.  */
            phys_addr = (desc & 0xffff0000) | (address & 0xffff);
            xn = desc & (1 << 15);
            *page_size = 0x10000;
            break;
        case 2: case 3: /* 4k page.  */
            phys_addr = (desc & 0xfffff000) | (address & 0xfff);
            xn = desc & 1;
            *page_size = 0x1000;
            break;
        default:
            /* Never happens, but compiler isn't smart enough to tell.  */
            abort();
        }
        code = 15;
    }
    if (domain_prot == 3) {
        *prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
    } else {
        if (xn && access_type == 2)
            goto do_fault;

        /* The simplified model uses AP[0] as an access control bit.  */
        if ((env->cp15.c1_sys & (1 << 29)) && (ap & 1) == 0) {
            /* Access flag fault.  */
            code = (code == 15) ? 6 : 3;
            goto do_fault;
        }
        *prot = check_ap(env, ap, domain_prot, access_type, is_user);
        if (!*prot) {
            /* Access permission fault.  */
            goto do_fault;
        }
        if (!xn) {
            *prot |= PAGE_EXEC;
        }
    }
    *phys_ptr = phys_addr;
    return 0;
do_fault:
    return code | (domain << 4);
}
</code>
    * A translation fault means the virtual address has no mapping; it is the counterpart of an x86 page-table entry whose P (present) bit is cleared.
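The first steps of the walk above (classifying the level-1 descriptor, extracting the domain, and checking the domain access field) can be tried in isolation. This is only a minimal sketch: the helper names ''l1_desc_type'', ''l1_desc_domain'', ''domain_access'' and ''section_phys_addr'' are made up for illustration and are not QEMU functions; the bit positions are copied from the listing above.

<code c>
#include <stdint.h>

/* Bits [1:0] of a level-1 descriptor: 0 = translation fault, 2 = section
 * (or supersection); in the listing any other value takes the level-2 path. */
static inline unsigned l1_desc_type(uint32_t desc)
{
    return desc & 3u;
}

/* Domain number, bits [8:5] of a section or page-table descriptor. */
static inline unsigned l1_desc_domain(uint32_t desc)
{
    return (desc >> 5) & 0x0fu;
}

/* 2-bit access field for `domain` in the domain access control register
 * (env->cp15.c3 in the listing). The listing faults on 0 and 2 and skips
 * the permission check on 3. */
static inline unsigned domain_access(uint32_t dacr, unsigned domain)
{
    return (dacr >> (domain * 2)) & 3u;
}

/* Physical address for a 1 MB section mapping (hypothetical helper). */
static inline uint32_t section_phys_addr(uint32_t desc, uint32_t vaddr)
{
    return (desc & 0xfff00000u) | (vaddr & 0x000fffffu);
}
</code>

For example, ''section_phys_addr(0x12300c02, 0x800abcde)'' keeps the top 12 bits of the descriptor and the low 20 bits of the virtual address.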
====== Peripherals ======
  * [[wp>Advanced Microcontroller Bus Architecture|AMBA (Advanced Microcontroller Bus Architecture)]]
    * [[http://twins.ee.nctu.edu.tw/courses/embedlab_11/lecture/AMBA.pdf|Introduction of AMBA Bus System]]
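    * APB (Advanced Peripheral Bus): simple and low-speed, intended for peripherals. Single master.
    * AHB (Advanced High-performance Bus): more complex and pipelined, intended for memory. Multiple masters and slaves, so an arbiter is required.
    * AXI (Advanced eXtensible Interface): multiple pairs of devices can communicate point-to-point. The interface is defined independently of the topology that connects the components in the system.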


====== Semihosting ======
  * [[http://www.coocox.org/FAQ/documents/How%20to%20use%20semihosting%20to%20printf%20in%20CoIDE.pdf|How to use semihosting printf in CoIDE]]
    * [[http://wiki.csie.ncku.edu.tw/embedded/Lab7|Lab7: On-Chip Debugger + semihosting]]
    * printf is retargeted to call SH_SendChar, which calls SH_DoCommand to send a semihosting request to the ICE; the ICE then forwards the character to the host (PC).

  * [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0471g/Bgbjjgij.html|What is semihosting?]]
    * <q>Semihosting is a mechanism that enables code running on an ARM target to communicate and use the Input/Output facilities on a host computer that is running a debugger.</q>
    * The target performs its input and output on the PC (host) through the debugger, hence the name semihosting.
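The request path described above can be sketched as follows. This is not the CoIDE code: ''semihost_call'' and ''semihost_puts'' are hypothetical names. The sketch assumes the standard semihosting convention (operation number in r0, parameter pointer in r1, ''SYS_WRITEC'' = 0x03, and ''BKPT 0xAB'' as the trap on Cortex-M). When it is not built for a Cortex-M target it falls back to ''putchar'' so that it can be tried on a host.

<code c>
#include <stddef.h>
#include <stdio.h>

#define SYS_WRITEC 0x03  /* semihosting operation: write one character */

/* Issue one semihosting request: operation number in r0, parameter in r1. */
static int semihost_call(int op, const void *arg)
{
#if defined(__ARM_ARCH_6M__) || defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
    register int r0 __asm__("r0") = op;
    register const void *r1 __asm__("r1") = arg;
    /* Cortex-M cores trap to the debugger with BKPT 0xAB. */
    __asm__ volatile ("bkpt 0xAB" : "+r"(r0) : "r"(r1) : "memory");
    return r0;
#else
    /* Host fallback (assumption): no debugger, just print locally. */
    if (op == SYS_WRITEC) {
        putchar(*(const char *)arg);
    }
    return 0;
#endif
}

/* Send a string one character at a time; returns the number of characters sent. */
size_t semihost_puts(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        semihost_call(SYS_WRITEC, &s[n]);
        n++;
    }
    return n;
}
</code>

On a real target the ''BKPT'' only returns if a debugger with semihosting enabled is attached; without one the core typically faults.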
====== CMSIS ======
  * [[http://www.keil.com/pack/doc/CMSIS/Core/html/index.html|CMSIS-CORE]]{{ http://www.keil.com/pack/doc/CMSIS/Core/html/CMSIS_CORE_Files.png }}
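    * Hardware vendors are expected to implement the [[http://www.keil.com/pack/doc/CMSIS/Core/html/_templates_pg.html|blue parts]], using CMSIS\Pack\Example\Device as a reference when implementing Device\_Template_Vendor\Vendor\Device. The purple parts are already provided in CMSIS\Include and need no implementation.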
  * [[http://www.keil.com/pack/doc/CMSIS/Driver/html/index.html|CMSIS-Driver]]{{ http://www.keil.com/pack/doc/CMSIS/Driver/html/Driver.png }}
    * Use CMSIS\Pack\Example\CMSIS_Driver as a reference when implementing CMSIS\Driver, so that a Device Pack can be provided.
  * [[http://www.keil.com/pack/doc/CMSIS/DSP/html/index.html|CMSIS-DSP]]
    * [[http://mcuoneclipse.com/2013/02/14/tutorial-using-the-arm-cmsis-library/|Tutorial: Using the ARM CMSIS Library]]
    * CMSIS\Lib already contains precompiled libraries.
    * [[http://kernelhacks.blogspot.tw/2013/01/cmsis-dsp-software-library.html|CMSIS DSP Software Library]]
      * [[http://forum.stellarisiti.com/topic/333-anyone-building-cmsis-under-linux/|Anyone building CMSIS under Linux?]] 
      - Download the Makefiles.<code bash>
$ cd ~/src/stellaris/cmsis-src/CMSIS
$ wget http://pastebin.com/raw.php?i=613Lz661 -O Makefile.inc
$ wget http://pastebin.com/raw.php?i=ESqnApg8 -O Makefile
$ cd DSP_Lib/Source
$ wget http://pastebin.com/raw.php?i=82VTqR4F -O Makefile
$ cp Makefile BasicMathFunctions
$ cp Makefile CommonTables
$ cp Makefile ComplexMathFunctions
$ cp Makefile ControllerFunctions
$ cp Makefile FastMathFunctions
$ cp Makefile FilteringFunctions
$ cp Makefile MatrixFunctions
$ cp Makefile StatisticsFunctions
$ cp Makefile SupportFunctions
$ mv Makefile TransformFunctions
</code>
      - Run make.<code bash>
$ cd ~/src/stellaris/cmsis-src/CMSIS
$ make
</code>
  * [[http://www.keil.com/pack/doc/CMSIS/RTOS/html/index.html|CMSIS-RTOS]]
  * [[http://www.keil.com/pack/doc/CMSIS/Pack/html/index.html|CMSIS-Pack]]
    * Packaging for software components, devices, or development boards.
  * [[http://www.keil.com/pack/doc/CMSIS/Pack/html/index.html|CMSIS-SVD]]
====== Miscellaneous ======
Executables compiled with softfp cannot run on a hardfp platform [(http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-April/048902.html)].

  * [[https://launchpad.net/linaro-toolchain-binaries/+download|Linaro Toolchain Binaries]]
    * Toolchains with hardfp support are available here.
  * [[http://wiki.debian.org/ArmHardFloatPort/VfpComparison]]
  * [[http://blog.csdn.net/yuanyou/article/details/6410326|ARM GCC浮点相关总结]]

<blockquote>
-mfloat-abi=name

Specifies which floating-point ABI to use. Permissible values are: `soft', `softfp' and `hard'. 

Specifying `soft' causes GCC to generate output containing library calls for floating-point operations.
`softfp' allows the generation of code using hardware floating-point instructions, but still <color red>uses the soft-float calling conventions.</color> `hard' allows generation of floating-point instructions and <color red>uses FPU-specific calling conventions.</color> 

The default depends on the specific target configuration. <color red>Note that the hard-float and soft-float ABIs are not link-compatible; you must compile your entire program with the same ABI, and link with a compatible set of libraries.</color>
<cite>http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html</cite>
</blockquote>
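Which of the three ABIs a translation unit was built for can be checked from the compiler's predefined macros. The sketch below assumes GCC's behaviour as I understand it: ''%%__ARM_PCS_VFP%%'' is defined for ''-mfloat-abi=hard'', and ''%%__SOFTFP__%%'' is defined when no hardware floating-point instructions are generated. The function name ''float_abi_name'' is made up for illustration.

<code c>
#include <stddef.h>

/* Report the floating-point ABI this translation unit was compiled for. */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not an ARM target";
#elif defined(__ARM_PCS_VFP)
    return "hard";     /* FP arguments are passed in VFP registers */
#elif defined(__SOFTFP__)
    return "soft";     /* FP operations become library calls */
#else
    return "softfp";   /* FPU instructions, soft-float calling convention */
#endif
}
</code>

Printing the result from a small test program built with each ''-mfloat-abi'' value is a quick way to confirm what a given toolchain defaults to.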

  * [[wp>ARMulator]]
====== Terminology ======
  * ARM® Architecture Reference Manual (ARM ARM)
  * [[http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/IHI0042D_aapcs.pdf|Procedure Call Standard for the ARM® Architecture (AAPCS)]]
    * Concerns calling conventions. ARM's older calling convention is called APCS (ARM Procedure Call Standard).
    * [[http://stackoverflow.com/questions/10494848/arm-whats-the-difference-between-apcs-and-aapcs-abi|What's the difference between APCS and AAPCS ABI?]]
    * ATPCS (ARM-Thumb Procedure Call Standard) 
  * Protected Memory System Architecture (PMSA)
    * Related to the MPU.
  * Virtual Memory System Architecture (VMSA)
    * Related to the MMU.
  * TTB (Translation Table Base address)
  * Fast Context Switch Extension (FCSE)
  * SBZ (Should Be Zero)
  * SBO (Should Be One)
  * SBZP (Should Be Zero or Preserved)
  * Large Physical Address Extension (LPAE)
  * Generic Interrupt Controller (GIC)
  * Cache Coherent Interconnect (CCI)
  * CPSR (Current Program Status Register)


====== Articles ======
  * [[http://lwn.net/Articles/356790/|Papers from the Real Time Linux Workshop]]
  * [[http://www.techbang.com/posts/10678-fully-understand-arm-processors-cisc-and-risc-are-what-history-structure-a-see-through-the-computer-96-issues-cover-story-the-king|完全看懂 ARM 處理器：RISC 與 CISC 是什麼？歷史、架構一次看透]]
====== System Software ======
===== GCC =====
    * [[https://launchpad.net/gcc-arm-embedded|GCC ARM Embedded]] uses newlib.
    * [[https://answers.launchpad.net/gcc-arm-embedded/+question/247726|libnosys.a, the FPU, and missing _exit/_kill etc]]
      * [[http://stackoverflow.com/questions/19419782/exit-c-text0x18-undefined-reference-to-exit-when-using-arm-none-eabi-gcc|exit.c:(.text+0x18): undefined reference to `_exit' when using arm-none-eabi-gcc]]
      * [[https://gcc.gnu.org/onlinedocs/gcc/Directory-Options.html|3.14 Options for Directory Search]]
        * ''-specs=nosys.specs'' additionally links against libnosys.a, which provides the required symbols.
          * GNU Tools ARM Embedded\4.9 2015q2\share\doc\gcc-arm-none-eabi\readme.txt
          * GNU Tools ARM Embedded\4.9 2015q2\share\gcc-arm-none-eabi\samples\readme.txt
    * [[http://stackoverflow.com/questions/4925012/can-i-get-a-report-of-all-the-libraries-linked-when-building-my-c-executable|Can I get a report of ALL the libraries linked when building my C++ executable (gcc)? (including statically linked)]]
      * ''-Wl,--trace'' shows which libraries are linked.
    * [[http://pabigot.github.io/bspacm/newlib.html|Interfacing with newlib (libc)]]
      * newlib is the C library currently in use.
    * [[http://processors.wiki.ti.com/index.php/Using_GCC_with_Tiva_in_CCSv6#Working_with_GCC_libraries|Working with GCC libraries]]
      * [[https://www.tablix.org/~avian/blog/archives/2013/04/multilib_weirdness/|Multilib weirdness]]
      * [[https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html|3.9 Options for Debugging Your Program or GCC]]
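      * A single library source tree, compiled under several combinations of compiler options, yields several builds of the library.
      * ''-print-multi-lib'' lists the mapping between each library build and the compiler options it corresponds to.
      * ''-print-multi-directory'' shows which library build (that is, which directory) the given compiler options select.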
  * [[https://sourceware.org/ml/newlib/2008/msg00298.html|_sbrk: undefined reference to `end' problem]]
    * [[http://mcuoneclipse.com/2015/05/30/problem-undefined-reference-to-__end__-if-using-semihosting/|Problem: undefined reference to ‘__end__’ if using Semihosting]]
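    * The symbol end is defined in the linker script and marks the end of the RAM already in use by the program. sbrk takes end as the start address of the heap, which is why it references the symbol.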
  * [[https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html|6.31 Declaring Attributes of Functions]]
    * Function attributes are written after the function declaration (a definition also counts as a declaration).
    * [[https://gcc.gnu.org/onlinedocs/gcc/ARM-Function-Attributes.html|6.31.4 ARM Function Attributes]]
      * The naked attribute is only available on certain targets, ARM among them. It is only meant for functions whose body consists of [[https://gcc.gnu.org/onlinedocs/gcc/Basic-Asm.html|Basic Asm]].<code bash>
$ cat foo.c
//void foo() __attribute__( (naked) );

void foo()
{
  __asm__ ("nop;");
}
$ arm-none-eabi-gcc.exe -mcpu=cortex-m4 -mthumb foo.c -S
</code><code asm>
foo:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, uses_anonymous_args = 0
        @ link register save eliminated.
        push    {r7}
        add     r7, sp, #0
@ 5 "foo.c" 1
        nop;
@ 0 "" 2
        .thumb
        mov     sp, r7
        @ sp needed
        ldr     r7, [sp], #4
        bx      lr
</code>
      * In Thumb mode, r7 serves as the frame pointer[(http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-October/066475.html)]. The assembly above saves and then updates the frame pointer.
  * [[http://blog.csdn.net/xiruanliuwei/article/details/9823851|编译C/C++语言程序源码生成的汇编语言程序源码中的.fnstart，.fnend等伪操作]]
      * [[http://blog.csdn.net/xiruanliuwei/article/details/9824163|.fnstart，.fnend等伪操作对assembler的影响？]]
      * [[https://sourceware.org/binutils/docs/as/ARM-Unwinding-Tutorial.html|9.4.7 Unwinding]]
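        * .fnstart and .fnend are assembler directives emitted for exception handling. In hand-written assembly, a function that calls another function which may throw must be wrapped in .fnstart and .fnend manually, so that the runtime library can perform [[https://msdn.microsoft.com/zh-tw/library/hh254939.aspx|stack unwinding]] correctly.
        * .fnstart and .fnend must be paired. Older versions of GCC do not seem to check this strictly ([[https://github.com/llvm-mirror/llvm/blob/master/test/MC/ARM/eh-directive-fnstart-diagnostics.s|eh-directive-fnstart-diagnostics.s]]).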
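Returning to the ''_sbrk'' and ''end'' entry earlier in this section: the way sbrk starts the heap at ''end'' can be sketched as below. This is not newlib's implementation. ''sketch_sbrk'', ''fake_ram'', ''heap_start'' and ''heap_limit'' are hypothetical names, and a static array stands in for the linker-script symbols so that the sketch also runs on a host.

<code c>
#include <errno.h>
#include <stddef.h>

/* On a real target these bounds come from the linker script, e.g.
 *     extern char end;   // first free byte after the program's data
 * A static array stands in for them here (assumption for illustration). */
static char fake_ram[1024];
static char *const heap_start = fake_ram;
static char *const heap_limit = fake_ram + sizeof fake_ram;

/* Grow (or shrink) the heap by `incr` bytes and return the previous break. */
void *sketch_sbrk(ptrdiff_t incr)
{
    static char *brk = NULL;

    if (brk == NULL) {
        brk = heap_start;          /* the heap begins where `end` points */
    }
    if (incr > heap_limit - brk || incr < heap_start - brk) {
        errno = ENOMEM;            /* request would leave the heap region */
        return (void *)-1;
    }
    char *prev = brk;
    brk += incr;
    return prev;
}
</code>

If the linker script does not define ''end'', a real ''_sbrk'' written this way fails to link with the undefined reference shown in the thread linked above.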
===== Miscellaneous =====
  * [[wp>Das U-Boot|U-Boot]]
    * [[http://www.crifan.com/files/doc/docbook/uboot_starts_analysis/release/html/uboot_starts_analysis.html|Uboot中start.S源码的指令级的详尽解析]]
    * [[http://www.simtec.co.uk/products/SWLINUX/files/booting_article.html|Booting ARM Linux]]
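    * The boot procedure depends on the target processor and the platform (that is, the board carrying the processor and its peripheral controllers).
    * The boot loader's main tasks are setting up the environment and loading the operating system.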
  * [[wp>Micro-Controller Operating Systems (MicroC/OS)|µC/OS-II]]
    * [[https://doc.micrium.com/display/welcome/Welcome|Micrium documentation]] 
    * User code actively calls the services provided by the system.
  * [[wp>FreeRTOS]]
    * [[http://wiki.csie.ncku.edu.tw/embedded/freertos|FreeRTOS]]
====== Virtualization ======
All boards with a Cortex-A15 support virtualization[(http://people.cs.nctu.edu.tw/~chenwj/log/QEMU/agraf-2012-05-30.txt)].
  * [[http://apsys11.ucsd.edu/papers/apsys11-varanasi.pdf|Hardware-Supported Virtualization on ARM]]
  * [[https://sites.google.com/a/sslab.cs.nthu.edu.tw/armvisor/|ARMVisor]]
    * [[https://github.com/SSLab-NTHU/linux-host-armvisor|ARMVisor]]
    * [[http://www.slideshare.net/PeterChang6/armvisor-more-details|ARMVisor]] 
====== Assembly Language ======
  * [[http://people.cs.umass.edu/~trekp/cs201/|Introduction to computer organization and ARM assembly programming]]
  * [[http://www.heyrick.co.uk/assembler/|ARM ASSEMBLER]]
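====== Code Optimization ======
To write efficient C code, one must know the limits of what compilers can optimize, the constraints imposed by the target processor architecture, and the limitations of the specific compiler in use.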

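  * Basic C data types
    * Local variable types: prefer int where possible, because ARM registers are 32 bits wide and most operations work on 32-bit values.
    * Function argument types: the compiler decides whether the caller or the callee narrows a value to its declared type. Prefer int where possible.
    * Signed and unsigned types: for division, unsigned int is recommended.
  * C loop structures
    * Fixed number of iterations: count down to zero, because the decrement can set the condition flags and no separate compare instruction is needed.
    * Variable number of iterations: use do-while.
    * Loop unrolling
  * Register allocation: limit the number of local variables so that they are not spilled to the stack. Variables used in the innermost loop are allocated to registers whenever possible.
  * Function calls: limit the number of arguments to four or fewer so that none are passed on the stack.
  * Pointer aliasing
  * Structure layout
  * Alignment and endianness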
    * [[http://www.alexonlinux.com/aligned-vs-unaligned-memory-access|Aligned vs. unaligned memory access]]
    * [[https://www.kernel.org/doc/Documentation/unaligned-memory-access.txt|UNALIGNED MEMORY ACCESSES]]
    * [[http://stackoverflow.com/questions/12491578/whats-the-actual-effect-of-successful-unaligned-accesses-on-x86|What's the actual effect of successful unaligned accesses on x86?]]
  * Division
  * Floating-point arithmetic
  * Inline functions and inline assembly
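The two loop recommendations in the list above can be illustrated with a small checksum routine. This is only a sketch: the function names and the fixed count of 64 are assumptions chosen for the example, and whether the compiler actually emits a flag-setting subtract followed by a conditional branch depends on the compiler and its options.

<code c>
#include <stddef.h>
#include <stdint.h>

/* Fixed iteration count: counting down to zero lets the loop test reuse
 * the flags set by the decrement instead of a separate compare. */
uint32_t checksum_fixed(const uint32_t *data)
{
    uint32_t sum = 0;
    for (unsigned int i = 64; i != 0; i--) {
        sum += *data++;
    }
    return sum;
}

/* Variable iteration count: do-while skips the initial test,
 * so the caller must guarantee n >= 1. */
uint32_t checksum_n(const uint32_t *data, unsigned int n)
{
    uint32_t sum = 0;
    do {
        sum += *data++;
    } while (--n != 0);
    return sum;
}
</code>

Both loop counters are unsigned int, in line with the advice on local variable types, so no narrowing instructions are needed on the counter.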

  * [[http://www.davespace.co.uk/arm/efficient-c-for-arm/|Efficient C for ARM]]
  * [[http://www.arm.com/files/pdf/AT_-_Better_C_Code_for_ARM_Devices.pdf|Efficient C Code for ARM Devices]]
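The pointer aliasing item in the list above deserves a concrete case. In the sketch below (all names are hypothetical), the compiler cannot assume that ''step'' points somewhere other than ''timer1'', so the first version has to load ''*step'' twice. Copying the value into a local variable removes the second load, and it also changes the result in the case where the pointers really do alias, which the two demo functions make visible.

<code c>
/* The compiler must reload *step for the second addition, because the
 * store to *timer1 might have changed it. */
void timers_naive(int *timer1, int *timer2, const int *step)
{
    *timer1 += *step;
    *timer2 += *step;
}

/* Reading *step once into a local lets the value stay in a register. */
void timers_local(int *timer1, int *timer2, const int *step)
{
    const int s = *step;
    *timer1 += s;
    *timer2 += s;
}

/* Demo with step aliasing timer1: the naive version sees the updated value. */
int alias_demo_naive(void)
{
    int a = 1, b = 10;
    timers_naive(&a, &b, &a);   /* a becomes 2, then b += 2 */
    return b;
}

/* Same call through the local-copy version: b is advanced by the old value. */
int alias_demo_local(void)
{
    int a = 1, b = 10;
    timers_local(&a, &b, &a);   /* s is 1, a becomes 2, b += 1 */
    return b;
}
</code>

Because the two versions differ when the arguments alias, the rewrite is only a safe optimization when the caller is known never to pass overlapping pointers.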
====== Reference Books ======
  * [[http://www.davespace.co.uk/arm/introduction-to-arm/|ARM: Introduction to ARM]]
  * [[http://www.davespace.co.uk/arm/efficient-c-for-arm/|ARM: Efficient C for ARM]]
  * [[http://www.waterlike.com.tw/bookdata.asp?NO=TP3C113017|一步步寫嵌入式作業系統：ARM編程的方法與實踐]]
    * [[http://www.leeos.org/|《一步步写嵌入式操作系统》资源]] 
  * [[http://www.amazon.com/ARM-System-Developers-Guide-Architecture/dp/1558608745|ARM System Developer's Guide: Designing and Optimizing System Software]]
    * [[http://booksite.elsevier.com/9781558608740/|Code from the book]]
====== External Links ======
  * [[http://infocenter.arm.com/help/index.jsp|ARM Infocenter]]
    * ARM® Architecture Reference Manual (ARM ARM)
  * [[http://www.arm.linux.org.uk/|The ARM Linux Project]]
  * [[wp>ARM architecture]]
    * [[http://blog.csdn.net/muxiqingyang/article/details/6635934|处理器的分层模型——从MIPS、龙芯、ST的关系谈起]] 
    * [[http://loda.hala01.com/2011/02/arm%E8%88%87cortex%E7%AD%86%E8%A8%98/|ARM與Cortex筆記]]
    * [[http://loda.hala01.com/2011/06/arm%e8%88%87cortex%e7%ad%86%e8%a8%98-arm-mpcore-multi-processor-core-%e6%9e%b6%e6%a7%8b%e8%a7%a3%e6%9e%90/|ARM MPCore (Multi-Processor Core) 多核心架構解析]]
  * [[http://wiki.csie.ncku.edu.tw/embedded/schedule|進階嵌入式系統開發與實作 (Fall 2012)]]