<note>
  * 紀錄一般知識，盡量不針對特定編譯器或硬體。可以舉例。
  * 課程:
    * [[http://web.stanford.edu/class/cs143/|CS143: Compilers]]
    * [[https://www.cs.cmu.edu/~15745/|15-745 Optimizing Compilers for Modern Architectures]]
</note>
如何能深刻了解編譯器?
  * [[https://www.amazon.com/Language-Implementation-Patterns-Domain-Specific-Programming/dp/193435645X/ref=pd_sbs_14_t_1?_encoding=UTF8&psc=1&refRID=R8WCCTA0AZH0AY4JJD27|Language Implementation Patterns]]
    * 前端，直譯器相關。
  * [[wp>Compilers: Principles, Techniques, and Tools]]
    * 優化相關。前端不推薦。
  * [[http://www.amazon.com/Engineering-Compiler-Second-Keith-Cooper/dp/012088478X|Engineering a Compiler, Second Edition]] - 入門
  * [[http://www.amazon.com/Advanced-Compiler-Design-Implementation-Muchnick/dp/1558603204|Advanced Compiler Design and Implementation]] - 進階
  * 寫一個編譯器或是參與現有編譯器的開發。
有用連結:
  * [[http://compilers.iecc.com/index.phtml|comp.compilers newsgroup]]
  * [[http://coolshell.cn/articles/1547.html|使用Flex Bison 和LLVM编写自己的编译器]]
  * [[http://www.codeopt.com/|Search: Compilers / Optimization]]

  * [[http://llvm.org/devmtg/2012-04-12/Slides/Workshops/David_Chisnall.pdf|What LLVM can do for you]]<code bash>
$ wget http://llvm.org/devmtg/2012-04-12/Slides/Workshops/memo.cc
$ wget http://llvm.org/devmtg/2012-04-12/Slides/Workshops/examples.tbz2
</code>

====== 編譯流程 ======
  * [[wp>Compiler]]
    * 源代碼 (Source Code) -> 語彙分析 (Lexical Analysis) -> 語法分析 (Syntax Analysis) -> 語意分析 (Semantic Analysis) -> 生成中間表示式 (IR Generation) -> 優化中間表示式 (IR Optimization) -> 生成目標代碼 (Code Generation) -> 優化目標代碼 (Optimization) -> 目標代碼 (Machine Code)
===== 前端 =====
  * [[https://www.amazon.com/Language-Implementation-Patterns-Domain-Specific-Programming/dp/193435645X|Language Implementation Patterns]]
  * [[Lex & Yacc]]
    * [[http://avnpc.com/pages/devlang|《自制编程语言》相关资料]]
    * [[https://github.com/azru0512/win_sjis|win_sjis]]
      * 重點著重在如何建立 AST。前端會遍歷 AST 多次，每次遍歷求得的資訊會記在 AST 節點上, 供下一次遍歷使用。
  * [[http://yinwang0.lofter.com/post/183ec2_47c086|怎样写一个解释器 - 王垠]]
  * [[wp>Parsing]]
    * <q>The grammar cannot remember the presence of a construct over an arbitrarily long input; this is necessary for a language in which, for example, a name must be declared before it may be referenced. More powerful grammars that can express this constraint, however, cannot be parsed efficiently. Thus, it is a common strategy to create a relaxed parser for a context-free grammar which accepts a superset of the desired language constructs (that is, it accepts some invalid constructs); later, the unwanted constructs can be filtered out at the semantic analysis (contextual analysis) step.</q> 
  * [[http://stackoverflow.com/questions/1078908/what-is-the-difference-between-keyword-and-reserved-word|What is the difference between “keyword” and “reserved word”?]]
    * [[wp>Reserved word]]
      * <q>In a computer language, a reserved word (also known as a reserved identifier) is a word that cannot be used as an identifier, such as a name of a variable or function or a label – it is "reserved from use". This is a syntactic definition, and a reserved word may have no meaning. This is a syntactic definition, and a reserved word may have no meaning. A closely related and often conflated notion is a keyword which is a word with special meaning; this is a semantic definition.</q>
  * [[http://stackoverflow.com/questions/3846727/why-is-the-difference-between-an-expression-and-a-statement|Why is the difference between an expression and a statement]]
    * [[wp>Expression (computer_science)|Expression]]
    * [[wp>Statement (computer science)|Statement]]
    * 表達式 (expression) 和述句 (statement) 主要差別在於前者產生値，而後者產生動作。後者可藉由前者產生的值，進行賦値或是流程控制。

  * [[JTC1/SC22/WG14 - C]]
    * 閱讀標準，理解用詞。
    * [[SDCC]]
  * [[JTC1/SC22/WG21 - C++]]

  * [[wp>Data type]]
      * [[wp>Primitive data type]]
        * 基本型別。下述衍生型別基本上都是以基本型別建構而成。
      * [[wp>Composite data type]]
        * 複合型別。
      * [[wp>Reference type]]
        * 引用類型。Java 裡面除基本型別 ([[wp>Primitive data type]]) 之外，皆為引用類型。
  * [[wp>Dangling else]]
    * else 如何與 if 配對。

  * [[http://blog.robertelder.org/building-a-c-compiler-type-system-the-formidable-declarator/|Building A C Compiler Type System - Part 1: The Formidable Declarator]]
  * [[http://blog.robertelder.org/building-a-c-compiler-type-system-a-canonical-type-representation/|Building A C Compiler Type System - Part 2: A Canonical Type Representation]]
===== 產生中間語言 =====
請見 [[http://www.stanford.edu/class/cs143/lectures/130_TAC.pdf|Three-Address Code IR]]。

對於中間語言常用的格式，[[wp>Static single assignment form|Static single assignment form (SSA)]]，只需要知道底下幾點:
  * 每個變量只被賦值一次 (類似函數式編程中 immutable 的概念)。
  * 一般程序轉換成 SSA form 時，會針對控制流插入 Phi Node。
    * Phi Node 在 LLVM 中被實現為一條 LLVM IR ([[http://stackoverflow.com/questions/11485531/what-exactly-phi-instruction-does-and-how-to-use-it-in-llvm|What exactly PHI instruction does and how to use it in LLVM]])。
  * SSA form 簡化後續優化所需實現。

延伸閱讀:
  * [[http://ssabook.gforge.inria.fr/latest/book.pdf|SSA Book]]
===== 優化中間語言 =====
請見 [[http://www.stanford.edu/class/cs143/lectures/140_IR_Optimization.pdf|IR Optimization]]、[[http://www.stanford.edu/class/cs143/lectures/150_Global_Optimization.pdf|Global 
Optimization]] 和 [[http://www.stanford.edu/class/cs143/lectures/160_Global_Optimization_2.pdf|Global Optimization, Part II]]。
===== 生成目標代碼 =====
Code Generation 包含以下步驟:
  - [[wp>Instruction selection]]
  - [[wp>Register allocation]]
  - [[wp>Instruction scheduling]] 
可以參考 LLVM 的流程 [[http://www.llvm.org/docs/CodeGenerator.html#high-level-design|The high-level design of the code generator]]。

  * [[http://staff.science.uva.nl/~andy/compiler/opt+backend.pdf|Introduction to Compiler Design: optimization and backend issues]]
==== 指令選擇 ====
Instruction selection 將 IR 轉換成 machine code，或是低階的 IR (接近目標機器碼)，但仍使用 pseudo register，留待之後的 register allocation 將目標機器的暫存器分配給 pseudo register。Instruction selection 一般是將 machine code 模板貼至 IR tree 上，稱為 tiling。一般來說，會將 IR tree 轉成 [[wp>Directed acyclic graph|DAG]] (directed acyclic graph)。Tree 和 DAG 不同之處在於，節點在 tree 中只有一個父節點，而在 DAG 中可以有多個父節點。DAG 這個特性用來將 common subexpression 暴露出來。若一個節點為 common subexpression，在 tree 中會出現多次，而在 DAG 只會出現一次。DAG 有助於後續的 instruction selection 和 register allocation。

見 [[http://www.amazon.com/gp/product/0321486811/ref=pd_lpo_k2_dp_sr_1?pf_rd_p=1278548962&pf_rd_s=lpo-top-stripe-1&pf_rd_t=201&pf_rd_i=0441017649&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=02WFWGQTZ4SV9K8QQM5C|Compilers: Principles, Techniques, and Tools]] 第 8.9 節: Instruction Selection by Tree Rewriting。

  * [[http://www.inf.uni-konstanz.de/dbis/teaching/ws0203/compiler-construction/download/comp-08.pdf|Instruction Selection]]
  * [[http://www.lrde.epita.fr/dload/20040602-Seminar/vasseur0604_code-gen-gen_report.pdf|Code-generator generators: generating the instruction selector]]
    * Optimal code selection in DAGs 
  * [[http://cs.tju.edu.cn/faculty/hujing/%E4%B8%80%E4%BA%9B%E8%99%8E%E4%B9%A6%E7%9A%84%E5%8F%82%E8%80%83/PDF&PPT/Advanced%20topics%20More%20Instruction%20Selection.pdf|More Instruction Selection]]
  * Compilers: Principles, Techniques, and Tools - 8.9

LLVM 使用的是 SelectionDAG based instruction selector。利用 tblgen 生成 instruction selector。
  * Scheduling expression DAGs for minimal register need
  * Optimal Integrated Code Generation for Clustered VLIW Architectures
  * [[wp>Sethi–Ullman algorithm]]
    * [[http://comments.gmane.org/gmane.comp.compilers.llvm.devel/46730|Sethi-Ullman Scheduling]]
    * lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp ([[http://www.llvm.org/docs/CodeGenerator.html#selectiondag_sched|SelectionDAG Scheduling and Formation Phase]])
==== 暫存器配置 ====
兩個述句 (statement) 之間的點被稱為 program point。一個變數被 define，代表該變數被寫入值; 一個變數被 use，代表該變數的值被讀出。針對變數 a 而言，<code>
---------- program point
  a = 1; // define
----------
  b = a; // use
----------
</code>
一個變數被稱為 live，代表該變數其值在將來會被使用到，亦即在某個時間點被讀出。一個變數的 live range 是一個 program point 的集合，該變數在這些 program ponit 為 live。一個變數的 live interval 是一個最小區間，該區間包含該變數所有的 live range。live interval 相較於 live range 較不精確，但是其 register allocation 較為簡單。[[http://www.stanford.edu/class/cs143/lectures/170_Register_Allocation.pdf|Register Allocation]] 第 64 頁演示的 linear scan 算法就是針對各個變數的 live interval 配置暫存器。編譯器在做 register allocation 時，會視情況針對某變數做 live range splitting，其目的是希望在 live range splitting 之後，能減少各變數之間 live range 重疊的情況。如果兩個變數的 live range 有所重疊，則代表這兩個變數無法分配到同一個暫存器，這會增加暫存器使用量。

[[wp>Register allocation]] 包含兩個步驟: 首先，register allocation 釐清哪些變數將可存入暫存器; 接著，register assignment 將物理暫存器分配給那些變數。如果途中發現暫存器不敷使用，register allocation 演算法會選擇將某些暫存器存入內存，這個動作稱之為 spill。register allocation 演算法目標之一就是要降低 spill 的次數。一般來說，register allocation 演算法會選擇把擁有較長 live interval 的變數 spill 到內存。這是因為如果將暫存器分配給該變數，則該暫存器將有較長時間不能再被使用。所以，一般會將暫存器優先分配給 live interval 較短的變數。有時候，如果某暫存器其中的值可以在之後簡單的計算出來，算法可以選擇 spill 該暫存器，且之後不須從內存將其值讀出，只需要重新計算即可，這稱之為 [[wp>Rematerialization|rematerialization]]。

相較於 linear scan 是利用 live interval 此一較不精確的描述來分配暫存器，[[http://www.stanford.edu/class/cs143/lectures/170_Register_Allocation.pdf|Register Allocation]] 第 109 頁演示的 graph coloring 算法利用 live range 分配暫存器。每個變數代表一個節點。如果變數 A 和變數 B 的 live range 有所重疊，則節點 A 和 B 之間有連線。針對各個變數的 live range 所建立的圖稱為 register interference graph (RIG)。此算法目的是要替 RIG 上的節點著色 (代表暫存器)，有連線的節點必須著以不同的顏色。

根據 register allocation 的範圍，可分為 local register allocation 和 global register allocation。前者針對一個 basic block 分配暫存器給各個變數，後者則針對數個 basic block。

LLVM 3.0 預計以 greedy register allocation 取代現行的 linear scan 算法。我想你現在具有足夠知識開始閱讀文章 [[http://blog.llvm.org/2011/09/greedy-register-allocation-in-llvm-30.html|Greedy Register Allocation in LLVM 3.0]] 了。:-) LLVM 3.0 有兩個 register allocator，分別是 basic allocator 和 greedy allocator。兩者都是根據變數的 live range 分配暫存器。它們都有兩個重要的資料結構: priority queue 和 live interval unions。Priority queue 以變數的 live range 的 spill weight 由高到低排序，將暫存器優先配給 spill 代價較高的變數。Live interval unions 存放的是已經分配到暫存器的 live range，目的是用來檢查某暫存器分配給那些 live range。底下略為描述 basic allocator 和 greedy allocator 的運作方式。

  - Basic allocator: 由 priority queue 中挑出 spill 代價最高的 live range，在此稱之為 live range A，試圖配置暫存器給它。如果成功，將配置結果更新到 live interval unions; 否則，則 spill live range A。這裡不需要檢查 live interval unions 選擇 spill 其它 live range，因為我們優先配置暫存器給 spill 代價較高的 live range。一般是透過插入 load/store 指令 spill 某個 live range，這會造出新的 live range (在 load 跟 store 之間)。這一個因 spill 造出的 live range 會塞入 priority queue 等待下一次分配，其 spill weight 設為無限大。等到下一次為該 spill weight 為無限大的 live range A 分配暫存器時，basic allocator 會選擇一暫存器，並檢視 live interval unions 看該暫存器分配給那些 live range，亦即與那些 live range 有 interfering。Basic allocator spill 那些倒楣的 live range，再將該暫存器配給該 live range A。
  - Greedy allocator: 與 basic allocator 的差別在，當沒有適當的暫存器可供分配給 live range A，greedy allocator 會由 live interval unions 中挑出 spill weight 較小的 live range B 並 spill 它 (稱為 evict)。Live range B 會被塞回 priority queue 等待下一次分配。注意! 在 basic allocator 中，不會挑 live interval unions 中 spill weight 較小的 live range 做 spill。此外，geedy allocator 會在 spill live range 之前先試著做 live range splitting，將 live range 分成較小塊的 live range，再將它們塞回 priority queue。

  * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-September/043322.html|[LLVMdev] Greedy Register Allocation in LLVM 3.0]]
    * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/046073.html|[LLVMdev] Greedy Register Allocation in LLVM 3.0]]
  * [[http://llvm.org/devmtg/2011-11/Olesen_RegisterAllocation.pdf|Register Allocation in LLVM 3.0]]
  * [[http://compilers.cs.ucla.edu/fernando/publications/drafts/survey.pdf|A Survey on Register Allocation]]
  * [[http://www.cs.cmu.edu/~dkoes/research/dkoes_thesis.pdf|Towards a More Principled Compiler: Register Allocation and Instruction Selection Revisited]]
  * [[wp>Strahler number]]
  * [[wp>Ershove Number]]
==== 指令排程 ====
請見 [[http://www.stanford.edu/class/cs143/lectures/190_Code_Optimization.pdf|Code Optimization]]。

Instruction scheduling 和 register allocation 的順序如何調整及其優缺點請見 [[wp>Instruction_scheduling#The_phase_order_of_Instruction_Scheduling|The phase order of Instruction Scheduling]]。
  * instruction scheduling 目的在於提高 ILP。
  * instruction scheduling 在 register allocation 之前的優點在於 ILP 能最大化。因為在 register allocation 之前，使用的是虛擬暫存器 (數量無限)，多條指令可以並行; 但這也是缺點，ILP 最大化之後，會對 register allocation 造成壓力，多條指令執行可能會使用超過物理暫存器數量的暫存器，register allocation 必須做 spill。
  * instruction scheduling 在 register allocation 之後的優缺點和上面剛好相反。
  * LLVM 在 register allocation 前後都會做 instruction scheduling。
====== Optimization ======
  * [[http://www.valleytalk.org/wp-content/uploads/2011/11/%E7%A8%8B%E5%BA%8F%E5%88%86%E6%9E%90-%E5%8E%9F%E7%90%86%E9%83%A8%E5%88%86%E5%85%A8.pdf|程序分析-原理部分(全)]]
    * [[wp>Dominator|Dominator/Post-dominator]]
    * [[wp>Dependence_analysis#Control_dependencies|Control Dependence]]


  * [[http://www.freescale.com/infocenter/Codewarrior/index.jsp?topic=/com.freescale.doc.mcu.coldfire_compiler/220_opt.Live_Range_Splitting.html|Intermediate Optimizations ]]
  * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-October/078222.html|[LLVMdev] Switch instruction lowering]]
    * [[wp>Multiway branch]]
      * [[http://llvm.org/pubs/2007-05-31-Switch-Lowering.pdf|Improving Switch Lowering for The LLVM Compiler System]]
      * A Superoptimizer Analysis of Multiway Branch Code Generation
====== Link Time Optimization======
  * [[wp>Link-time optimization]]
====== Exceptio Handling ======
目前所見的例外處理其實作上基本遵循 [[http://www.codesourcery.com/cxx-abi/abi-eh.html|Itanium C++ ABI: Exception Handling]]。它實現所謂的 zero-cost exception handling，也就是例外處理不會造成程式正常執行流程效能上的影響。[(http://stackoverflow.com/questions/307610/how-do-exceptions-work-behind-the-scenes-in-c#307716)]

personality function/routine 是語言執行時期函示庫中的函式，用來作為 system unwind library 和 language-specific exception handling semantics 的介面。這是因為不同的程式語言對於例外處理有不同的行為，personality function 接受例外的內文，包含其型別和值。 

  * [[http://luse.blogspot.tw/2009/05/c.html|淺談C++例外處理 (前篇)]]
  * [[http://luse.blogspot.tw/2009/10/c.html|淺談C++例外處理 (中篇)]]
  * [[http://luse.blogspot.tw/2010/01/c.html|淺談C++例外處理 (後篇)]]
===== LLVM =====
LLVM 3.0 針對例外處理主要提供以下 LLVM IR/Intrinsics 支援。前端在處理程式語言中的例外處理時，可以將其對映到 LLVM 中用來支援例外處理的指令，再交由之後的優化器和後端處理。底下將會以 C++ 中例外處理的 try/catch/throw 當作對映。
  - invoke: 針對在 try block 中的函式呼叫，改用 invoke 而非 call。如果在一個會丟出例外的函式中呼叫另一個函式，同樣也是用 invoke。
  - landingpad: 例外發生之後，會跳轉到一個特殊的基本塊。在這個基本塊中，landingpad 用來準備之後例外處理需要的資料。
  - resume
  - llvm.eh.typeid.for

以 [[http://llvm.org/devmtg/2011-09-16/EuroLLVM2011-ExceptionHandling.pdf|The new LLVM exception handling scheme]] 說明 LLVM 如何實現例外處理。

<code c++>
try {
   ...
   MayThrowSomething();
   AnotherFunctionCall();
   ...
} catch (int i) {
  DoSomethingWithInt(i);
} catch (class A a) {
  DoSomethingWithA(a);
}
</code>
其 try block 對映的 LLVM IR 為，<code>
invoke void @_Z17MayThrowSomethingv() to label %cont unwind label %lpad
</code>
我們可以看到在 try block 中的函式呼叫改用 invoke，其分支目標有二: label %cont 和 unwind label %lpad，分別為正常執行流程和例外處理流程。
<code>
lpad:
  %info = landingpad { i8*, i32 } personality @__gxx_personality_v0 catch @_ZTIi catch @_ZTl1A
  %except = extractvalue { i8*, i32 } %info, 0
  %selector = extractvalue { i8*, i32 } %info, 1
  %typeid = call i32 @llvm.eh.typeid.for(@_ZTIi)
  %match = icmp eq i32 %selector, %typeid
  br i1 %match, label %run_catch, label %try_next
</code>
landingpad 是用來準備例外資訊給後面的例外處理程序使用。landing pad (lpad) 中文意思是停機坪，這代表當例外發生時，執行流程會跳轉到這裡，再轉發到其它地方。%info 是該例外的資訊，{ i8*, i32 } 是其型別。其中 i8* 是 exception pointer，i32 是 selector。personality @__gxx_personality_v0 代表目前處理的是 C\+\+ 的例外，它知道如何將該例外與 catch clause 作 C\+\+ 型別比較。catch @_ZTIi 和 catch @_ZTl1A 則是列出所有的 catch clause，@_ZTIi 和 @_ZTl1A 是語言特定的全域變數，這裡用來代表 catch clause 的 C\+\+ 型別資訊。


  * [[http://www.llvm.org/docs/ExceptionHandling.html|Exception Handling in LLVM]]
  * [[http://www.llvm.org/docs/LangRef.html|LLVM Language Reference Manual]]
  * [[http://blog.llvm.org/2011/11/llvm-30-exception-handling-redesign.html|LLVM 3.0 Exception Handling Redesign]]
  * [[http://www.cplusplus.com/doc/tutorial/exceptions/]]
  * [[http://stackoverflow.com/questions/329059/what-is-gxx-personality-v0-for]]
  * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-July/041748.html]]
  * [[Binutils]]
  * [[Exception handling]]


====== 術語 ======
  * [[wp>Clobbering]][(http://www.cs.nctu.edu.tw/~chenwj/log/IM/zakk-clobber-2011-06-28.txt)]
    * clobbered register 代表該條指令執行時，其內容會被修改的暫存器。
  * Register scavenging [(http://lists.llvm.org/pipermail/llvm-dev/2017-April/111696.html)]
    * 在 register allocation 之後的 pass，如果需要新的 register，會做 register scavenging 看附近有沒有暫時沒被用到的 register。如果都沒有，就只能做 spill。
  * Stack slot
    * [[http://stackoverflow.com/questions/3232614/how-does-an-optimizing-c-compiler-reuse-stack-slots-of-a-function|how does an optimizing c++ compiler reuse stack slots of a function?]]
  * [[wp>Rematerialization]]
    * 將所需的值重新計算，避免對內存的存取。可以看做是 spill 的反向動作。

  * [[Sibling call]]
  * [[wp>Partial evaluation]]
    * Futamura projections[(http://people.cs.nctu.edu.tw/~chenwj/log/IM/chapuni64-2012-11-30.txt)]

  * [[wp>Fixed-point arithmetic]]
    * [[http://www-inst.eecs.berkeley.edu/~cs61c/sp06/handout/fixedpt.html|Introduction: Real numbers in Real world]]
    * [[https://www.dsprelated.com/showarticle/139.php|A Fixed-Point Introduction by Example]]
    * (二進制) 小數點前後的位數固定。一般用 Qm.f 表示，其中 m 表示整數位，f 表示小數位，另外隱藏一位有號位 (signed bit)。
    * 固定點運算可以先將小數點右移至底，將其當作一般整數運算，再將小數點左移。例如: 兩個 Q7.24 相乘 (Q7.24 加上有號位，總共 32 位)。<code>
- widen the i32s to i56s
- multiply
- shift right 24
- truncate to i32
</code>
    * [[wp>Saturation arithmetic]]
      * 飽和運算。運算結果有最大最小值，高於最大值以最大值計，低於最小值以最小值計，前述行為被形容為飽和。
    * 有理數/分數 (rational number/fractional number)
    * 實數 (real number)

  * [[https://stackoverflow.com/questions/32002157/how-to-understand-volatile-and-non-volatile-registers|How to understand volatile and non-volatile registers?]]
    * [[https://msdn.microsoft.com/en-us/library/6t169e9c.aspx|Caller/Callee Saved Registers]]
    * caller-saved register 是調用方在調用函式之前需要保存的暫存器，因為這些暫存器在調用函式之後，其內容會被揮發 (破壞) 掉，故又稱 volatile register。
    * callee-saved register 是被調用方在使用之前和之後需要保存和回復的暫存器。因為這些暫存器在調用函式之後，其內容會不變，故又稱 non-volatile register。
===== 硬件平台相關 =====
  * [[wp>VLIW]]
    * 非常依賴編譯器的軟流水優化。
  * Hardware Loop
    * [[http://www.eetimes.com/document.asp?doc_id=1275609|Architecture-oriented C optimization, part 1: DSP features]]
      * 一般 loop 在尾部透過 compare and branch 決定是否要繼續執行 loop。branch 指令會打斷 pipeline。
      * 增加硬件紀錄 loop counter 和 loop start/end。
    * [[http://llvm.org/devmtg/2011-11/Simpson_PortingLLVMToADSP.pdf|Porting LLVM to a Next Generation DSP]]
      * 一般 DSP 非常依賴 ILP，透過 SMT，VLIW 和 SIMD 達成。
  * [[wp>Delay Slot]]
    * Branch delay slot
      * [[http://www.pagetable.com/?p=313|Having Fun with Branch Delay Slots]]
      * 在 branch 指令之後的 pipeline stage 填入不論 branch taken 或是 not taken 都會執行的指令。屬於編譯器軟流水優化。
  * [[wp>Literal pool]][(http://www.cs.nctu.edu.tw/~chenwj/log/IM/zakk-literal-pool-2011-10-05.txt)]
    * ARM 相關。
    * [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0473c/Bgbccbdi.html|Literal pools]]
    * [[http://benno.id.au/blog/2009/01/02/literal-pools|The trouble with literal pools]]
===== 代碼分析相關 =====
  * [[wp>Control flow graph]]
    * 以下的程序分析都發生在 control flow graph。
  * [[wp>Dominator (graph theory)|Dominator]]
    * 分析在 control flow graph 中，節點之間的關係。如果所有從 entry node 出發到 node n 的路徑，都需要經過 node d，則稱 d dominate n。每個 node 都 dominate 自己。{{https://upload.wikimedia.org/wikipedia/commons/thumb/2/22/Dominator_control_flow_graph.svg/400px-Dominator_control_flow_graph.svg.png?210x240}}
      * 1 dominate 3。所有從 1 出發到 3 的路徑，都需要經過 1。
      * 1/2 strictly dominate 3。1/2 dominate 3 且 1/2 不等於 3。
      * immediate dominator of 3 is 2。2 strictly dominate 3，且 2 沒有 strictly dominate 其它 strictly dominate 3 的 node; 相反的，1 strictly dominate 3，但是 1 strictly dominate 2，而 2 又 strictly dominate 3。
        * 單純從字面上理解 immediate dominator 應該也可以。
      * dominance frontier of n
        * frontier 是前沿。dominance frontier of n，是一個集合，集合中的節點，其 immediate predecessor 被 n 所 dominate，但節點本身不被 n 所 dominate。
        * 用來決定在 CFG 哪些地方設置 [[wp>Static single assignment form|SSA]] 所需的 Phi node。
          * 一個節點 n，假設引用到變量 v。如果 n 屬於兩個 dominance frontier 的交集，且這兩個 dominance frontier 各自都定義變量 v，則在節點 n 之前插入 Phi node。
      * dominator tree
        * tree 中每一個節點 n 的子節點 m，immediate dominator of m is n。
  * [[wp>Data-flow analysis]]
    * 分析程序在各個不同的點，屬性的變化。屬性的變化，可以從 CFG 由前往後，或是後往前推導。
      *  dead code elimination 所需的 live variable analysis 屬於 backward analysis。
        * [[wp>Live variable analysis]]
===== 代碼優化相關 =====
<note>附上 LLVM 相關代碼。</note>
  * [[https://en.wikipedia.org/wiki/Alias_analysis#Type-based_alias_analysis|Type-Based Alias Analysis (TBAA)]]
    * [[http://www.drdobbs.com/cpp/type-based-alias-analysis/184404273|Type-Based Alias Analysis]]
    * 假設不同型別的指針所指向的內存空間不重疊，並據此做優化。  
  * [[wp>Tail call]]
    * 可以用迭代實現的遞迴。請見 [[https://xuanji.appspot.com/isicp/1-2-procedures.html|Procedures and the Processes They Generate]]。
  * [[wp>Dead code elimination|Dead code elimination (DCE)]]
    * 消除未被使用的表達式。
    * [[http://llvm.org/docs/doxygen/html/DCE_8cpp_source.html|DCE.cpp]]
  * [[wp>Dead store|Dead store elimination (DSE)]]
    * 如果一個變量被賦值之後，其值未被使用，則為 dead store。
    * [[http://llvm.org/doxygen/DeadStoreElimination_8cpp_source.html|DeadStoreElimination.cpp]]
  * [[wp>Global value numbering|Global value numbering (GVN)]]
    * 為變量和表達式編號。兩個變量或表達式具有相同編號，代表其中之一可以被消除。和 CSE 是互補的優化。
    * [[http://blog.llvm.org/2009/12/introduction-to-load-elimination-in-gvn.html|Introduction to load elimination in the GVN pass]]
    * [[http://llvm.org/doxygen/GVN_8cpp_source.html|GVN.cpp]]
  * [[wp>Common subexpression elimination|Common subexpression elimination (CSE)]]
    * 將表達式中重複的部分提出，避免重複計算。  
  * [[wp>Loop-invariant code motion|Loop-invariant code motion (LICM)]]
    * 將迴圈運算中，不會改變的變數提升至迴圈外，避免重複計算。
    * [[http://llvm.org/docs/doxygen/html/LICM_8cpp_source.html|LICM.cpp]]
  * Scalar Replacement of Aggregates (SRoA)
    * [[https://blog.regehr.org/archives/1603|How LLVM Optimizes a Function]]
    * 如果 aggregate [(http://stackoverflow.com/questions/35722514/what-is-the-difference-between-scalar-types-and-aggregate-types-in-c)] (如: struct) 中的 field 沒有 alias，則將該 field 當成 scalar 處理，以利後續優化。底下是經過 LLVM ''opt'' 優化後的 IR。<code c>
// sroa.c
typedef struct {
  int x;
  int y;
} A;

int func() {
  A a = { .x = 1, .y = 2 };
  // int a.x.sroa = a.x;
  // int a.y.sroa = a.y;
  return a.x + a.y;
}</code><code llvm>
; clang -O0 -emit-llvm -S sroa.c -o sroa_O0.ll

define i32 @func() #0 {
entry:
  %a = alloca %struct.A, align 4
  %0 = bitcast %struct.A* %a to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* bitcast (%struct.A* @func.a to i8*), i64 8, i32 4, i1 false)
  %x = getelementptr inbounds %struct.A, %struct.A* %a, i32 0, i32 0
  %1 = load i32, i32* %x, align 4
  %y = getelementptr inbounds %struct.A, %struct.A* %a, i32 0, i32 1 
  %2 = load i32, i32* %y, align 4
  %add = add nsw i32 %1, %2
  ret i32 %add
}</code><code llvm>
; opt -sroa sroa_O0.ll -S -o sroa.ll

define i32 @func() #0 {
entry:
  %a.sroa.0.0.copyload = load i32, i32* getelementptr inbounds (%struct.A, %struct.A* @func.a, i64 0, i32 0), align 4
  %a.sroa.2.0.copyload = load i32, i32* getelementptr inbounds (%struct.A, %struct.A* @func.a, i64 0, i32 1), align 4
  %add = add nsw i32 %a.sroa.0.0.copyload, %a.sroa.2.0.copyload
  ret i32 %add
}
</code>
    * [[https://gcc.gnu.org/wiki/summit2010?action=AttachFile&do=get&target=jambor.pdf|The new intraprocedural Scalar Replacement of Aggregates]]
    * [[http://llvm.org/doxygen/SROA_8cpp_source.html|SROA.cpp]]
  * [[http://llvm.org/devmtg/2009-10/ScalarEvolutionAndLoopOptimization.pdf|ScalarEvolution and Loop Optimization]]
    * SCEV (Scalar Evolution)
      * [[https://gcc.gnu.org/onlinedocs/gccint/Scalar-evolutions.html|15.5 Scalar evolutions]]
      * 將隨著每個迭代有序變化的變量 (induction variable) 以 SCEV 表示。
    * [[http://www.cs.cmu.edu/afs/cs/academic/class/15745-f03/public/lectures/L7_handouts.pdf|Loop Invariant Computation and Code Motion]]
      * [[https://gcc.gnu.org/onlinedocs/gccint/Loop-representation.html|15.1 Loop representation]]
      * back edge (d -> a)，a dominate d。又叫 latch。
      * natural loop of a back edge。包含 back edge 頭尾兩個節點，構成迴圈的最小集合。
      * LLVM 會先做 [[http://llvm.org/doxygen/LoopSimplify_8cpp_source.html|LoopSimplify.cpp]] 將 natural loop 正規化，方便後續處理。
        * 插入 pre-header。確保外部進入 loop 只有一個入口點，即 pre-header。
        * 插入 exit-block。確保離開 loop 只有一個出口點，即 exit-block。
        * 確保只有一條 back edge。
      * 以 LICM 為例，會把 loop invariant 從 loop body 提升至 pre-header。
  * Vectorization
    * [[http://llvm.org/docs/Vectorizers.html|Auto-Vectorization in LLVM]]
  * [[http://en.cppreference.com/w/c/language/restrict|restrict type qualifier]]
    * [[https://llvm.org/devmtg/2017-02-04/Restrict-Qualified-Pointers-in-LLVM.pdf|Restrict-qualified pointers in LLVM]]
===== 代碼保護相關 =====
  * [[stack-smashing protector]]
    * 堆棧溢出保護
====== 知名編譯器 ======
  * [[http://nuwen.net/mingw.html|MinGW]]
  * [[http://www.edg.com/index.php|EDG]]
  * [[http://www.comeaucomputing.com/|Comeau C/C++]]
  * [[wp>Microsoft Visual Studio]] 
    * [[http://people.cs.nctu.edu.tw/~chenwj/slide/How%20we%20bring%20up%20%20a%20new%20target%20-%20Copy%20for%20%e9%99%b3%e9%9f%8b%e4%bb%bb.ppt|How We Bing Up a New Target]]
    * Od compilation 即是 -O0。
    * CRT: C runtime 函式庫。
  * [[https://www.iar.com/|IAR]]