<note>
  * 參考書籍:
    * [[https://www.amazon.com/Getting-Started-LLVM-Core-Libraries/dp/1782166920|Getting Started with LLVM Core Libraries]]
    * [[https://www.amazon.com/LLVM-Essentials-Suyog-Sarda/dp/1785280805|LLVM Essentials]]
  * 編譯器前端 Clang 的[[clang|維基]]。
  * 精簡提煉，儘量不描述細節。可以搭配命令行觀察結果更佳。
  * 開頭附註 LLVM 官方文檔。
  * 使用 gdb 時，可以透過 dump() 印出訊息。 
  * 善用 [[https://llvm.org/viewvc/llvm-project/|ViewVC]] 查閱相關功能的改動。
</note>

待整理:
  * [[http://lists.llvm.org/pipermail/llvm-dev/2017-November/118810.html|[llvm-dev] What pattern string corresponds to CopyToReg?]]

官方文檔:
  * [[http://www.llvm.org/docs/Lexicon.html|The LLVM Lexicon]]: LLVM 術語
  * [[http://www.llvm.org/docs/ProgrammersManual.html|LLVM Programmer’s Manual]]: 必讀。
  * [[http://www.llvm.org/docs/tutorial/|LLVM Tutorial]]
線上文章:
  * [[http://www.aosabook.org/en/llvm.html|Chapter 11. LLVM]]
  * [[http://www.informit.com/articles/article.aspx?p=1215438|How the LLVM Compiler Infrastructure Works]]
  * [[https://wwwcip.informatik.uni-erlangen.de/~sicherha/foo/tricore-llvm.pdf|Design and Implementation of a TriCore Backend for the LLVM Compiler Framework (PDF)]]
    * [[http://reup.dmcs.pl/wiki/images/7/7a/Tricore-llvm-slides.pdf|Design and Implementation of a TriCore Backend
for the LLVM Compiler Framework (Slide)]]
  * [[http://scholarworks.rit.edu/theses/9550/|The Design of a Custom 32-bit RISC CPU and LLVM Compiler Backend]]
  * [[http://elinux.org/images/b/b7/LLVM-ELC2009.pdf|Understanding and writing an LLVM compiler back-end]]
  * [[https://www.slideshare.net/WeiRenChen/mclinker2013chenwj|Part II: LLVM Intermediate Representation]]
  * [[https://www.slideshare.net/zongfanyang/coscup-2016-llvm|COSCUP 2016 - LLVM 由淺入淺]]

線上課程:
  * [[http://www.cl.cam.ac.uk/teaching/1617/L25/|Modern Compiler Design]]

開發注意事項:
  * O2 優化的 bug。基本透過裁減代碼，得到最小的測試用例以定位到問題點。目前的經驗是裁減到函式級別，隨即對該函式的匯編進行除錯。
    * 測試套儘早運行，及早抓到 bug。
    * 開發時如果沒有 debugger，需要透過 printf 打印訊息。但是要注意 printf 是否能正確執行，另外 printf 可能會影響 O2 代碼生成，請謹慎使用。<code c>
printf("%8x %8x %8x %8x", *((unsigned int *)(&var) + 0),
                          *((unsigned int *)(&var) + 1),
                          *((unsigned int *)(&var) + 2),
                          *((unsigned int *)(&var) + 3));
</code>
    * 針對目標函式，打印 IR 的一連串變換 (LLVM IR -> DAG -> Machine IR -> MCInstr)，觀察何時產生問題代碼。<code bash>
$ clang -O2 -c -mllvm -print-after-all -mllvm -filter-print-funcs=main func.c
</code>
    * 如果是從 GCC 移植到 LLVM，可能有有限的 debugger。必要時，對比 O0 和 O2 的匯編，觀察暫存器和內存的值，再反推源代碼出錯的位址，進而推導優化過程中的問題。匯編和源代碼的映射必須基本理解，以利反推。
  * 關於 O2 代碼生成質量，盡可能從 LLVM IR，SelectionDAG 和 TableGen 方向上優化，再考慮其它作法 (如 peephole)。
    * 參考其他後端時，從 LLVM IR 開始對比。即使只有細微差異，都可能會影響後續優化效果。
      * LLVM IR<code bash>
$ clang -O0 -emit-llvm -S -o sum.ll sum.c
# 和參考後端比較 O2 IR 生成過程中是否有差異。
$ opt -O2 sum.ll -print-after-all
</code>
      * SelectionDAG<code bash>
$ clang -O2 -emit-llvm -S -o sum.ll sum.c
# 觀察 DAG 變換過程，看是否有優化空間。
$ llc sum.ll -debug
</code>
    * 優化/減少 load/store。
      * [[http://lists.llvm.org/pipermail/llvm-dev/2015-April/084155.html|[LLVMdev] How to enable use of 64bit load/store for 32bit	architecture]]
====== 編譯 LLVM ======
  * [[http://llvm.org/docs/GettingStarted.html|Getting Started with the LLVM System]]
    * [[https://ninja-build.org/|Ninja, a small build system with a focus on speed]]<code bash>
$ brew install cmake ninja
$ git clone http://llvm.org/git/llvm.git
$ git clone http://llvm.org/git/clang.git llvm/tools/clang
$ mkdir build.Ninja; cd build.Ninja
$ cmake -G Ninja -DCMAKE_INSTALL_PREFIX=/Users/chenwj/Projects/opt/ -DCMAKE_BUILD_TYPE=Debug ../llvm
$ ninja; ninja install
</code>
    * [[http://stackoverflow.com/questions/17980759/xcode-select-active-developer-directory-error|xcode-select active developer directory error]]<code bash>
$ mkdir build.Xcode
$ cmake -G Xcode -DCMAKE_INSTALL_PREFIX=/Users/chenwj/Projects/opt/ -DCMAKE_BUILD_TYPE=Debug ../llvm
$ open LLVM.xcodeproj
</code>
      * 或是點擊 LLVM.xcodeproj/project.pbxproj，開啟 Xcode。
      * 選擇手動建立 Scheme，目前只選 clang。
    * [[https://heejune.me/2016/08/09/wheres-my-llvmtoolsclangexample-binaries/|Where’s my ‘llvm/tools/clang/example’ binaries?]]<code bash>
$ cd build.Ninja
$ cmake -G Ninja -DCMAKE_INSTALL_PREFIX=/Users/chenwj/Projects/opt/ -DCMAKE_BUILD_TYPE=Debug -DLLVM_BUILD_EXAMPLES=1 ../llvm
# 編譯 example BrainF
$ ninja BrainF
</code>
====== 使用範例 ======
<note>
  * -help-hidden 可以顯示更多的選項。
  * -debug-only=<debug string> 可以只顯示屬於特定 DEBUG_TYPE 的資訊。 
    * grep DEBUG_TYPE lib/* | grep pre-RA-sched 可以列出屬於同一類 DEBUG_TYPE 的檔案。
  * -filter-view-dags=<string> 可以只顯示特定 basic block 的 DAG 資訊。
</note>
  * [[http://llvm.org/docs/CommandGuide/index.html|LLVM Command Guide]]

  * 輸出 LLVM IR。-S 輸文字格式 LLVM IR (.ll)，-c 輸出二進制格式 LLVM IR (.bc)。<code bash>
$ clang -emit-llvm -S hello.c -o hello.ll
$ clang -emit-llvm -c hello.c -o hello.bc
</code>
  * 透過 Clang，傳命令行參數給 LLVM 後端。<code bash>
$ clang -c sum.c -mllvm -debug-pass=Structure
</code>
  * 匯編和反匯編 LLVM IR。匯編將 .ll 轉成 .bc; 反匯編將 .bc 轉成 .ll。<code bash>
$ llvm-as hello.ll -o hello.bc
$ llvm-dis hello.bc -o hello.ll
</code>
  * 如果想要使用 LLVM 的標頭檔和函式庫。<code bash>
$ gcc `llvm-config --cxxflags --ldflags` foo.c `llvm-config --libs`
</code>
  * 將 LLVM IR 編譯成目標平台匯編或目的檔。<code bash>
# 顯示隱藏選項。
$ llc -help-hidden
# 顯示在編譯過程中傳遞哪些參數給 opt。
$ llc -debug-pass=Arguments sum.ll
# 顯示在編譯過程中 PassManager 調度哪些 pass。
$ llc -debug-pass=Structure sum.ll
# 只顯示代碼中有 #define DEBUG_TYPE "legalize-types" 的 pass。
$ llc -debug-only=legalize-types sum.ll
</code>
  * 優化 LLVM IR。<code bash>
# 顯示隱藏選項。
$ opt -help-hidden
# 顯示每一個 pass 處理之後的 LLVM IR。可以用來觀察感興趣的 pass 做了哪些變換。
$ opt -S -O2 -print-after-all sum.ll
</code>
====== LLVM IR ======
  * [[http://llvm.org/docs/LangRef.html|LLVM Language Reference Manual]]
    * LLVM IR 並非平台中立。請見
      * [[http://www.llvm.org/docs/FAQ.html#can-i-compile-c-or-c-code-to-platform-independent-llvm-bitcode|Can I compile C or C++ code to platform-independent LLVM bitcode?]]
        * LLVM IR 作為編譯器的中間語言，不需要像 Java bytecode 那樣考慮平台中立性，設計上可以往方便後端優化考慮。
          * [[http://llvm.org/devmtg/2011-09-16/EuroLLVM2011-MoreTargetIndependentLLVMBitcode.pdf|More Target Independent LLVM Bitcode]]
          * [[http://lists.llvm.org/pipermail/llvm-dev/2011-October/043777.html|[LLVMdev] LLVM IR is a compiler IR]]
    * 對應 LLVM [[http://llvm.org/doxygen/Instruction_8h_source.html|Instruction]]

<code bash>
$ cat sum.c
int sum(int a, int b) {
  return a + b;
}
$ clang -emit-llvm -S sum.c -o sum.ll
</code>
<code llvm>
; ModuleID = 'sum.c'
source_filename = "sum.c"
; 從底下 target datalayout 和 triple 即可知 LLVM IR 並非平台中立。
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.11.0"

; Function Attrs: noinline nounwind ssp uwtable
define i32 @sum(i32 %a, i32 %b) #0 {
entry:
  %a.addr = alloca i32, align 4        ; 在 stack 上分配空間給 i32，4 byte 對齊。
  %b.addr = alloca i32, align 4
  store i32 %a, i32* %a.addr, align 4  ; 將函式參數存到 stack。
  store i32 %b, i32* %b.addr, align 4
  %0 = load i32, i32* %a.addr, align 4 ; 從 stack 讀出 operand。
  %1 = load i32, i32* %b.addr, align 4
  %add = add nsw i32 %0, %1            ; 加法運算。
  ret i32 %add                         ; 返回運算結果。
}
</code>
  * [[http://llvm.org/docs/LangRef.html#data-layout|Data Layout]]: 有些優化會查詢 data layout，請注意設置是否正確。
  * [[http://llvm.org/docs/LangRef.html#target-triple|Target Triple]]: ARCHITECTURE-VENDOR-OPERATING_SYSTEM

跟 LLVM IR 相關的類如下，所在目錄為 [[http://llvm.org/doxygen/dir_c3e93f23a4a31c717998b98ce143b7c0.html|include/llvm/IR]]:
  * [[http://www.llvm.org/docs/ProgrammersManual.html#the-module-class|Module]]: Module 對應到一個 C 檔案。Module 包含多個 Function。
  * [[http://www.llvm.org/docs/ProgrammersManual.html#the-function-class|Function]]: Function 對應到一個 C 函式。Function 包含多個 BasicBlock。
  * [[http://www.llvm.org/docs/ProgrammersManual.html#the-basicblock-class|BasicBlock]]: BasicBlock 包含一段 single entry single exit 的 Instruction。
  * [[http://www.llvm.org/docs/ProgrammersManual.html#the-instruction-class|Instruction]]: 是所有 LLVM IR 的基類。
  * [[http://www.llvm.org/docs/ProgrammersManual.html#the-value-class|Value]]: 產生 (define) 值 (Value) 的類均繼承 Value。Value 代表可以被 Instruction 使用，有型別的值 (typed value)。注意！BasicBlock 也是 Value (即為 Value 的子類)，因為 BasicBlock 可以作為分支指令的目標。
    * 一個 Value 被定義 (define) 之後，可以被多次使用 (use)。Value 維護一個 User 列表，紀錄誰使用到該 Value。提供接口取得 def-use chain 中的 use。
  * [[http://www.llvm.org/docs/ProgrammersManual.html#the-user-class|User]]: 使用 (use) 值 (Value) 的類均繼承 User。User 維護一個 operand 列表，代表 User 用到了哪些 Value。
    * 每個 operand 會指向它所參考的 Value。因為 LLVM IR 是 SSA，這種參考關係是唯一。提供接口取得 use-def chain 中的 def。

較難理解的部分:
  * [[http://llvm.org/docs/LangRef.html#getelementptr-instruction|getelementptr]]
    * [[http://llvm.org/docs/GetElementPtr.html|The Often Misunderstood GEP Instruction]]
    * 為 [[http://llvm.org/docs/LangRef.html#t-aggregate|aggregate type]] (LLVM 的 array 和 struct) 的子元素計算位址，<color #ed1c24>不涉及內存存取。</color><code bash>
$ cat struct.c
struct RT {
  char A;
  char C;
};

char *func1(struct RT *s) {
  return &(s->A);
}

char *func2(struct RT *s) {
  return &(s->C);
}

char *func3(struct RT *s) {
  return &(s[1].C);
}
$ clang -emit-llvm -O1 -S struct.c -o struct.ll
</code><code llvm>
%struct.RT = type { i8, i8 }

; Function Attrs: norecurse nounwind readnone ssp uwtable
define i8* @func1(%struct.RT* readnone %s) local_unnamed_addr #0 {
entry:
  %A = getelementptr inbounds %struct.RT, %struct.RT* %s, i64 0, i32 0
  ret i8* %A
}

; Function Attrs: norecurse nounwind readnone ssp uwtable
define i8* @func2(%struct.RT* readnone %s) local_unnamed_addr #0 {
entry:
  %C = getelementptr inbounds %struct.RT, %struct.RT* %s, i64 0, i32 1
  ret i8* %C
}

; Function Attrs: norecurse nounwind readnone ssp uwtable
define i8* @func3(%struct.RT* readnone %s) local_unnamed_addr #0 {
entry:
  %C = getelementptr inbounds %struct.RT, %struct.RT* %s, i64 1, i32 1
  ret i8* %C
}
</code>
      * 第一個參數是 ''%struct.RT'': 第二個參數所指類型。
      * 第二個參數是 ''%struct.RT* %0'': 計算位址的起始位址，其型別一定是指針類型。之後的參數皆是欲對其計算位址的子元素的 index。
      * 第三個參數是 ''i32 0'': 類似 ''%0[0]''。LLVM 某種程度上將第二個參數視為 array。如果實際上不是 array，此值皆為 0。
      * 第四個參數是 ''i32 0'': 類似 ''%0[0].A''。
  * [[http://llvm.org/docs/LangRef.html#undefined-values|Undefined Values]]: 指令中帶有 undefined value，代表指令對該值為何不感興趣 (undefined value 可以代表任意的值[(http://lists.llvm.org/pipermail/llvm-dev/2017-June/114489.html)])。編譯器對此類情況有機會可以優化。
  * [[http://llvm.org/docs/LangRef.html#poison-values|Poison Values]]: 指令中帶有 poison value，其結果為 undefined behavior。
===== InstCombine =====
<note>
  * [[http://llvm.org/doxygen/InstructionSimplify_8cpp_source.html|InstructionSimplify.cpp]] 也負責簡化 LLVM IR。和 InstCombine 不同的地方在於只做運算元折疊，不新建指令。
</note>
  * [[http://llvm.org/doxygen/classllvm_1_1InstCombiner.html|InstCombine]] 類似於 SelectionDAG 中的 [[http://llvm.org/doxygen/DAGCombiner_8cpp.html|DAGCombiner]]，只不過一個作用於 LLVM IR，另一個作用於 SelectionDAG，目的都是簡化 IR/SelectionDAG。
    * [[http://llvm.org/doxygen/classllvm_1_1InstCombinePass.html#aa1ffc7ad08181545bbc25eabac7f5aae|InstCombinePass::run]]
    * [[http://llvm.org/doxygen/classllvm_1_1InstCombiner.html#aded7582678e6359830469d17c727b04e|InstCombiner::run()]]<code c++>
bool InstCombiner::run() {
    if (Instruction *Result = visit(*I)) {
      ++NumCombined;
      // Should we replace the old instruction with a new one?
      if (Result != I) {
        DEBUG(dbgs() << "IC: Old = " << *I << '\n'
                     << "    New = " << *Result << '\n');
    }
}
</code>
    * [[http://llvm.org/doxygen/classllvm_1_1InstCombiner.html#a7d92fd2831240f33fba5b54ec91df58d|InstCombiner::visitTrunc()]]
===== Example =====
  * Bitfield ([[https://www.tutorialspoint.com/cprogramming/c_bit_fields.htm|C - Bit Fields]])<code bash>
$ cat bitfield.c
typedef struct {
   unsigned int width  : 16;
   unsigned int height : 16;
} status;

void foo() {
  status s;
  s.width = 4;
  s.height = 5;
}
$ clang bitfiled.c -emit-llvm -S -o bitfiled.ll
</code><code llvm>
%struct.status = type { i32 }

; Function Attrs: noinline nounwind ssp uwtable
define void @foo() #0 {
entry:
  %s = alloca %struct.status, align 4
  %0 = bitcast %struct.status* %s to i32*
  %bf.load = load i32, i32* %0, align 4
  %bf.clear = and i32 %bf.load, -65536    ; and FFFF0000。0000 代表先被 clear，之後要被設置的部分，取 status.width。
  %bf.set = or i32 %bf.clear, 4           ; or 0004。把 status.width 設為 4。
  store i32 %bf.set, i32* %0, align 4
  %1 = bitcast %struct.status* %s to i32*
  %bf.load1 = load i32, i32* %1, align 4
  %bf.clear2 = and i32 %bf.load1, 65535   ; and 0000FFFF。0000 代表先被 clear，之後要被設置的部分，取 status.height。
  %bf.set3 = or i32 %bf.clear2, 327680    ; or 50000。把 status.height 設為 5。
  store i32 %bf.set3, i32* %1, align 4
  ret void
}
</code>
    * 先 and 把要設置的部分 clear，再對其做 or 把值設進去。
    * [[https://clang.llvm.org/doxygen/classclang_1_1CodeGen_1_1CodeGenFunction.html#a35cce91120c88f4809f5bb323acf6851|EmitStoreThroughBitfieldLValue]]
====== MachineInstr ======
<note>
  * [[https://llvm.org/docs/CodeGenerator.html#machine-code-description-classes|Machine code description classes]]
  * [[https://llvm.org/docs/MIRLangRef.html|Machine IR (MIR) Format Reference Manual]]
    * [[https://www.youtube.com/watch?v=objxlZg01D0|Welcome to the back-end: The LLVM machine representation (2017)]]
</note>
術語:
  * MI ([[http://llvm.org/doxygen/classllvm_1_1MachineInstr.html|MachineInstr]])
    * 和 LLVM IR 一樣，MI 屬於 [[http://llvm.org/doxygen/classllvm_1_1MachineBasicBlock.html|MachineBasicBlock]]; MachineBasicBlock 屬於 [[http://llvm.org/doxygen/classllvm_1_1MachineFunction.html|MachineFunction]]。
    * [[http://llvm.org/doxygen/classllvm_1_1MachineOperand.html|MachineOperand]]: 此條 MI 使用的操作數。
    * [[http://llvm.org/doxygen/classllvm_1_1MCInstrDesc.html|MCInstrDesc]]: 描述此條 MI 相關屬性。
  * MRI ([[http://llvm.org/doxygen/classllvm_1_1MachineRegisterInfo.html|MachineRegisterInfo]])
    * 用來創建虛擬或物理暫存器，並追蹤其相關訊息。
  * TII ([[http://llvm.org/docs/doxygen/html/classllvm_1_1TargetInstrInfo.html|TargetInstrInfo]])
    * MI 所保存的 Opcode 需要透過目標平台的 TargetInstrInfo 解釋才能得到對應的指令。[[https://llvm.org/docs/CodeGenerator.html#create-instructions|BuildMI]] 創建 MI 時，需要 TII。

  * ''llc -print-machineinstrs'' 列出指令選擇後，各個階段的 MachineInstr。<code bash>
$ llc -print-machineinstrs sum.ll
# After Instruction Selection:
# 打印出 MachineFunction 的 Property。從 IsSSA 和 TracksLiveness 可以知道我們現在處於暫存器分配前的階段。
# Machine code for function sum: IsSSA, TracksLiveness
Frame Objects:
  fi#0: size=4, align=4, at location [SP+8]
  fi#1: size=4, align=4, at location [SP+8]
Function Live Ins: %EDI in %vreg0, %ESI in %vreg1

# MachineBasicBlock 可以連結到相關的 LLVM IR BasicBlock。
BB#0: derived from LLVM BB %entry
    Live Ins: %EDI %ESI
	%vreg1<def> = COPY %ESI; GR32:%vreg1
	%vreg0<def> = COPY %EDI; GR32:%vreg0
	MOV32mr <fi#0>, 1, %noreg, 0, %noreg, %vreg0; mem:ST4[%a.addr] GR32:%vreg0
	MOV32mr <fi#1>, 1, %noreg, 0, %noreg, %vreg1; mem:ST4[%b.addr] GR32:%vreg1
	%vreg2<def,tied1> = ADD32rm %vreg1<tied0>, <fi#0>, 1, %noreg, 0, %noreg, %EFLAGS<imp-def,dead>; mem:LD4[%a.addr](dereferenceable) GR32:%vreg2,%vreg1
	%EAX<def> = COPY %vreg2; GR32:%vreg2
	RET 0, %EAX
</code>
    * ''%vreg'' 是虛擬暫存器，''%ESI'' 是物理暫存器。
      * ''<def>'' 表示該暫存器於該指令被定義，沒有 ''<def>'' 代表該暫存器於該指令被使用。
      * ''<dead>'' 代表被定義的暫存器在該指令之後沒有被使用。 ''<kill>'' 代表該指令是最後一條使用該暫存器的指令。
      * ''imp-def'' 和 ''imp-use'' 表示該暫存器是被隱式定義或使用，不出現在匯編指令中，如 ''%EFLAGS''。
      * 對於同時使用和定義的暫存器，以 ''tied'' 表示。其餘 flag 的意義請見 [[http://llvm.org/doxygen/classllvm_1_1MachineOperand.html|MachineOperand]]。
    * ''<fi#0>'' 表示存取虛擬棧上第幾個 index 的位置。在經過 [[https://llvm.org/doxygen/PrologEpilogInserter_8cpp_source.html|PrologEpilogInserter.cpp]] 之後，''<fi#0>'' 會轉換成目標平台的棧指針計算。
      * 棧指針計算過程中，可能因為立即數超過範圍，必須將中間結果存在臨時的暫存器。但注意，[[https://llvm.org/doxygen/PrologEpilogInserter_8cpp_source.html|PrologEpilogInserter.cpp]] 發生在暫存器分配之後，此時需要 [[http://llvm.org/doxygen/RegisterScavenging_8cpp_source.html|RegisterScavenging.cpp]] 清出可用的暫存器。

  * 對於 VLIW 架構，[[https://llvm.org/docs/CodeGenerator.html#machineinstr-bundles|MachineInstr Bundles]] 需要了解。

  * 針對 MI pass 的測試。<code bash>
$ llc -stop-after=isel sum.ll -o sum.mir
$ llc -run-pass=machine-scheduler sum.mir -o sum_scheduled.mir
$ llc -start-after=machine-scheduler sum_scheduled.mir -o sum.s
</code><code bash>
$ llc -verify-machineinstrs sum.ll
</code>
====== MachineCode ======
從匯編器角度所設計的 [[http://llvm.org/doxygen/MCInst_8h_source.html|MCInst]] (MachineCode)，所帶的資訊比 [[http://llvm.org/doxygen/classllvm_1_1MachineInstr.html|MachineInstr]] (MI) 更少，其主要用途是讓匯編器生成目的檔，或是透過反匯編器生成 MCInst。

{{http://image1.slideserve.com/2196423/slide4-n.jpg?500}}

相關文章:
  * [[http://blog.llvm.org/2010/04/intro-to-llvm-mc-project.html|Intro to the LLVM MC Project]]
  * [[http://llvm.org/devmtg/2011-11/Grosbach_Anderson_LLVMMC.pdf|LLVM MC in Practice]]
  * [[http://www.embecosm.com/appnotes/ean10/ean10-howto-llvmas-1.0.html|Howto: Implementing LLVM Integrated Assembler]]

<code bash>
# 產生匯編的同時，於註解加上對應的 MCInst。
$ llc -asm-show-inst sum.ll -o -
	movq	%rsp, %rbp              ## <MCInst #1741 MOV64rr
                                        ##  <MCOperand Reg:36>
                                        ##  <MCOperand Reg:44>>
# 產生匯編的同時，於註解加上對應的指令編碼。 
$ llc -show-mc-encoding sum.ll
	movq	%rsp, %rbp              ## encoding: [0x48,0x89,0xe5]
# llvm-mc 用於測試 MCInst 的匯編和反匯編。
$ echo "movq%rsp, %rbp" | llvm-mc -show-encoding
	.section	__TEXT,__text,regular,pure_instructions
	movq	%rsp, %rbp              ## encoding: [0x48,0x89,0xe5]
$ echo "0x48,0x89,0xe5" | llvm-mc -disassemble
	.section	__TEXT,__text,regular,pure_instructions
	movq	%rsp, %rbp
$ echo "0x48,0x89,0xe5" | llvm-mc -disassemble -show-inst
	.section	__TEXT,__text,regular,pure_instructions
	movq	%rsp, %rbp              ## <MCInst #1741 MOV64rr
                                        ##  <MCOperand Reg:36>
                                        ##  <MCOperand Reg:44>>
</code>

在 post-register allocation pass 之後，於 [[http://llvm.org/doxygen/classllvm_1_1AsmPrinter.html|AsmPrinter]] 調用 EmitInstruction，將 MI 轉換成 MCInst，可以輸出匯編或是目的檔。目標平台繼承 AsmPrinter 並覆寫相關函式。

{{https://image.slidesharecdn.com/mclinker-2013-chenwj-160719142821/95/part-ii-llvm-intermediate-representation-29-638.jpg?500}}
  * [[http://llvm.org/doxygen/classllvm_1_1X86AsmPrinter.html#a6df9e42fd7764d9b3dd03c187b0ebd1e|X86AsmPrinter::runOnMachineFunction()]]
    * [[http://llvm.org/doxygen/classllvm_1_1X86AsmPrinter.html#a5898a2d4fd90468e41f05af8e5e7fa6a|X86AsmPrinter::EmitInstruction	()]]
      * 在這過程中，透過 [[http://llvm.org/doxygen/X86MCInstLower_8cpp_source.html|X86MCInstLower]] 將 MI 轉成 MCInst。

<code bash>
$ lldb llc
(lldb) b X86AsmPrinter::runOnMachineFunction
# MCAsmStreamer
(lldb) r -filetype=asm sum.ll
# MCObjectStreamer
(lldb) r -filetype=obj sum.ll
</code>
======= Pass =======
  * [[http://llvm.org/docs/Passes.html|LLVM's Analysis and Transform Passes]]
    * Pass 主要可以分為兩類，[[http://llvm.org/docs/Passes.html#analysis-passes|Analysis Passes]] 和 [[http://llvm.org/docs/Passes.html#transform-passes|Transform Passes]]。Transform 依賴 Analysis 的結果。[[http://llvm.org/doxygen/PassManager_8h_source.html|Pass Manager]] 根據 Pass 之間的依賴關係執行調度。比如要先執行 use analysis 再執行 dead code elimination。
    * 可以用 ''opt'' 在 LLVM IR 上運行各種 Pass，如: [[http://llvm.org/docs/Passes.html#mem2reg-promote-memory-to-register|mem2reg]] 。<code bash>
# 對 sum.ll 運行 mem2reg Pass，再將其產生的 LLVM bitcode 反組譯成 LLVM IR。
$ opt -mem2reg sum.ll | llvm-dis
</code><code llvm>
; Function Attrs: noinline nounwind ssp uwtable
define i32 @sum(i32 %a, i32 %b) #0 {
entry:
  %add = add nsw i32 %a, %b ; 可以觀察到 alloca 被消除。
  ret i32 %add
}
</code>

  * 加入自己的 Pass 請見 [[http://llvm.org/docs/WritingAnLLVMPass.html|Writing an LLVM Pass]]<code bash>
$ opt -load=install/lib/libLLVMHello.so -hello < hello.bc > /dev/null
</code>
    * 如果出現底下錯誤訊息，請確定 ''opt'' 和 ''libLLVMHello.so'' 是同一份源碼和設定編譯而成。<code bash>
Error opening 'install/lib/libLLVMHello.so': install/lib/libLLVMHello.so: undefined symbol: _ZN4llvm4Pass26getAdjustedAnalysisPointerEPKv
  -load request ignored.
</code>

  * 後端比較常見的是 [[http://llvm.org/doxygen/classllvm_1_1MachineFunctionPass.html|MachineFunctionPass]]
    * [[http://llvm.org/docs/WritingAnLLVMPass.html#the-machinefunctionpass-class|The MachineFunctionPass class]]
  * Pass 透過實現 ''getAnalysisUsage(AnalysisUsage &AU)'' 指定其輸出入的依賴關係。''AU.addRequired<PassName>()'' 指定當前 pass 需要哪些 PassName 先運行; ''AU.addPreserved<PassName>()'' 指定當前 pass 運行之後，PassName 的結果不受影響，可以繼續使用。
    * [[http://llvm.org/docs/WritingAnLLVMPass.html#specifying-interactions-between-passes|Specifying interactions between passes]]

  * [[http://llvm.org/doxygen/classllvm_1_1PassManager.html|PassManager]]
    * [[http://llvm.org/devmtg/2007-05/03-Patel-Passmanager.pdf|Demystifying The LLVM Pass Manager]] 描述[[http://llvm.org/doxygen/classllvm_1_1legacy_1_1PassManagerBase.html|舊版 Pass Manager]]。
    * [[http://llvm.org/devmtg/2014-04/PDFs/Talks/Passes.pdf|Passes in LLVM (2014/04)]] ([[https://www.youtube.com/watch?v=rY02LT08-J8|Video]]) 和 [[http://llvm.org/devmtg/2014-10/Slides/Carruth-TheLLVMPassManagerPart2.pdf|The LLVM PassManager Part 2 (2014/10)]] ([[http://llvm.org/devmtg/2014-10/Videos/The%20LLVM%20Pass%20Manager%20Part%202-720.mov|Video]]) 描述 [[http://llvm.org/doxygen/classllvm_1_1PassManagerBuilder.html|PassManagerBuilder]]。
      * PassManagerBuilder 根據優化等級建立 pass pipeline。以 [[https://github.com/llvm-mirror/llvm/blob/master/tools/opt/opt.cpp|opt]] 為例: <code cpp>
static void AddOptimizationPasses(legacy::PassManagerBase &MPM,
                                  legacy::FunctionPassManager &FPM,
                                  TargetMachine *TM, unsigned OptLevel,
                                  unsigned SizeLevel) {
  if (!NoVerify || VerifyEach)
    FPM.add(createVerifierPass()); // Verify that input is correct

  PassManagerBuilder Builder;
  Builder.OptLevel = OptLevel;
  Builder.SizeLevel = SizeLevel;

  if (DisableInline) {
    // No inlining pass
  } else if (OptLevel > 1) {
    Builder.Inliner = createFunctionInliningPass(OptLevel, SizeLevel, false);
  } else {
    Builder.Inliner = createAlwaysInlinerLegacyPass();
  }

  Builder.populateFunctionPassManager(FPM);
  Builder.populateModulePassManager(MPM);
}
</code>
        * PassManagerBuilder 透過 [[http://llvm.org/doxygen/classllvm_1_1PassManagerBuilder.html#a4481c1ec64d03d8e75179e30205a8b0e|populateFunctionPassManager]] 和 [[http://llvm.org/doxygen/classllvm_1_1PassManagerBuilder.html#a5e9bd6b778471eabca960496422421b3|populateModulePassManager]] 對舊有的 [[http://llvm.org/doxygen/classllvm_1_1legacy_1_1FunctionPassManager.html|FunctionPassManager]] 和 ModulePassManager (其實是 PassManagerBase?) 添加 pass，組成 pass pipeline。
    * 舊版 PassManager 透過繼承，決定是 Function 還是 Module PassManager。{{http://llvm.org/doxygen/classllvm_1_1legacy_1_1PassManagerBase__inherit__graph.png}}
    * 新版 PassManager 透過模板參數，決定是 Function 還是 Module PassManager。<code cpp>
extern template class PassManager<Module>;
typedef PassManager<Module> ModulePassManager;

extern template class PassManager<Function>;
typedef PassManager<Function> FunctionPassManager
</code>

  * [[http://llvm.org/doxygen/classllvm_1_1Pass.html|Pass]]
    * 以 [[http://llvm.org/doxygen/PreISelIntrinsicLowering_8cpp_source.html|PreISelIntrinsicLowering.cpp]] 為例:
      * PreISelIntrinsicLoweringLegacyPass 是舊版 pass，繼承 [[http://llvm.org/doxygen/classllvm_1_1ModulePass.html|ModulePass]]。注意，這些 LegacyPass 放在 anonymous namespace，只在檔案內部可見。
      * [[http://llvm.org/doxygen/structllvm_1_1PreISelIntrinsicLoweringPass.html|PreISelIntrinsicLoweringPass]] 是新版 pass，可以看到不再繼承 ModulePass。

  * [[http://llvm.org/doxygen/classllvm_1_1TargetPassConfig.html|TargetPassConfig]]
    * 目標平台在 TargetMachine.cpp 實現 TargetPassConfig，可以將目標平台所需的 pass 添加進 pipeline。
====== Code Generator ======
<note>
  * [[http://llvm.org/docs/CodeGenerator.html|The LLVM Target-Independent Code Generator]]
    * [[http://eli.thegreenplace.net/2012/11/24/life-of-an-instruction-in-llvm|Life of an instruction in LLVM]]
    * [[http://eli.thegreenplace.net/2013/02/25/a-deeper-look-into-the-llvm-code-generator-part-1|A deeper look into the LLVM code generator, Part 1]]
  * [[http://llvm.org/docs/WritingAnLLVMBackend.html|Writing an LLVM Backend]]
  * 務必參閱 [[https://www.amazon.com/Getting-Started-LLVM-Core-Libraries/dp/1782166920|Getting Started with LLVM Core Libraries]] 第六章。
    * LLVM CodeGen 會隨著時間演進。這裡只提供基本且概略的描述。
</note>

LLVM 後端基本上有三類 IR ([[http://llvm.org/devmtg/2011-09-16/EuroLLVM2011-LLVMplusARM.pdf|LLVM + ARM = ?]])，由上至下依序是:
  * SelectionDAG (DAG)
    * [[http://llvm.org/docs/CodeGenerator.html#target-independent-code-generation-algorithms|Target-independent code generation algorithms]]
  * MachineInstr (MI)
    * [[http://llvm.org/docs/CodeGenerator.html#machine-code-description-classes|Machine code description classes]]
  * MachineCode (MC)
    * [[http://llvm.org/docs/CodeGenerator.html#the-mc-layer|The “MC” Layer]]
    * 提供匯編器所需的抽象層，用來生成目的檔 (object file) 。

後兩者的名稱容易被混淆:

  - MachineInstr: SelectionDAG 最終會轉成 MachineInstr。MachineInstr 是目標機器碼的抽象，可以表示成 SSA 和 non-SSA 形式。在 register allocation 之前/後，分別為 SSA 和 non-SSA 形式。
  - MachineCode: [[http://llvm.org/docs/CodeGenerator.html#code-emission|Code Emission]] 負責將 MachineInst 轉成 MachineCode。MachineCode 處理最後生成的機器碼 。

如同 [[http://reup.dmcs.pl/wiki/images/7/7a/Tricore-llvm-slides.pdf|Design and Implementation of a TriCore Backend
for the LLVM Compiler Framework]] 第 12 頁所展示的，LLVM 後端的流程如下:

  LLVM IR -> DAG Lowering -> non-legalized DAGs -> DAG Legalization -> legalized DAGs
    -> Instruction Selection -> DAGs (native instructions, MI) -> Scheduling
    -> SSA Form -> SSA-based Opt -> SSA Form -> Register Allocation
    -> native instrictions with phys reg -> Post Allocation
    -> Prolog/Epilog Code Insertion -> resolved stack reference
    -> Peehole Opt -> Assembly Printing -> assembly

[[http://llvm.org/devmtg/2012-04-12/Slides/Workshops/Anton_Korobeynikov.pdf|Tutorial: Building a backend in 24 hours (2012)]] 第 3 頁展示各個階段分別為何種形式的中介碼。
  - LLVM IR -> DAG (平台無關): 這一步驟是為了將來方便做 instruction selection 和 instruction scheduling。LLVM 透過 tblgen 產生部分 instruction selector，後端需要客製化的部分另外寫成 C++ 代碼。請見 [[http://llvm.org/docs/CodeGenerator.html#introduction-to-selectiondags|Introduction to SelectionDAGs]]。在 instruction selection 之前包含底下幾個步驟:
    * 透過 [[http://llvm.org/doxygen/SelectionDAGBuilder_8h_source.html|SelectionDAGBuilder]] 將 LLVM IR 轉成 SelectionDAG，簡稱 DAG。所得到的 DAG 可能會出現目標平台不支援的型別或是運算，故稱此階段的 DAG 為 non-legalized DAG。
      * [[http://llvm.org/docs/CodeGenerator.html#initial-selectiondag-construction|Initial SelectionDAG Construction]]
    * 將 non-legalized DAG 轉成 legalized DAG。先將目標平台不支援的型別消除，再消除目標平台不支援的運算。
      * [[http://llvm.org/docs/CodeGenerator.html#selectiondag-legalizetypes-phase|SelectionDAG LegalizeTypes Phase]]
      * [[http://llvm.org/docs/CodeGenerator.html#selectiondag-legalize-phase|SelectionDAG Legalize Phase]]
    * 對 SelectionDAG 進行變換的過程中，需要 DAG Combiner 對變換後的 SelectionDAG 做優化和清理。
      * [[http://llvm.org/docs/CodeGenerator.html#selectiondag-optimization-phase-the-dag-combiner|SelectionDAG Optimization Phase: the DAG Combiner]]
  - DAG (平台無關) -> DAG (平台相關): 做 instruction selection。透過 [[http://llvm.org/doxygen/classllvm_1_1SelectionDAGISel.html|SelectionDAGISel]] 遍歷 DAG 中的 SDNode，將平台無關指令替換成平台相關指令。注意! 此階段所得的目標代碼仍使用 virtual register，留待之後 register allocation 配置暫存器。
    * [[http://llvm.org/docs/CodeGenerator.html#selectiondag-select-phase|SelectionDAG Select Phase]]。
  - DAG (平台相關) -> DAG (平台相關): 做 instruction scheduling。透過 [[http://llvm.org/doxygen/ScheduleDAG_8h_source.html|ScheduleDAG]]對 DAG 進行 scheduling，這裡會根據目標平台的一些限制指定節點之間的順序。
    * [[http://llvm.org/docs/CodeGenerator.html#selectiondag-scheduling-and-formation-phase|SelectionDAG Scheduling and Formation Phase]]
  - DAG (平台相關) -> MI: 透過 [[http://llvm.org/doxygen/InstrEmitter_8h_source.html|InstrEmitter]] 遍歷 DAG 生成 MI ([[http://llvm.org/doxygen/MachineInstr_8h_source.html|MachineInstr]])。此時的 MI 近似於 LLVM IR 的 SSA form (擁有 Phi Node)，可以對其做 SSA 相關優化; MI 此時使用的仍是 virtual register。經過 register allocation 之後，MI 不再是 SSA form，同時也改使用 physical register。
  - MI -> MC: 透過 [[http://llvm.org/doxygen/AsmPrinter_8h_source.html|AsmPrinter]] 遍歷 MI，轉換成 [[http://llvm.org/doxygen/MCInst_8h_source.html|MCInst]]，最後輸出匯編或是目的檔。

  * 延伸閱讀:
    * Register Allocation
      * [[http://llvm.org/devmtg/2008-08/Cheng_RegisterAllocation.pdf|LLVM Register Allocation (2008)]]
      * [[http://llvm.org/devmtg/2009-10/RegisterAllocationFutureWorks.pdf|Future Works in LLVM Register Allocation (2009)]]
      * [[http://llvm.org/devmtg/2011-11/#talk6|Register Allocation in LLVM 3.0 (2011)]]
      * [[https://www.slideshare.net/chimerawang/llvm-register-allocation-2nd-version|LLVM Register Allocation (2nd Version)]]
      * Phi-Elimination -> Register Coalescing -> Register Allocation -> Virtual Register Rewriter。<code bash>
$ llc -debug-pass=Structure sum.ll
      Eliminate PHI nodes for register allocation
      Simple Register Coalescing
      Greedy Register Allocator
      Virtual Register Rewriter
</code>
        * [[http://llvm.org/doxygen/PHIElimination_8cpp_source.html|PHIElimination.cpp]]: ''PHIElimination::LowerPHINode()'' 改用 COPY 指令取代 PHI 指令。
        * [[http://llvm.org/doxygen/RegisterCoalescer_8cpp_source.html|RegisterCoalescer.cpp]]: ''RegisterCoalescer::joinCopy()'' 如果 COPY 指令的來源和目的暫存器其 interval 相同，消除該 COPY 指令。
        * [[http://llvm.org/doxygen/VirtRegMap_8cpp_source.html|VirtRegMap.cpp]]: 暫存器分配的結果存在 ''VirtRegMap''，紀錄虛擬暫存器和物理暫存器的對應，''VirtRegRewriter::runOnMachineFunction()'' 再根據 ''VirtRegMap'' 將 MI 中的虛擬暫存器改寫成物理暫存器。
      * 底下範例可以看到 PHI 消除前後，COPY 被置換成目標平台指令和虛擬暫存器轉為物理暫存器的過程。<code bash>
$ cat phi.c
void func(bool r, bool y) {
  bool l = y || r;
}
$ clang++ -O0 phi.c -mllvm -print-machineinstrs -c
# After Instruction Selection:
# Machine code for function func: IsSSA, TracksLiveness

BB#2: derived from LLVM BB %lor.end
    Predecessors according to CFG: BB#0 BB#1
        %vreg1<def> = PHI %vreg6, <BB#0>, %vreg13, <BB#1>; GR8:%vreg1,%vreg6,%vreg13
        %vreg15<def,tied1> = AND8ri %vreg1<tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg15,%vreg1
        MOV8mr <fi#2>, 1, %noreg, 0, %noreg, %vreg15<kill>; mem:ST1[%l] GR8:%vreg15
        RETQ
    
# After Eliminate PHI nodes for register allocation:
# Machine code for function func: NoPHIs, TracksLiveness

BB#2: derived from LLVM BB %lor.end
    Predecessors according to CFG: BB#0 BB#1
        %vreg1<def> = COPY %vreg16; GR8:%vreg1,%vreg16
        %vreg15<def,tied1> = AND8ri %vreg1<tied0>, 1, %EFLAGS<imp-def>; GR8:%vreg15,%vreg1
        MOV8mr <fi#2>, 1, %noreg, 0, %noreg, %vreg15<kill>; mem:ST1[%l] GR8:%vreg15
        RETQ
        
# After Fast Register Allocator:
# Machine code for function func: NoPHIs, TracksLiveness, NoVRegs

BB#2: derived from LLVM BB %lor.end
    Predecessors according to CFG: BB#0 BB#1
        %AL<def> = MOV8rm <fi#3>, 1, %noreg, 0, %noreg; mem:LD1[FixedStack3]
        %AL<def,tied1> = AND8ri %AL<tied0>, 1, %EFLAGS<imp-def>
        MOV8mr <fi#2>, 1, %noreg, 0, %noreg, %AL<kill>; mem:ST1[%l]
        RETQ
</code>
    * MC
      * [[http://llvm.org/devmtg/2010-11/Dunbar-MC.pdf|The LLVM Assembler & Machine Code Infrastructure]]
      * [[http://llvm.org/devmtg/2011-11/Grosbach_Anderson_LLVMMC.pdf|LLVM MC In Practice]]
===== SelectionDAG =====
<note important>
  * [[http://llvm.org/docs/GlobalISel.html|GlobalISel]] 預期將會取代 SelectionDAG。
    * LLVM IR -> Global MI (GMI) -> MI
    * [[https://www.youtube.com/watch?v=6tfb344A7w8|2016 LLVM Developers’ Meeting: Global Instruction Selection Status]]
</note>
<note>
  * [[http://llvm.org/docs/CodeGenerator.html#target-independent-code-generation-algorithms|Target-independent code generation algorithms]]
    * [[http://llvm.org/devmtg/2007-05/04-Cheng-Codegen.pdf|LLVM Code Generator]]
    * [[http://llvm.org/devmtg/2008-08/Gohman_CodeGenAndSelectionDAGs.pdf|CodeGen Overview and Focus on SelectionDAGs]]
    * [[http://llvm.org/devmtg/2009-10/Korobeynikov_BackendTutorial.pdf|Tutorial: Building a backend in 24 hours]] 
  * [[http://llvm.org/docs/WritingAnLLVMBackend.html#instruction-selector|Instruction Selector]]
  * 下載最新版 [[http://www.graphviz.org/Download_macos.php|Graphviz]]，<color #ed1c24>按照指示</color>安裝 Graphviz 和相關軟體。(目前失效)
    * [[http://stackoverflow.com/questions/43372723/how-to-open-dot-on-mac|How to open dot on Mac]]
  * [[http://lists.llvm.org/pipermail/llvm-dev/2017-October/118393.html|[llvm-dev] LLVM ERROR: Cannot select: t29: v32f64 = X86ISD::VBROADCAST t9]]
</note>
  * 術語:
    * [[http://llvm.org/doxygen/ValueTypes_8h_source.html|EVT (Extended Value Type)]]: 包含 MVT 的超集，可以表示目標平台不支持的型別。 
    * [[http://llvm.org/doxygen/MachineValueType_8h_source.html|MVT (Machine Value Type)]]: 任一 LLVM 目標平台支持的型別皆屬於此類。
    * [[http://llvm.org/doxygen/ISDOpcodes_8h_source.html|ISD]]: 可以指目標平台無關或相關的 SDNode。
    * [[http://llvm.org/doxygen/classllvm_1_1SelectionDAG.html|DAG (SelectionDAG)]]
    * [[http://llvm.org/doxygen/classllvm_1_1SDValue.html|Chain/Op (SDValue)]]: SDNode 可能會產生多個 SDValue，''SDValue.getNode()'' 和 ''SDValue.getResNo()'' 分別代表該 SDValue 是從哪個 SDNode 生成，以及是該 SDNode 第幾個輸出。
    * [[http://llvm.org/doxygen/classllvm_1_1TargetLowering.html|TLI (TargetLowering)]]: 後端負責 lowering SDNode。

  * 重要檔案與常用接口:
    * [[http://llvm.org/doxygen/SelectionDAG_8h_source.html|SelectionDAG.h]]: SelectionDAG 類。
    * [[http://llvm.org/doxygen/ValueTypes_8h_source.html|ValueTypes.h]]: SDNode result 型別。
    * [[http://llvm.org/doxygen/ISDOpcodes_8h_source.html|ISDOpcodes.h]]: 目標平台無關的 Instruction Selection Dag 節點。目標平台可以在 TargetLowering 定義自己的 ISD 節點，如: [[http://llvm.org/doxygen/X86ISelLowering_8h_source.html|X86ISelLowering.h]]。
    * [[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/TargetSelectionDAG.td|TargetSelectionDAG.td]]: SDNode 和 *.td 檔中 pattern 的對映。
    * [[http://llvm.org/doxygen/classllvm_1_1SelectionDAG.html#abad5b17501ce8972fb04a149611e7177|SelectionDAG::getNode()]]: 生成目標平台無關或相關的 ISD 節點。常見於目標平台的 ISelLowering，如: [[http://llvm.org/doxygen/X86ISelLowering_8cpp_source.html|X86ISelLowering.cpp]]。
    * [[http://llvm.org/doxygen/classllvm_1_1SelectionDAG.html#a1c04c72abd24de2572a03ef686a36dd6|SelectionDAG::getMachineNode()]] 建立 [[http://llvm.org/doxygen/classllvm_1_1MachineSDNode.html|MachineSDNode]] 節點，透過 [[http://llvm.org/doxygen/SelectionDAGISel_8h_source.html|SelectionDAG::ReplaceNode]] 替換目標平台無關的節點。常見於目標平台的 ISelDAGToDAG，如: [[http://llvm.org/doxygen/X86ISelDAGToDAG_8cpp_source.html|X86ISelDAGToDAG.cpp]]。
      * 注意填入 operand 的順序，chain 也不能遺漏。後者在 O2 會爆發出難以查找的錯誤。[[http://llvm.org/doxygen/ISDOpcodes_8h_source.html|ISD]] 中的 INTRINSIC_WO_CHAIN，INTRINSIC_W_CHAIN 和 INTRINSIC_VOID 的註解說明第幾個 operand 代表 chain，第幾個 operand 代表真正的 operand。

  * [[http://llvm.org/docs/CodeGenerator.html#introduction-to-selectiondags|Introduction to SelectionDAGs]]
    * 一個 DAG 基本對應一個 BasicBlock，DAG 中的每一個節點 (SDNode) 基本對應一條 LLVM IR。[[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/TargetSelectionDAG.td|TargetSelectionDAG.td]] 定義了 LLVM IR 對應的 SDNode。SDNode 之間是 producer-consumer 的關係，producer 產生 SDValue 給 consumer。SDValue 有底下三種型別:
      * concrete value type: 對映黑色實線。
      * Other: 對映藍色虛線。
      * Glue: 對映紅色實線。

{{http://eli.thegreenplace.net/images/2013/dag_imul5.png?400}}

SDNode 之間有 data 或 control (chain, 簡寫 ch) dependency。上圖中黑色箭頭代表 data dependency，藍色箭頭代表 control dependency (注意它指向節點中的 ch 欄位)，紅色箭頭代表兩個節點之間是 glue 的關係 (和 ch 相比，glue 之間不能插入其它節點)。EntryToken 和 TokenFactor 負責 control dependency。GraphRoot 代表 SelectionDAG 的最底層。每個節點中間一欄代表其 operator ([[http://llvm.org/doxygen/namespacellvm_1_1ISD.html#a22ea9cec080dd5f4f47ba234c2f59110|ISD::NodeType]])，上面一欄代表其輸入，下面一欄代表其輸出。

SelectionDAG 主要有底下幾個流程 ([[http://llvm.org/doxygen/SelectionDAGISel_8cpp_source.html|SelectionDAGISel::CodeGenAndEmitDAG()]]):
  - Lower: 將 LLVM IR 轉換成 SelectionDAG (SelectionDAGISel::SelectAllBasicBlocks)。<code bash>
# Pop up a window to show dags before the first dag combine pass
$ llc -view-dag-combine1-dags sum.ll
</code>
  - Combine: 將若干 SDNode 合成一個 SDNode ([[http://llvm.org/doxygen/classllvm_1_1SelectionDAG.html#ab3154fceed4e2389901aee563beae4e8|SelectionDAG::Combine()]])。<code bash>
# Pop up a window to show dags before legalize types
$ llc -view-legalize-types-dags sum.ll
</code>
  - Legalize Type: 將目標平台不支援的型別轉換成支援的型別 ([[http://llvm.org/doxygen/classllvm_1_1SelectionDAG.html#affe59e12b70bcd9f5476f7ab289fa84a|SelectionDAG::LegalizeTypes()]])。<code bash>
# Pop up a window to show dags before the post legalize types dag combine pass
$ llc -view-dag-combine-lt-dags sum.ll
</code>
  - Legalize Vector: 將目標平台不支援的向量型別轉換成支援的型別。<code bash>
# Pop up a window to show dags before legalize
$ llc -view-legalize-dags sum.ll
</code>
  - Legalize Op: 將目標平台不支援的運算轉換成支援的運算。<code bash>
# Pop up a window to show dags before the second dag combine pass
$ llc -view-dag-combine2-dags sum.ll
</code>
  - Combine: 將若干 SDNode 合成一個 SDNode。<code bash>
# Pop up a window to show isel dags as they are selected
$ llc -view-isel-dags sum.ll
</code>
  - Select: SDNode (平台無關) -> SDNode (平台相關)。<code bash>
# Pop up a window to show sched dags as they are processed
$ llc -view-sched-dags sum.ll
</code>
  - Schedule: 完成指令調度。<code bash>
# Pop up a window to show SUnit dags after they are processed
$ llc -view-sunit-dags sum.ll
</code>

  * [[http://llvm.org/doxygen/X86ISelLowering_8cpp_source.html|X86ISelLowering.cpp]]
    * 繼承 [[http://llvm.org/doxygen/classllvm_1_1TargetLowering.html|TargetLowering]]，負責從 LLVM IR 轉成平台可以支持的 SDNode。
      * [[http://llvm.org/doxygen/classllvm_1_1TargetLowering.html|TargetLowering]] 繼承 [[http://llvm.org/doxygen/classllvm_1_1TargetLoweringBase.html|TargetLoweringBase]]。TargetLoweringBase 提供 setAction 函式，指示針對特定型別的特定操作，後端如何處理。例如: [[http://llvm.org/doxygen/classllvm_1_1TargetLoweringBase.html#a07c0c67913bb543fc45ce9ef65ef260a|setOperationAction]] 設為 Custom 的需要在 [[http://llvm.org/doxygen/classllvm_1_1X86TargetLowering.html#aef185cdc1d7af43f6a97bf868da7d422|LowerOperation]] 處理。
        * [[http://llvm.org/doxygen/classllvm_1_1TargetLowering.html#a48b383731144a31a360c35ab2860b696|LowerFormalArguments]]
        * [[http://llvm.org/doxygen/classllvm_1_1TargetLowering.html#ad927e7d64ce33cd1bba91fefe583bc85|LowerReturn]]
      * [[http://llvm.org/doxygen/classllvm_1_1TargetLoweringBase.html#a07debef9c174231d513e1966ef50cd0a|setTargetDAGCombine]]: 在 TargetLowering 註冊的 ISD，會經過 [[http://llvm.org/doxygen/classllvm_1_1TargetLowering.html#a23088f6e5065a437e6448022592ad85c|PerformDAGCombine]] 調用對應的函式，執行平台自定義的 Combine。
    * [[http://lists.llvm.org/pipermail/llvm-dev/2015-July/088852.html|[LLVMdev] PerformDAGCombine vs. DAG to DAG]]
      * 如果 DAG 變換可以暴露其它 DAG 變換 (優化)，則將該變換寫在 DAGCombine，否則寫在 DAG to DAG。
    * [[http://lists.llvm.org/pipermail/llvm-dev/2016-October/106019.html|[llvm-dev] Target DAG-combine vs. Target custom lowering]]
      * Lowering 算是 legalization，只做一次; DAGCombine 算是優化，可做多次，在 legalization 之前或之後做。
    * O2 情況下，後端需要處理 VECTOR 相關的 SDNode。例如: [[http://llvm.org/doxygen/X86ISelLowering_8cpp_source.html|LowerBUILD_VECTOR]][(http://lists.llvm.org/pipermail/llvm-dev/2017-May/112745.html)]
      * 某些情況下，針對目標平台不支持的 SDNode，可以在 [[http://llvm.org/doxygen/classllvm_1_1X86TargetLowering.html#a8330b55827791d73b9b061e660981b0c|PerformDAGCombine]] 將其轉換成目標平台支援的 SDNode。

  * [[http://llvm.org/doxygen/X86ISelDAGToDAG_8cpp_source.html|X86ISelDAGToDAG.cpp]]
    * 負責將平台無關的 SDNode 換成平台相關的 SDNode。一般較為簡單的替換會寫成 *.td 檔 ([[http://llvm.org/docs/CodeGenerator.html#selectiondag-select-phase|SelectionDAG Select Phase]]，善用 [[https://llvm.org/svn/llvm-project/llvm/trunk/include/llvm/Target/TargetSelectionDAG.td|TargetSelectionDAG.td]] 定義的 SDNode 寫匹配規則)，複雜的替換在此檔寫成 C++ 函式。
    * [[http://llvm.org/doxygen/X86ISelDAGToDAG_8cpp_source.html|X86DAGToDAGISel::Select()]] 是入口點。
      * 預設經由 ''SelectCode(Node)'' 透過 *.td 檔所寫的匹配規則選擇指令，或者也可以針對特定 SDNode 生成其它 SDNode 進行置換。函式 ''SelectCode(Node)'' 由 tblgen 產生。

  * [[http://llvm.org/doxygen/classllvm_1_1TargetFrameLowering.html#a8404705eb7a27e437ac51ca3730bfd7c|TargetFrameLowering::hasReservedCallFrame]]
    * 如果被調用的函式不需要 frame pointer (即 ![[http://llvm.org/doxygen/classllvm_1_1TargetFrameLowering.html#a524e1c2b3df4633c057ba8a4f6e2a3c4|TargetFrameLowering::hasFP]])，其參數所需要的棧空間會被配置在當前函式的棧上。
      * [[https://www.quora.com/What-is-the-difference-between-a-stack-pointer-and-a-frame-pointer|What is the difference between a stack pointer and a frame pointer?]]
        * stack pointer 指向棧頂，該處為下一個可以使用的棧空間。當資料寫入 stack pointer 所指位址時，stack pointer 移動到下一個可以使用的位址。frame pointer 一般指向當前調用棧的底部 (或是某一固定點)，因為 frame pointer 不會移動，所以可以作為參考點，存取當前棧上的變量。

  * [[http://llvm.org/doxygen/classllvm_1_1TargetInstrInfo.html#a1f16d509bf2e2c2d0e0176f04c253a96|loadRegFromStackSlot]] /[[http://llvm.org/doxygen/classllvm_1_1TargetInstrInfo.html#a94d273699f7fdbd1a01477c4ca8014dd|storeRegToStackSlot]]
    * O2 暫存器分配會透過上述兩個函式做 spill。此處需注意，必須生成正確的指令，否則 O2 會有 runtime failure。

  * [[https://stackoverflow.com/questions/31426486/how-to-make-llvm-prefer-one-machine-instruction-over-another|How to make LLVM prefer one machine instruction over another?]]
    * [[http://lists.llvm.org/pipermail/llvm-dev/2013-February/059236.html|[LLVMdev] pattern matching order]]
    * 使用 ''AddedComplexity'' 指定指令選擇的偏好。''AddedComplexity'' 可以為正或負數，數值越大，代表我們越傾向使用該指令。
    * 能匹配到越多 SDNode (即越 complex) 的指令，其被選擇的優先級越高。
===== Global Instruction Selection =====
<note>
  * [[http://llvm.org/docs/GlobalISel.html|GlobalISel]]
    * [[https://www.youtube.com/watch?v=McByO0QgqCY|GlobalISel: Past, Present, and Future (2017)]]
    * [[https://www.youtube.com/watch?v=6tfb344A7w8|Global Instruction Selection Status (2016)]]
    * [[https://www.youtube.com/watch?v=F6GGbYtae3g|A Proposal for Global Instruction Selection (2015)]] ([[http://llvm.org/devmtg/2015-10/slides/Colombet-GlobalInstructionSelection.pdf|Slide]])
</note>

  LLVM IR -> Global MI (GMI) -> MI

    * LLVM IR -> Global MI (GMI): [[http://llvm.org/doxygen/IRTranslator_8cpp_source.html|IRTranslator]] 負責將 LLVM IR 轉成 GMI，一條 LLVM IR 對映到零到多條 GMI。
      * [[http://llvm.org/docs/GlobalISel.html#generic-machine-ir|Generic Machine IR]]
    * GMI -> MI: GMI 會經過 legalize 和 register bank select，再做 instruction selection 得到 MI。
===== Instruction Scheduling =====
<note>
  * [[https://www.youtube.com/watch?v=brpomKUynEA|Writing Great Machine Schedulers (2017)]]
    * 介紹 LLVM 以及 TableGen 如何客製化指令調度。
  * [[https://www.youtube.com/watch?v=J0AtVzvPp_Y&feature=youtu.be|Scheduler for in-order processors (2016)]]([[http://llvm.org/devmtg/2016-09/slides/Absar-SchedulingInOrder.pdf|Slide]])
    * 主要介紹 TableGen 指令調度相關的部分。
  * [[http://llvm.org/devmtg/2014-10/Slides/Estes-MISchedulerTutorial.pdf|SchedMachineModel: Adding and Optimizing a Subtarget (2014)]]
    * 系統性的介紹 LLVM 以及 TableGen 指令調度相關的部分。
  * [[http://llvm.org/devmtg/2012-11/Larin-Trick-Scheduling.pdf|Instruction scheduling for Superscalar and VLIW platforms Temporal perspective (2012)]]
    * 早期介紹 MI scheduler 和 TableGen 關於指令調度的部分。。
  * [[https://engineering.purdue.edu/~milind/ece468/2015fall/lecture-08.pdf|Instruction scheduling]]
</note>
  * [[http://llvm.org/docs/CodeGenerator.html#selectiondag-scheduling-and-formation-phase|SelectionDAG Scheduling and Formation Phase]]
    * 使用 [[http://www.cs.cmu.edu/afs/cs/academic/class/15745-f03/public/lectures/L16_handouts.pdf|list scheduling]] ([[wp>Instruction scheduling]])。list scheduling 作用於一個 basic block。首先為該 basic block 建立一個 data dependency graph，根據 hardware resource 和 data dependence 為每個節點計算出 priority。從 cycle 0 開始，從 data dependency graph 挑出 priority 最高的節點，放進空閒的 function unit 執行。更新 data dependency graph，重新計算 priority，重複前一個步驟。
    * 前述方法為 top-down list scheduling。如果把 data dependency graph 上面的 edge 反向，則為 bottom-up list scheduling。top-down list scheduling 是看節點的 depth 是否小於當前 cycle; bottom-up list scheduling 則是看節點的 high 是否小於當前 cycle，來決定是否要挑選該節點。

  * 在 LLVM 中，instruction scheduler 共有三類。
    * 兩類是 pre-register allocation scheduler，一類是 post-register allocation scheduler。
    * 兩類中的第一個作用在 SelectionDAG，後兩者作用在 MI。
    * 目前作用在 SelectionDAG 的 pre-register allocation scheduler 基本只負責將 SelectionDAG 轉成 MI。真正做調度的是作用在 MI 的  scheduler。<code bash>
# 顯示 SDNode Sched DAG。
$ llc -view-sunit-dags sum.ll
# 顯示 MI Sched DAG, pre-RA 和 post-RA。
$ llc -view-misched-dags sum.ll
# pre-RA, SDNode
$ llc -pre-RA-sched
# pre-RA, MI
$ llc -enable-misched
# post-RA, MI
$ llc -enable-post-misched
</code>

  * 術語:
    * [[http://llvm.org/doxygen/classllvm_1_1SUnit.html|SUnit (Scheduling Unit)]]
      * 提供一層抽象，既可表示 SDNode，也可以表示 MachineInstr，以便能在 SelectionDAG 和 MI 做指令調度。
    * [[http://llvm.org/doxygen/classllvm_1_1SDep.html|SDep (Scheduling dependency)]]
      * 調度圖中，SUnit 是節點 (指令)，SDep 是節點之間的有向邊 (表示依賴關係)。SUnit 的 depth 代表從該節點向上，直到沒有 predecessor 的節點為止，最長的路徑; SUnit 的 high 代表從該節點向下，直到沒有 successor 的節點為止，最長的路徑。 
    * [[http://llvm.org/doxygen/MachineScheduler_8cpp_source.html|SchedRegion]]
      * basic block 中，不考慮被調度的指令被視為 scheduling boundary，將 basic block 切分成數個 region。調度時，針對 region 依序處理。
    * [[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/TargetItinerary.td|Itinerary]]
      * 原意為旅程，描述指令執行會經過的 pipeline stage。
    * Bypass: 即 [[wp>Operand forwarding|data forwarding]] ([[https://stackoverflow.com/questions/40054344/pipelining-with-bypassing|Pipelining with bypassing]])。複習 [[wp>Classic RISC pipeline]]。Interlock 是處理 data hazard 的另一種方式，即在指令之間插入 nop。

  * TD 檔案:
    * [[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/TargetSchedule.td|TargetSchedule.td]] 用 SchedMachineModel，SchedReadWrite 和 Itineraries 來描述調度訊息。
      * SchedMachineModel 定義最基本的調度屬性。<code>
class SchedMachineModel {
  int IssueWidth = -1;
  int MicroOpBufferSize = -1;
  int LoopMicroOpBufferSize = -1;

  int LoadLatency = -1;
  int HighLatency = -1;
  int MispredictPenalty = -1;

  ProcessorItineraries Itineraries = NoItineraries;
  bit PostRAScheduler = 0;
  bit CompleteModel = 1;
  list<Predicate> UnsupportedFeatures = [];
  bit NoModel = 0;
}
</code>
        * 一般 subtarget 都會定義自己的 SchedMachineModel。[[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Hexagon/HexagonScheduleV60.td|HexagonScheduleV60.td]]<code>
def HexagonModelV60 : SchedMachineModel {
  // Max issue per cycle == bundle width.
  let IssueWidth = 4;
  let Itineraries = HexagonItinerariesV60;
  let LoadLatency = 1;
  let CompleteModel = 0;
}
</code>
      * SchedReadWrite 可以針對指令的輸入輸出做更精細的設定。
        * [[https://youtu.be/brpomKUynEA?t=14m56s|Writing Great Machine Schedulers (2017)]]<code>
class SchedReadWrite;

class Sched<list<SchedReadWrite> schedrw> {
  list<SchedReadWrite> SchedRW = schedrw;
}

class SchedWrite : SchedReadWrite;

class SchedRead  : SchedReadWrite;

class ProcResource<int num> : ProcResourceKind,
  ProcResourceUnits<EponymousProcResourceKind, num>;
  
class WriteRes<SchedWrite write, list<ProcResourceKind> resources>
  : ProcWriteResources<resources> {
  SchedWrite WriteType = write;
}

class ReadAdvance<SchedRead read, int cycles, list<SchedWrite> writes = []>
  : ProcReadAdvance<cycles, writes> {
  SchedRead ReadType = read;
}
</code>
        * 針對目標平台的 FuncUnit，定義 SchedRead/SchedWrite 和 ProcResource。再將指令操作數和對應的 SchedRead/SchedWrite 綁定。針對 SchedRead 設定 ReadAdvance; 針對 SchedWrite 設定 WriteRes。[[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/ARM/ARMSchedule.td|ARMSchedule.td]] 和 [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/ARM/ARMScheduleR52.td|ARMScheduleR52.td]]<code>
// Basic ALU operation.
def WriteALU : SchedWrite;
def ReadALU : SchedRead;

def R52UnitALU    : ProcResource<2> { let BufferSize = 0; } // Int ALU

def : WriteRes<WriteALU, [R52UnitALU]> { let Latency = 3; }

def : ReadAdvance<ReadALU, 1>;
</code>
        * 針對 subtarget 微調。SchedWriteRes  SchedReadAdvance 搭配 InstRW 和 instregex 使用。
      * [[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/TargetItinerary.td|TargetItinerary.td]] 定義指令調度所需要的資訊。FuncUnit 描述功能單元。Bypass 描述可以將運算結果 forward 的通路。InstrItinData 針對屬於同一個 InstrItinClass 的指令，描述其 InstrStage (InstrStage 描述指令完成該 stage 所需的 FuncUnit 和 cycle 數)，其運算元於第幾個 cycle 準備好，以及 Bypass 訊息。<code>
class ProcessorItineraries<list<FuncUnit> fu, list<Bypass> bp,
                           list<InstrItinData> iid> {
  list<FuncUnit> FU = fu;
  list<Bypass> BP = bp;
  list<InstrItinData> IID = iid;
}
</code>  
        * [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Hexagon/HexagonSchedule.td|HexagonSchedule.td]] 定義基本的FuncUnit，Bypass 和 InstrItinClass。不同系列的晶片可以定義自己的指令調度資訊，例如: [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Hexagon/HexagonScheduleV4.td|HexagonScheduleV4.td]] 定義 ProcessorItineraries 和 SchedMachineModel，其中 SchedMachineModel 會使用到 ProcessorItineraries。

  * 其它檔案:
    * [[http://llvm.org/doxygen/ScheduleDAG_8h.html|ScheduleDAG.h]]
      * [[http://llvm.org/doxygen/classllvm_1_1ScheduleDAGSDNodes.html|ScheduleDAGSDNodes]] 作用於 SDNode，為 pre-register allocation DAG scheduler 提供接口，''lib/CodeGen/SelectionDAG/'' 底下有幾個實現。[[http://llvm.org/doxygen/classllvm_1_1ScheduleDAGInstrs.html|ScheduleDAGInstrs]] 作用於 MI，為 pre-register allocation/post-register MI scheduler 提供接口。''lib/Target/'' 底下各目標平台有各自的實現，均繼承 [[http://llvm.org/doxygen/classllvm_1_1ScheduleDAGMILive.html|ScheduleDAGMILive]] (ScheduleDAGMILive 考慮暫存器壓力。[(http://lists.llvm.org/pipermail/llvm-dev/2017-April/112287.html)])
      * {{http://llvm.org/doxygen/classllvm_1_1ScheduleDAG__inherit__graph.png?900}}
        * [[http://llvm.org/doxygen/classllvm_1_1DefaultVLIWScheduler.html|DefaultVLIWScheduler]] 只用在 [[http://llvm.org/doxygen/classllvm_1_1VLIWPacketizerList.html#acfb2315913d694fb3f1144279ab75a85|VLIWPacketizerList::PacketizeMIs]]。DefaultVLIWScheduler 僅是 [[http://llvm.org/doxygen/classllvm_1_1ScheduleDAGInstrs.html|ScheduleDAGInstrs]] 此一抽象類的實例化。
    * [[http://llvm.org/doxygen/MachineScheduler_8h_source.html|MachineScheduler.h]] 是 MI scheduler 重要檔案。
      * [[http://llvm.org/doxygen/classllvm_1_1ScheduleDAGMILive.html|ScheduleDAGMILive]]
        * 基本上 MI scheduler 都繼承於它。入口點是 ''schedule''，在 MachineScheduler/PostMachineScheduler/PostRASchedulerList 的 ''runOnMachineFunction'' 被調用。
      * [[http://llvm.org/doxygen/ScheduleDAGMutation_8h.html|ScheduleDAGMutation]]
        * 可以調整 DAG 節點之間的依賴關係。於 ''ScheduleDAGMI::postprocessDAG'' 在 ''ScheduleDAGMILive::schedule'' 被調用。
        * 參考 [[http://llvm.org/doxygen/structllvm_1_1HexagonSubtarget_1_1HVXMemLatencyMutation.html|HexagonSubtarget::HVXMemLatencyMutation]]。注意，調整 latency，必須同時調整 predecessor 和 successor 的 latency。改變 successor 的 latency，必須將 predecessor 的 height 設成 dirty; 反之，改變 predecessor 的 latency，必須將 successor 的 depth 設成 dirty。
      * [[http://llvm.org/doxygen/classllvm_1_1MachineSchedStrategy.html|MachineSchedStrategy]]
        * MachineSchedStrategy 可以作為參數傳遞給 ScheduleDAGMILive。其客製化的 ''pickNode'' 和 ''schedNode'' 在 ''ScheduleDAGMILive::schedule'' 被調用。。
        * [[http://llvm.org/doxygen/classllvm_1_1GenericScheduler.html|GenericScheduler]] 為 MachineSchedStrategy 預設實現。{{http://llvm.org/doxygen/classllvm_1_1MachineSchedStrategy__inherit__graph.png?900}}
    * [[http://llvm.org/doxygen/MachineScheduler_8cpp_source.html|MachineScheduler.cpp]] 和 [[http://llvm.org/doxygen/PostRASchedulerList_8cpp_source.html|PostRASchedulerList]] 是 MI scheduler pass 重要檔案。MI scheduler pass 在 ''runOnMachineFunction'' 中，會調用到 MI scheduler 的 ''schedule'' 進行調度。
      * MachineScheduler 是 pre-register allocation scheduler pass。
      * PostMachineScheduler 是 post-register allocation scheduler pass。
      * PostRASchedulerList 是 post-register allocation scheduler pass。預設執行這一個，而不是前面的 PostMachineScheduler。 

  * ScheduleDAGMI 和 ScheduleDAGMILive 主要差異請看各自的 ''schedule()'' 實現 ([[http://llvm.org/doxygen/classllvm_1_1ScheduleDAGMI.html#a5d7e71cf32573e6b1762b5a1d82a1cf5|ScheduleDAGMI::schedule()]][[http://llvm.org/doxygen/classllvm_1_1ScheduleDAGMILive.html#a1583ce23a69e8a1b4af8065e2019c75f|ScheduleDAGMILive::schedule()]])。
    * ScheduleDAGMILive 基本上是在 ScheduleDAGMI 的基礎上，增加暫存器壓力的紀錄。<code c++>
  buildSchedGraph(AA);

  findRootsAndBiasEdges(TopRoots, BotRoots);

  SchedImpl->initialize(this); // SchedImpl 為 MachineSchedStrategy 的實例。
  
  initQueues(TopRoots, BotRoots);

  bool IsTopNode = false;
  while (true) {
    // 如果 SU 為 nullptr，或為非法，跳出循環。
    SUnit *SU = SchedImpl->pickNode(IsTopNode);

    // ScheduleDAGMILive 調用 scheduleMI，其基礎為原來 ScheduleDAGMI 的代碼。
    // 此處透過 moveInstruction 調整 SU 對應 MI 在原來 basic block 的位置。
    
    SchedImpl->schedNode(SU, IsTopNode);

    updateQueues(SU, IsTopNode);
  }
</code>
      * [[http://llvm.org/doxygen/classllvm_1_1ScheduleDAGInstrs.html#a43d828dab9502a17822c16d108cb0b9a|ScheduleDAGInstrs::buildSchedGraph()]]: 為 basic block 中 region 的 MachineInstr 建立 SUnit，並藉由 MachineInstr 引用的暫存器建立指令之間的依賴關係 (記錄在[[http://llvm.org/doxygen/classllvm_1_1SUnit.html|SUnit]] 中的 Preds 和 Succs)。  [[http://llvm.org/doxygen/classllvm_1_1ScheduleDAGMILive.html#a4be5ffcd4f76d433cc753be146a872b7|ScheduleDAGMILive::buildDAGWithRegPressure()]] 在前者的基礎上增加暫存器壓力的計算。
      * [[http://llvm.org/doxygen/classllvm_1_1ScheduleDAGMI.html#a6d0a0b4903e8d4d12c98b0f43fe83878|ScheduleDAGMI::findRootsAndBiasEdges()]]: Data dependency graph 的 top 和 bottom root 節點因為沒有依賴其它節點，可以排入 schedule。
      * [[http://llvm.org/doxygen/classllvm_1_1ScheduleDAGMI.html#a1f98694f104d052d71ed74ade38d69f0|ScheduleDAGMI::initQueues()]]: 將 top 和 bottom root 節點排入 SchedBoundary 所維護的 [[http://llvm.org/doxygen/classllvm_1_1ReadyQueue.html|ReadyQueue]]。
      * [[http://llvm.org/doxygen/classllvm_1_1ScheduleDAGMI.html#abfdd9a95217810c69b2557060a130318|ScheduleDAGMI::updateQueues()]]: 前一條指令被調度過後，重新計算各條指令的優先級，並更新 [[http://llvm.org/doxygen/classllvm_1_1ReadyQueue.html|ReadyQueue]]。
        * 如果是 top-down list scheduling，調用[[http://llvm.org/doxygen/classllvm_1_1ScheduleDAGMI.html#acf56d667066b39177a3c0f134d759d98|ScheduleDAGMI::releaseSuccessors()]] 釋放已被調度指令的 successor，使其可以被調度。
      * 可以看到主要透過 SchedImpl (實現 MachineSchedStrategy 的 GenericScheduler) 的 pickNode 和 schedNode 做指令調度。以 [[http://llvm.org/doxygen/classllvm_1_1GenericScheduler.html#ad4cc558b6cbcc4e9cea5df915c197e14|GenericScheduler::pickNode()]] 和 [[http://llvm.org/doxygen/classllvm_1_1GenericScheduler.html#adf574ab3455d7292bed998d8ba50bfeb|GenericScheduler::schedNode()]] 為例。
        *  GenericScheduler 裡，top-down 和 bottom-up 分別對應到 [[http://llvm.org/doxygen/classllvm_1_1SchedBoundary.html|SchedBoundary]] Top 和 Bot。每個 SchedBoundary 各自有 [[http://llvm.org/doxygen/classllvm_1_1ReadyQueue.html|ReadyQueue]] (Available 和 Pending。從 Available 取節點，因為 interlock 的節點會被放到 Pending，之後再轉移到 Available)。
          * [[http://llvm.org/doxygen/classllvm_1_1SchedBoundary.html|SchedBoundary]]: 對於欲做指令調度的 MI DAG 而言，我們基本會維護 top-down 和 bottom-up 兩個方向的 SchedBoundary，記錄可以被調度的節點和其它相關訊息。
          * [[http://llvm.org/doxygen/structllvm_1_1GenericSchedulerBase_1_1SchedCandidate.html|SchedCandidate]]: 對 SchedBoundary 的 [[http://llvm.org/doxygen/classllvm_1_1ReadyQueue.html|ReadyQueue]] 中的可供調度的節點，我們還要經過一些算法評估 ([[http://llvm.org/doxygen/classllvm_1_1GenericScheduler.html#a2394f6619e48d28d58883bc91a8ac919|GenericScheduler::tryCandidate()]])，從中挑選出合適的候選者 (SchedCandidate)。
        * pickNode 以命令行參數為優先，調用 [[http://llvm.org/doxygen/classllvm_1_1GenericScheduler.html#ad9779ec3011b4547f9a509af82d87a6e|pickNodeFromQueue]] 以 top-down 或是 bottom-up 方式取得下一個欲調度的節點。如果沒有命令行參數，調用 [[http://llvm.org/doxygen/classllvm_1_1GenericScheduler.html#aef82bce77971408815c3b90979188d1e|pickNodeBidirectional]] 根據 IsTopNode 以 top-down 或是 bottom-up 方式取得下一個欲調度的節點。
        * schedNode 根據 pickNode 所挑出的節點，更新擴展 top-down 或 bottom-up 的 SchedBoundary ([[http://llvm.org/doxygen/classllvm_1_1SchedBoundary.html#a6b1771cf492495f8f82727657c68e571|SchedBoundary::bumpNode]])，和其它相關資訊。

  * Hexagon 使用到 VILW 相關的指令調度。
    * [[http://llvm.org/docs/CodeGenerator.html#vliw-packetizer|VLIW Packetizer]]
    * 術語:
      * Packet: 即 [[wp>Explicitly parallel instruction computing|VLIW]] 中的 bundle。
      * RPTracker: [[http://llvm.org/doxygen/classllvm_1_1RegPressureTracker.html|RegPressureTracker]]。用來紀錄當前 virtual register pressure，避免指令調度造成後續暫存器分配產生 spill。
        * <q>RPTracker will cover the entire DAG region, TopTracker and BottomTracker will be initialized to the top and bottom of the DAG region without covereing any unscheduled instruction.</q>
    * PreRA scheduler
      * [[http://llvm.org/doxygen/HexagonMachineScheduler_8h_source.html|HexagonMachineScheduler.h]]
        * [[http://llvm.org/doxygen/classllvm_1_1VLIWResourceModel.html|VLIWResourceModel]]
          * 包裝 DFAPacketizer (ResourcesModel) 和 TargetSchedModel (SchedModel)。主要用來看 SU 是否能放在當前的 Packet (''VLIWResourceModel::isResourceAvailable()'')。
        * [[http://llvm.org/doxygen/classllvm_1_1VLIWMachineScheduler.html|VLIWMachineScheduler]]
          * VLIWMachineScheduler 繼承 ScheduleDAGMILive，ScheduleDAGMILive 做 instruction scheduling 的同時，也考慮 register pressure。因此可以知道 VLIWMachineScheduler 在 register allocation 之前執行。VLIWMachineScheduler 基本只是 override ScheduleDAGMILive 的 ''schedule()'' 接口，供上層回調。
        * [[http://llvm.org/doxygen/classllvm_1_1ConvergingVLIWScheduler.html|ConvergingVLIWScheduler]]
          * ConvergingVLIWScheduler 繼承 MachineSchedStrategy，主要 override MachineSchedStrategy 的 pickNode 和 schedNode。
      * [[http://llvm.org/doxygen/HexagonMachineScheduler_8cpp_source.html|HexagonMachineScheduler.cpp]]
        * 入口點位於 ''schedule()''。實際挑選節點，並更新 data dependency graph 各節點 priority 的工作交給 SchedImpl (即 ''ConvergingVLIWScheduler'')。
      * ''HexagonPassConfig'' 複寫 ''createMachineScheduler''，回傳 VLIWMachineScheduler。''createMachineScheduler'' 在 ''MachineScheduler::runOnMachineFunction'' 被調用。透過這樣的方式，Hexagon 執行自己的 PreRA scheduler。
    * PostRA scheduler
      * 採用 [[http://llvm.org/doxygen/PostRASchedulerList_8cpp_source.html|PostRASchedulerList]]。Hexagon 分別透過 ''getHazardType'' 和 ''EmitInstruction'' 改變調度結果。<code c++>
ScheduleHazardRecognizer::HazardType
HexagonHazardRecognizer::getHazardType(SUnit *SU, int stalls) {
  if (!Resources->canReserveResources(*MI)) {
    DEBUG(dbgs() << "*** Hazard in cycle " << PacketNum << ", " << *MI);
    HazardType RetVal = Hazard;
    if (TII->mayBeNewStore(*MI)) {
      // Make sure the register to be stored is defined by an instruction in the
      // packet.
      MachineOperand &MO = MI->getOperand(MI->getNumOperands() - 1);
      if (!MO.isReg() || RegDefs.count(MO.getReg()) == 0)
        return Hazard;
      // The .new store version uses different resources so check if it
      // causes a hazard.
      MachineFunction *MF = MI->getParent()->getParent();
      MachineInstr *NewMI =
        MF->CreateMachineInstr(TII->get(TII->getDotNewOp(*MI)),
                               MI->getDebugLoc());
      if (Resources->canReserveResources(*NewMI))
        RetVal = NoHazard;
      DEBUG(dbgs() << "*** Try .new version? " << (RetVal == NoHazard) << "\n");
      MF->DeleteMachineInstr(NewMI);
    }
    return RetVal;
  }
}
</code>
    * Packetizer
      * [[http://llvm.org/doxygen/DFAPacketizer_8h.html|DFAPacketizer.h]] ([[https://gcc.gnu.org/news/dfa.html|DFA Scheduler]])
        * DFA Packetizer/Scheduler 用有限狀態機 (DFA) 向指令調度器描述目標平台的 [[https://en.wikipedia.org/wiki/Hazard_(computer_architecture)#Structural_hazards|Structural hazards]]。
      * [[http://llvm.org/doxygen/classllvm_1_1VLIWPacketizerList.html|VLIWPacketizerList]] 可以被目標平台繼承。[[http://llvm.org/doxygen/HexagonVLIWPacketizer_8h_source.html|HexagonVLIWPacketizer.h]] 為 VLIWPacketizerList 其中一個實現。
        * VLIWPacketizerList 包含底下成員:
          * [[http://llvm.org/doxygen/classllvm_1_1DFAPacketizer.html|DFAPacketizer]] 作為 ResourceTracker，用於判斷是否有足夠硬件資源將 MI 打包進當前的 Packet。
          * CurrentPacketMIs 代表當前 Packet 中所包含的 MI。
        * VLIWPacketizerList 包含底下函式:
          * [[http://llvm.org/doxygen/classllvm_1_1VLIWPacketizerList.html#acfb2315913d694fb3f1144279ab75a85|PacketizeMIs]] 是 Packetizer 的入口函式，於 ''HexagonPacketizer::runOnMachineFunction'' ([[http://llvm.org/doxygen/HexagonVLIWPacketizer_8cpp_source.html|HexagonVLIWPacketizer.cpp]]) 被調用。Machine Basic Block 會被切分成數個 Region，再由 [[http://llvm.org/doxygen/classllvm_1_1VLIWPacketizerList.html#acfb2315913d694fb3f1144279ab75a85|PacketizeMIs]] 將 Region 中的數個 MI 打包成一個 Packet (Bundle)。''PacketizeMIs'' 先對 Region 中的指令做調度，之後根據  DFAPacketizer 偵測 [[https://en.wikipedia.org/wiki/Hazard_(computer_architecture)#Structural_hazards|Structural hazards]]，根據 [[http://llvm.org/doxygen/classllvm_1_1VLIWPacketizerList.html#a2ed4f250d154c460d2acd08c81aae5b3|isLegalToPacketizeTogether]]和 [[http://llvm.org/doxygen/classllvm_1_1VLIWPacketizerList.html#a9cec2dc6fdf9c6662c833a91db3b7db3|isLegalToPruneDependencies]] 偵測 [[https://en.wikipedia.org/wiki/Hazard_(computer_architecture)#Data_hazards|Data hazards]]，決定 MI 是否能加入當前的 Packet。
          * [[http://llvm.org/doxygen/classllvm_1_1VLIWPacketizerList.html#ae687f7292a0b3db97788f744d37d85d4|addToPacket]] 於 PacketizeMIs 中被調用，將 MI 加入當前的 Packet，CurrentPacketMIs。
          * [[http://llvm.org/doxygen/classllvm_1_1VLIWPacketizerList.html#ac05188b9f403d9279681683e2a7b3d3e|endPacket]] 結束當前 Packet。''finalizeBundle'' ([[http://llvm.org/doxygen/MachineInstrBundle_8cpp_source.html|MachineInstrBundle.cpp]]) 透過 [[http://llvm.org/doxygen/classllvm_1_1MIBundleBuilder.html|MIBundleBuilder]] 將 CurrentPacketMIs 中的 MI 標記其屬於某一個 Packet。
    * Hexagon 有 packetize 的測試用例。<code bash>
$ lldb llc
(lldb) b HexagonPacketizer::runOnMachineFunction
(lldb) b VLIWPacketizerList::PacketizeMIs
(lldb) r test/CodeGen/Hexagon/packetize_cond_inst.ll
</code>

  * [[http://llvm.org/doxygen/classllvm_1_1TargetPassConfig.html|TargetPassConfig]] 有兩個虛擬函式 ''createMachineScheduler'' 和 ''createPostMachineScheduler'' 可以被覆寫。可以返回 ScheduleDAGInstrs 的子類，或是只返回 ScheduleDAGInstrs 的實例，傳入客制的 MachineSchedStrategy。又或是透過 ''substitutePass'' 將預設執行的 PostRASchedulerList 替換成 PostMachineScheduler。
    * 目前有 ARM/AArch64 和 SystemZ 使用 PostMachineScheduler。ARM/AArch64 使用預設的 MachineSchedStrategy，PostGenericScheduler，並加上自訂的 Mutation。SystemZ 使用自訂的 SystemZPostRASchedStrategy。
    * {{http://llvm.org/doxygen/classllvm_1_1GenericSchedulerBase__inherit__graph.png?350}}
    * [[http://llvm.org/doxygen/classllvm_1_1PostGenericScheduler.html|PostGenericScheduler]] 和用於 pre-register allocation 的 [[http://llvm.org/doxygen/classllvm_1_1GenericScheduler.html|GenericScheduler]] 不同。PostGenericScheduler 只做 top-down scheduling。專注 PostGenericScheduler::pickNode。
    * [[http://llvm.org/doxygen/structllvm_1_1GenericSchedulerBase_1_1SchedCandidate.html|SchedCandidate]]: 存放候選節點相關訊息。[[http://llvm.org/doxygen/classllvm_1_1PostGenericScheduler.html#ad9f7d64df62c7391f08ec630ea104e71|tryCandidate()]] 挑選節點。
      * [[http://llvm.org/doxygen/structllvm_1_1GenericSchedulerBase_1_1CandPolicy.html|CandPolicy]]: 於此候選節點所在區域中，挑選下一個候選節點所採用的策略。可以傾向 resource 受限或是 latency 受限，前者反映 structural hazard，後者反映 data hazard。透過 [[http://llvm.org/doxygen/classllvm_1_1GenericSchedulerBase.html#a8668556014566994c07b21391762551b|setPolicy()]] 設置。
      * [[http://llvm.org/doxygen/classllvm_1_1GenericSchedulerBase.html#ae0da1bc94e326020069c0f44170a48d3|CandReason]]: 表示該節點被選上的原因。原因之間有優先級之分。以 [[http://llvm.org/doxygen/MachineScheduler_8cpp.html#a9b112d0cb3309a5fda2d6b7601e89e3f|tryLess()]] 為例:<code c++>
// 取 TryVal 和 CandVal 中最小的。
static bool tryLess(int TryVal, int CandVal,
                    GenericSchedulerBase::SchedCandidate &TryCand,
                    GenericSchedulerBase::SchedCandidate &Cand,
                    GenericSchedulerBase::CandReason Reason) {
  if (TryVal < CandVal) {
    TryCand.Reason = Reason;
    return true;
  }
  if (TryVal > CandVal) {
    // Reason 越小，優先級越高。把 Cand.Reason 替換掉。
    if (Cand.Reason > Reason)
      Cand.Reason = Reason;
    return true;
  }
  return false;
}
</code>
        * 如果 TryVal 和 CandVal 相等，返回 false，進行下一條件比較。

  * 目前有底下幾個方式修改調度策略 ([[https://youtu.be/brpomKUynEA?t=32m43s|Writing Great Machine Schedulers (2017)]])。
    * 實現 [[http://llvm.org/doxygen/classllvm_1_1TargetSubtargetInfo.html#a9cd7c54e0bb13cc01c4e2a4133a40ee4| overrideSchedPolicy()]]。
    * 實現 [[http://llvm.org/doxygen/classllvm_1_1MachineSchedStrategy.html|MachineSchedStrategy]]。
    * 實現 [[http://llvm.org/doxygen/classllvm_1_1ScheduleDAGMutation.html|ScheduleDAGMutation]]。
==== Example ====
  * 以 Hexagon 為例，觀察其使用的 scheduler。<code bash>
$ llc -O2 -debug-pass=Structure test/CodeGen/Hexagon/packetize_cond_inst.ll
</code>
    * Machine Instruction Scheduler ([[http://llvm.org/doxygen/MachineScheduler_8cpp_source.html|MachineScheduler.cpp]]): pre-register allocation MI scheduler。
    * Post RA top-down list latency scheduler ([[http://llvm.org/doxygen/PostRASchedulerList_8cpp_source.html|PostRASchedulerList.cpp]]): post-register allocation MI scheduler。
      * 從 ''runOnMachine'' 開始，直到 ''SchedulePostRATDList::ListScheduleTopDown'' 開始實際的調度。<code c++>
void SchedulePostRATDList::ListScheduleTopDown() {
  while (!AvailableQueue.empty() || !PendingQueue.empty()) {

  }
</code>
        * 主要維持兩個佇列，AvailableQueue 和 PendingQueue。如果 PendingQueue 中的指令，其結果已經運算完成，會被放到 AvailableQueue。最終會從 AvailableQueue 中挑出當前 cycle 要執行的指令。

  * 以 Hexagon 為例，判讀 debug 訊息。<code bash>
$ llc -O2 -debug-only=post-RA-sched test/CodeGen/Hexagon/packetize_cond_inst.ll
********** List Scheduling **********
SU(0):   %P0<def> = C4_cmplte %R2, %R1
  # preds left       : 0
  # succs left       : 3
  # rdefs left       : 0
  Latency            : 1
  Depth              : 0
  Height             : 4
  Successors:
    SU(2): Data Latency=1 Reg=%P0
    SU(1): Data Latency=1 Reg=%P0
    SU(1): Anti Latency=0

SU(1):   %R1<def> = A2_paddt %P0, %R2<kill>, %R1<kill>
  # preds left       : 2
  # succs left       : 2
  # rdefs left       : 0
  Latency            : 1
  Depth              : 1
  Height             : 3
  Predecessors:
    SU(0): Data Latency=1 Reg=%P0
    SU(0): Anti Latency=0
  Successors:
    SU(2): Out  Latency=1
    SU(2): Data Latency=1 Reg=%R1
</code>
    * 一開始印出 SUnit 在 dependency graph 中的各種屬性，以及和其它 SUnit 的依賴關係。
    * 接著打印出調度的過程。<code>
Reset hazard recognizer

*** Examining Available
*** Scheduling [0]: SU(0):   %P0<def> = C4_cmplte %R2, %R1
 Add instruction %P0<def> = C4_cmplte %R2, %R1

*** Examining Available
*** Finished cycle 0
Advance cycle, clear state
</code>
      * ''***'' 開頭的訊息，是由[[http://llvm.org/doxygen/PostRASchedulerList_8cpp_source.html|SchedulePostRATDList::ListScheduleTopDown()]] 生成。其它訊息由 HazardRecognizer 選擇性生成 (如: [[http://llvm.org/doxygen/HexagonHazardRecognizer_8cpp_source.html|HexagonHazardRecognizer.cpp]])。顯示在第幾個 cycle，選中哪一條指令。
===== TableGen =====
<note>
  * [[http://llvm.org/devmtg/2012-04-12/Slides/Reed_Kotler.pdf|Tablegen Deep Dive (2012)]] ([[http://llvm.org/devmtg/2012-04-12/videos/Reed_Kotler-desktop.mov|Video]] [[https://github.com/azru0512/tblgen|Example]])
  * [[https://nhaehnle.blogspot.de/2018/02/tablegen-1-what-has-tablegen-ever-done.html|TableGen #1: What has TableGen ever done for us?]]
  * 可以參考 test/TableGen 底下的測試用例。例如: [[https://github.com/llvm-mirror/llvm/blob/master/test/TableGen/SetTheory.td|SetTheory.td]]。
</note>
  * [[http://llvm.org/docs/TableGen/index.html|TableGen]]: TableGen 語法近似 C++ class template，借用函數式編程相關的語法。
    * [[http://llvm.org/docs/TableGen/LangIntro.html|TableGen Language Introduction]]: 需要寫 *.td 可以參考這份文件。
      * TableGen 裡所提到的 backend ([[https://llvm.org/docs/TableGen/BackEnds.html|TableGen BackEnds]])，指的是其命令行選項，如: ''-print-records'' (''-print-records'' 是預設後端)。
      * TableGen 裡的 ''class'' 和 ''def'' 都是 record。 ''class'' 類似 C++ 裡的 Class，''def'' 類似 C++ 裡的 Object。 
    * [[http://llvm.org/docs/TableGen/LangRef.html|TableGen Language Reference]]: 需要 parsing *.td 可以參考這份文件。
      * TableGen 支持的型別:
        * [[http://llvm.org/docs/TableGen/LangIntro.html#the-tablegen-type-system|The TableGen type system]] 
        * [[http://llvm.org/docs/TableGen/LangRef.html#types|Types]]。
      * TableGen 支持的值:
        * [[http://llvm.org/docs/TableGen/LangIntro.html#the-tablegen-type-system|TableGen values and expressions]] 
        * [[http://llvm.org/docs/TableGen/LangRef.html#values|Values]]
      * dag: dag 的第一個元素是 operator，必須是 record definition。dag 其實是用 lisp 語法描述的 tree，而非 directed acyclic graph。
        * [[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/Target.td|Target.td]] 描述一些 dag 的操作符，如: ''outs'' 和 ''ins''，分別用於 ''(outs R32:$dst)'' 和 ''(ins R32:$src1, R32:$src2)''。
        * [[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/TargetSelectionDAG.td|TargetSelectionDAG.td]] 描述一些 dag 的操作符，如: ''set''。

  * Register ([[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/Target.td|Target.td]])
    * SubRegIndex
      * ''list<Register> SubRegs'' 和 ''list<SubRegIndex> SubRegIndices'' 一起搭配。前者定義此 register 包含哪些 sub-register，後者用來定位 (index) 這些 sub-register。以 [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86RegisterInfo.td|X86RegisterInfo.td]] 為例:<code>
class X86Reg<string n, bits<16> Enc, list<Register> subregs = []> : Register<n> {
  let Namespace = "X86";
  let HWEncoding = Enc;
  let SubRegs = subregs;
}

let Namespace = "X86" in {
  def sub_8bit    : SubRegIndex<8>;
  def sub_8bit_hi : SubRegIndex<8, 8>; // 此 subreg 大小為 8 bit，在 reg 從 offset 8 bit 開始。
  def sub_16bit   : SubRegIndex<16>;
  def sub_32bit   : SubRegIndex<32>;
  def sub_xmm     : SubRegIndex<128>;
  def sub_ymm     : SubRegIndex<256>;
}

let SubRegIndices = [sub_8bit, sub_8bit_hi], CoveredBySubRegs = 1 in {
def AX : X86Reg<"ax", 0, [AL,AH]>; // AX 由 AL 和 AH 兩個 subreg 組成，AL 和 AH 如何在 AX 佔位由 SubRegIndices 描述。
def DX : X86Reg<"dx", 2, [DL,DH]>;
def CX : X86Reg<"cx", 1, [CL,CH]>;
def BX : X86Reg<"bx", 3, [BL,BH]>;
}
</code>
    * RegisterClass。以 [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86RegisterInfo.td|X86RegisterInfo.td]] 為例:<code>
class RegisterClass<string namespace, list<ValueType> regTypes, int alignment,
                    dag regList, RegAltNameIndex idx = NoRegAltName>

def GR32 : RegisterClass<"X86", [i32], 32,
                         (add EAX, ECX, EDX, ESI, EDI, EBX, EBP, ESP,
                              R8D, R9D, R10D, R11D, R14D, R15D, R12D, R13D)>;
</code> 

  * Instruction ([[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/Target.td|Target.td]])
    * 其中的 ''list<dag> Pattern'' 欄位即是用來撰寫指令選擇所需要的 pattern match。
    * 目標平台撰寫的指令都是 Instruction 的子類。以 Sparc 為例 ([[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Sparc/SparcInstrInfo.td|SparcInstrInfo.td]])<code>
def XNORrr  : F3_1<2, 0b000111,
                   (outs IntRegs:$rd), 輸出
                   (ins IntRegs:$rs1, IntRegs:$rs2), 輸入
                   "xnor $rs1, $rs2, $rd", // 匯編指令
                   [(set i32:$rd, (not (xor i32:$rs1, i32:$rs2)))] // DAG 匹配樣式，$rd = not ($rs1 xor $rs2)
                   >;
</code>
      * ''F3_1'' 定義於 [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Sparc/SparcInstrFormats.td|SparcInstrFormats.td]]，繼承自 Instruction ([[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/Target.td|Target.td]])。
      * 匹配到 DAG ''$rd = not ($rs1 xor $rs2)'' 之後，會將其替換成 ''XNORrr''。

  * Operand ([[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/Target.td|Target.td]])
    * ValueType ([[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/CodeGen/ValueTypes.td|ValueTypes.td]])<code>
class Operand<ValueType ty> : DAGOperand {
  ValueType Type = ty;
  string PrintMethod = "printOperand";
  string EncoderMethod = "";
  bit hasCompleteDecoder = 1;
  string OperandType = "OPERAND_UNKNOWN";
  dag MIOperandInfo = (ops);
  
  ...
}
</code>
    * Target.td 有定義一些 operand，但針對地址計算這一類的 operand 而言，一般要目標平台自行定義。<code>
/// PointerLikeRegClass - Values that are designed to have pointer width are
/// derived from this.  TableGen treats the register class as having a symbolic
/// type that it doesn't know, and resolves the actual regclass to use by using
/// the TargetRegisterInfo::getPointerRegClass() hook at codegen time.
class PointerLikeRegClass<int Kind> {
  int RegClassKind = Kind;
}


/// ptr_rc definition - Mark this operand as being a pointer value whose
/// register class is resolved dynamically via a callback to TargetInstrInfo.
/// FIXME: We should probably change this to a class which contain a list of
/// flags. But currently we have but one flag.
def ptr_rc : PointerLikeRegClass<0>;

def MEMrr : Operand<iPTR> {
  let PrintMethod = "printMemOperand";
  let MIOperandInfo = (ops ptr_rc, ptr_rc);
  let ParserMatchClass = SparcMEMrrAsmOperand;
}
def MEMri : Operand<iPTR> {
  let PrintMethod = "printMemOperand";
  let MIOperandInfo = (ops ptr_rc, i32imm);
  let ParserMatchClass = SparcMEMriAsmOperand;
}
</code>
      * [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Sparc/SparcInstrInfo.td|SparcInstrInfo.td]]
      * [[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/Target.td|Target.td]]

  * Calling Convention ([[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/TargetCallingConv.td|TargetCallingConv.td]])
    * [[http://llvm.org/docs/WritingAnLLVMBackend.html#calling-conventions|Calling Conventions]]
    * 一般寫在各自的 CallingConv.td。例如: [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86CallingConv.td|X86CallingConv.td]]。又或者寫在 RegisterInfo.td。例如: [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Hexagon/HexagonRegisterInfo.td|HexagonRegisterInfo.td]]。
    * 定義 callee saved register。<code>
class CalleeSavedRegs<dag saves> {
  dag SaveList = saves;
  dag OtherPreserved;
}

def HexagonCSR
  : CalleeSavedRegs<(add R16, R17, R18, R19, R20, R21, R22, R23,
                         R24, R25, R26, R27)>;
</code>
    * 根據參數型別，透過特定暫存器傳遞，這是一種 action。[[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/TargetCallingConv.td|TargetCallingConv.td]] 提供多種 action 設定方式。<code>
class CCIfType<list<ValueType> vts, CCAction A> : CCPredicateAction<A> {
  list<ValueType> VTs = vts;
}

class CCAssignToReg<list<Register> regList> : CCAction {
  list<Register> RegList = regList;
}

CCIfType<[f32,f64], CCAssignToReg<[R0, R1]>>
</code>
    * 可以將數條 action 打包在一起，作為一組 calling convention。例如針對不同返回值，透過不同暫存器傳遞。<code>
class CallingConv<list<CCAction> actions> {
  list<CCAction> Actions = actions;
  bit Custom = 0;
}

def RetCC_Sparc32 : CallingConv<[
  CCIfType<[i32], CCAssignToReg<[I0, I1]>>,
  CCIfType<[f32], CCAssignToReg<[F0]>>,
  CCIfType<[f64], CCAssignToReg<[D0]>>
]>;
</code>

  * Intrinsics
    * [[http://lists.llvm.org/pipermail/llvm-dev/2015-October/091684.html|[llvm-dev] add intrinsic function support for customized backend]]
    * [[http://lists.llvm.org/pipermail/llvm-dev/2017-June/114321.html|[llvm-dev] ADDING A CUSTOM INSTRINSIC]]
    * 先在 LLVM 加上支持，再修改 Clang 處理 intrinsic。<code>
__builtin_yyy -> @llvm.xxx.yyy -> yyy
</code>
      * Clang
        * 各平台的 intrinsic 各自定義在 ''tools/clang/include/clang/Basic/'' 底下。例如: [[https://github.com/llvm-mirror/clang/blob/master/include/clang/Basic/BuiltinsARM.def|BuiltinsARM.def]]。會用到 [[https://github.com/llvm-mirror/clang/blob/master/include/clang/Basic/Builtins.def|Builtins.def]] 定義的型別和屬性。<code>
BUILTIN(__builtin_atan2 ,      "ddd"   ,    "Fnc"    )
         builtin 函式名     返回和參數型別   builtin 屬性
</code>
        * [[https://clang.llvm.org/doxygen/CodeGenFunction_8h_source.html|CodeGenFunction.h]]: 宣告 Clang 遇到 builtin/intrinsic 時要調用的 Emit 函式。例如: ''EmitARMBuiltinExpr''。
        * [[https://clang.llvm.org/doxygen/CGBuiltin_8cpp_source.html|CGBuiltin.cpp]]: 定義 Clang 遇到 builtin/intrinsic 時要調用的 Emit 函式。例如: ''EmitARMBuiltinExpr''。[[https://clang.llvm.org/doxygen/CGBuiltin_8cpp.html#a1d6741b01869d4dc731f0e1211d0e4cd|EmitTargetArchBuiltinExpr()]] 調用定義前述的 Emit 函式。
        * [[https://clang.llvm.org/doxygen/SemaChecking_8cpp_source.html|SemaChecking.cpp]]: 如有需要，添加對 intrinsic 參數的檢查。例如: ''Sema::CheckARMBuiltinFunctionCall''。
      * LLVM
        * 在 ''include/llvm/IR/'' 底下平台各自定義 LLVM IR 階段會產生的 intrinsic，以 llvm.xxx 開頭，其中 xxx 為目標平台名稱。例如: [[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/IR/IntrinsicsHexagon.td|IntrinsicsHexagon.td]]。會用到 [[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/IR/Intrinsics.td|Intrinsics.td]] 中定義的型別和屬性。注意！在 [[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/IR/Intrinsics.td|Intrinsics.td]] 最後要引用自己定義的 *.td 檔。<code>
def int_xxx_foo : Intrinsic<[llvm_i32_ty], [llvm_i32_ty, llvm_i32_ty], [IntrReadArgMem]>;
                               返回型別               參數型別                  屬性
</code>
        * 在 ''lib/Target/'' 底下各自的目錄定義如何將 LLVM IR 中的 intrinsic 對應到實際的指令。例如: [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Hexagon/HexagonIntrinsics.td|HexagonIntrinsics.td]]。<code>
let isPseudo = 1 in {
  def FOO : PseudoI<(outs i32mem:$dst),
                    (ins i32mem:$src1, i32mem:$src2,),
                    [(set i32mem:$dst, (int_xxx_foo i32mem:$src1, i32mem:$src2))]>;
}
</code>

      * [[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/TargetSelectionDAG.td|TargetSelectionDAG.td]]
        * SDNode<code>
class SDNode<string opcode, SDTypeProfile typeprof,
             list<SDNodeProperty> props = [], string sdclass = "SDNode">
             : SDPatternOperator {
  string Opcode  = opcode;
  string SDClass = sdclass;
  let Properties = props;
  SDTypeProfile TypeProfile = typeprof;
}

def imm        : SDNode<"ISD::Constant"  , SDTIntLeaf , [], "ConstantSDNode">;
</code>
          * opcode 來自於 [[http://llvm.org/doxygen/ISDOpcodes_8h_source.html|ISDOpcodes.h]]。
        * PatFrag/OutPatFrag/PatLeaf/ImmLeaf<code>
class PatFrag<dag ops, dag frag, code pred = [{}],
              SDNodeXForm xform = NOOP_SDNodeXForm> : SDPatternOperator {
  dag Operands = ops;
  dag Fragment = frag;
  code PredicateCode = pred;
  code ImmediateCode = [{}];
  SDNodeXForm OperandTransform = xform;
}
</code>
        * Pattern/Pat<code>
class Pattern<dag patternToMatch, list<dag> resultInstrs> {
  dag             PatternToMatch  = patternToMatch;
  list<dag>       ResultInstrs    = resultInstrs;
  list<Predicate> Predicates      = [];  // See class Instruction in Target.td.
  int             AddedComplexity = 0;   // See class Instruction in Target.td.
}

class Pat<dag pattern, dag result> : Pattern<pattern, [result]>;
                匹配         輸出
</code>
          * dag 是 TableGen 自定義型別。Pat 會匹配 pattern，輸出 result。dag 可以使用的 operator 來自 [[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/Target.td|Target.td]] (如: COPY_TO_REGCLASS) 或是 [[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/TargetSelectionDAG.td|TargetSelectionDAG.td]] (如: add)。
            * [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Hexagon/HexagonIntrinsics.td|HexagonIntrinsics.td]]<code>
class T_I_pat <InstHexagon MI, Intrinsic IntID>
  : Pat <(IntID imm:$Is), // 匹配
         (MI imm:$Is)>;   // 輸出
</code>
            * [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Hexagon/HexagonPatterns.td|HexagonPatterns.td]]<code>
def: Pat<(i64 (add I64:$Rs, I64:$Rt)), (A2_addp I64:$Rs, I64:$Rt)>;
                     匹配                          輸出
</code>
==== Register Pair ====
  * 支持 quad-register ([[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/AMDGPU/SIRegisterInfo.td|SIRegisterInfo.td]])。<code>
class RegisterTuples<list<SubRegIndex> Indices, list<dag> Regs> {
  // SubRegs - N lists of registers to be zipped up. Super-registers are
  // synthesized from the first element of each SubRegs list, the second
  // element and so on.
  list<dag> SubRegs = Regs;

  // SubRegIndices - N SubRegIndex instances. This provides the names of the
  // sub-registers in the synthesized super-registers.
  list<SubRegIndex> SubRegIndices = Indices;
}

// VGPR_32 共有 VGPR0 ~ VGPR255 個暫存器。
def VGPR_32 : RegisterClass<"AMDGPU", [i32, f32, i16, f16, v2i16, v2f16], 32,
                            (add (sequence "VGPR%u", 0, 255))> {
  let AllocationPriority = 1;
  let Size = 32;
}

// VGPR 128-bit registers
def VGPR_128 : RegisterTuples<[sub0, sub1, sub2, sub3],
                              [(add (trunc VGPR_32, 253)), // VGPR0 ~ VGPR252
                               (add (shl VGPR_32, 1)),     // VGPR1 ~ VGPR255
                               (add (shl VGPR_32, 2)),     // VGPR2 ~ VGPR255
                               (add (shl VGPR_32, 3))]>;   // VGPR3 ~ VGPR255
                               
def VReg_128 : RegisterClass<"AMDGPU", [v4i32, v4f32, v2i64, v2f64], 32, (add VGPR_128)> {
  let Size = 128;

  // Requires 4 v_mov_b32 to copy
  let CopyCost = 4;
  let AllocationPriority = 4;
}
</code>
    * trunc 將第 N 個元素以後的元素從集合中截去。
    * shl 移除集合中前 N 個元素。
    * decimate 從集合中挑出所有第 N 個元素。
<code>
// Calling convention for leaf functions
def CC_AMDGPU_Func : CallingConv<[
  // 針對要映射到 quad-register 的型別，調用相應的處理函式。
  CCIfType<[i64, f64, v2i32, v2f32, v4i32, v4f32, v8i32, v8f32, v16i32, v16f32, v2i64, v2f64], CCCustom<"allocateVGPRTuple">>,
  CCIfType<[v4i32, v4f32, v2i64, v2f64], CCAssignToStack<16, 4>>,
]>;
</code>
==== Subtarget ====
  * 區分不同 subtaregt 的指令。Predicates 和 Requires 只對有寫 pattern match 的指令起作用。Predicates 是 Pat 父類別 Pattern 的成員，Requires 是獨立的類別。<code>
def HasV5T : Predicate<"HST->hasV5TOps()">, AssemblerPredicate<"ArchV5">;

let Predicates = [HasV5T] in {
  def: Pat<(f32 (fminnum F32:$Rs, F32:$Rt)), (F2_sfmin F32:$Rs, F32:$Rt)>;
  def: Pat<(f32 (fmaxnum F32:$Rs, F32:$Rt)), (F2_sfmax F32:$Rs, F32:$Rt)>;
}

def: Pat<(sra (add (sra I64:$src, u6_0ImmPred:$u6), 1), (i32 1)),
         (S2_asr_i_p_rnd DoubleRegs:$src, imm:$u6)>, Requires<[HasV5T]>;
</code>
==== InstrMapping ====
不同版本指令之間的映射關係，例如 Hexagon 的 Dot-New 指令 ([[https://developer.qualcomm.com/download/hexagon/hexagon-dsp-architecture.pdf|Hexagon DSP Architecture]])，可以使用 InstrMapping 表示 ([[https://llvm.org/docs/HowToUseInstrMappings.html|How To Use Instruction Mappings]])。例如我們想建立 add/add.t/add.f 和 sub/sub.t/sub.f 的映射表，以 add/sub 為 key，可以取得不同版本的指令，如: add.t/sub.t 和 add.f/sub.f。

|              ^ no pred              ^ true               ^ false              ^
^ add          | add                  | add.t              | add.f              |        
^ sub          | sub                  | sub.t              | sub.f              |

  * 以 Hexagon 為例。[[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Hexagon/Hexagon.td|Hexagon.td]] 定義多個 InstrMapping ([[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/Target.td|Target.td]])。此處定義的 InstrMapping，如: getPredOpcode，會生成為一個函式供人調用。<code>
def getPredOpcode : InstrMapping {
  // 描述此 mapping 的字串。
  let FilterClass = "PredRel";

  // add/add.t/add.f 和 sub/sub.t/sub.f 分別有相同的 RowFields。
  let RowFields = ["BaseOpcode", "isNVStore", "PNewValue", "isBrTaken", "isNT"];

  // add/sub，add.t/sub.t 和 add.f/sub.f 分別有相同的 ColFields。   
  let ColFields = ["PredSense"];
  
  // 取 ColFields 某值作為 mapping 中的 key。這裡取 add 和 sub 作為 key。
  let KeyCol = [""];
  
  // 取 ColFields 某值作為 mapping 中的 value。add 映射到 add.t/add.f，sub 映射到 sub.t/sub.f。
  let ValueCols = [["true"], ["false"]]; 
}
</code>
  * [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Hexagon/HexagonInstrFormats.td|HexagonInstrFormats.td]] 在 InstHexagon 中添加 InstrMapping 所需要的欄位。<code>
class InstHexagon<dag outs, dag ins, string asmstr, list<dag> pattern,
                  string cstr, InstrItinClass itin, IType type>
  : Instruction {
  let Namespace = "Hexagon";

  // Fields used for relation models.
  string isNT = ""; // set to "true" for non-temporal vector stores.
  string BaseOpcode = "";
  string PredSense = "";
  string PNewValue = "";
  string isBrTaken = !if(isTaken, "true", "false"); // Set to "true"/"false" for jump instructions

  let PredSense = !if(isPredicated, !if(isPredicatedFalse, "false", "true"),
                                    "");
  let PNewValue = !if(isPredicatedNew, "new", "");
  let NValueST = !if(isNVStore, "true", "false");
  let isNT = !if(isNonTemporal, "true", "false");
}
</code>
    * 此處搭配 TSFlag ([[https://github.com/llvm-mirror/llvm/blob/master/include/llvm/Target/Target.td|Target.td]]) 設置 RowFields 和 ColFields。
  * [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Hexagon/HexagonDepInstrInfo.td|HexagonDepInstrInfo.td]] 定義彼此互有關聯的指令。<code>
class HInst<dag outs, dag ins, string asmstr, InstrItinClass itin, IType type> :
      InstHexagon<outs, ins, asmstr, [], "", itin, type>;

def A2_add : HInst<
(outs IntRegs:$Rd32),
(ins IntRegs:$Rs32, IntRegs:$Rt32),
"$Rd32 = add($Rs32,$Rt32)",
tc_548f402d, TypeALU32_3op>, Enc_5ab2be, PredNewRel, ImmRegRel {
let Inst{7-5} = 0b000;
let Inst{13-13} = 0b0;
let Inst{31-21} = 0b11110011000;
let hasNewValue = 1;
let opNewValue = 0;
let CextOpcode = "A2_add";
let InputType = "reg";
let BaseOpcode = "A2_add";
let isCommutable = 1;
let isPredicable = 1;
}
      
def A2_paddf : HInst<
(outs IntRegs:$Rd32),
(ins PredRegs:$Pu4, IntRegs:$Rs32, IntRegs:$Rt32),
"if (!$Pu4) $Rd32 = add($Rs32,$Rt32)",
tc_1b6011fb, TypeALU32_3op>, Enc_ea4c54, PredNewRel, ImmRegRel {
let Inst{7-7} = 0b1;
let Inst{13-13} = 0b0;
let Inst{31-21} = 0b11111011000;
let isPredicated = 1;
let isPredicatedFalse = 1;
let hasNewValue = 1;
let opNewValue = 0;
let CextOpcode = "A2_add";
let InputType = "reg";
let BaseOpcode = "A2_add";
}

def A2_paddt : HInst<
(outs IntRegs:$Rd32),
(ins PredRegs:$Pu4, IntRegs:$Rs32, IntRegs:$Rt32),
"if ($Pu4) $Rd32 = add($Rs32,$Rt32)",
tc_1b6011fb, TypeALU32_3op>, Enc_ea4c54, PredNewRel, ImmRegRel {
let Inst{7-7} = 0b0;
let Inst{13-13} = 0b0;
let Inst{31-21} = 0b11111011000;
let isPredicated = 1;
let hasNewValue = 1;
let opNewValue = 0;
let CextOpcode = "A2_add";
let InputType = "reg";
let BaseOpcode = "A2_add";
}
</code>
  * [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Hexagon/HexagonInstrInfo.cpp|HexagonInstrInfo.cpp]] 再利用之前定義的 InstrMapping 接口，取得不同版本的的指令。<code c++>
int HexagonInstrInfo::getCondOpcode(int Opc, bool invertPredicate) const {
  enum Hexagon::PredSense inPredSense;
  inPredSense = invertPredicate ? Hexagon::PredSense_false :
                                  Hexagon::PredSense_true;
  int CondOpcode = Hexagon::getPredOpcode(Opc, inPredSense);
  if (CondOpcode >= 0) // Valid Conditional opcode/instruction
    return CondOpcode;

  llvm_unreachable("Unexpected predicable instruction");
}
</code>

在 Hexagon Dot-New 指令 ([[https://developer.qualcomm.com/download/hexagon/hexagon-dsp-architecture.pdf|Hexagon DSP Architecture]]) 的應用。原本 VLIW 的一個 bundle (packet) 內的指令不會有依賴關係，為了增加 bundle 內的指令個數，引入 Dot-New 指令，使得 bundle 內的指令可以有依賴關係。
  * ''isLegalToPacketizeTogether'' 檢視 SUI 是否能和 SUJ 做成一個 packet。<code c++>
bool HexagonPacketizerList::isLegalToPacketizeTogether(SUnit *SUI, SUnit *SUJ) {

  // SUJ 是當前 packet 內已存在的指令，檢查 SUI 是否和 SUJ 有依賴，以及該種依賴是否可以處理，
  // 使得 SUI 和 SUJ 可以放在同一個 packet。
  for (unsigned i = 0; i < SUJ->Succs.size(); ++i) {
    if (SUJ->Succs[i].getSUnit() != SUI)
      continue;

    // SUI 和 SUJ 是何種依賴?
    SDep::Kind DepType = SUJ->Succs[i].getKind();
    
    // For instructions that can be promoted to dot-new, try to promote.
    if (DepType == SDep::Data) {
      if (canPromoteToDotNew(I, SUJ, DepReg, II, RC)) {
        if (promoteToDotNew(I, DepType, II, RC)) {
          PromotedToDotNew = true;
          if (cannotCoexist(I, J))
            FoundSequentialDependence = true;
          continue;
        }
      }
      if (HII->isNewValueJump(I))
        continue;
    }
</code>
    * ''promoteToDotNew'' 透過 InstrMapping 生成的接口，返回當前指令對應的 Dot-New 版本。<code c++>
// Promote an instruction to its .new form. At this time, we have already
// made a call to canPromoteToDotNew and made sure that it can *indeed* be
// promoted.
bool HexagonPacketizerList::promoteToDotNew(MachineInstr &MI,
      SDep::Kind DepType, MachineBasicBlock::iterator &MII,
      const TargetRegisterClass* RC) {
  assert (DepType == SDep::Data);
  int NewOpcode;
  if (RC == &Hexagon::PredRegsRegClass)
    NewOpcode = HII->getDotNewPredOp(MI, MBPI);
  else
    NewOpcode = HII->getDotNewOp(MI);
  MI.setDesc(HII->get(NewOpcode));
  return true;
}
</code>
===== Writing Backend =====
<note>
  * 此節只簡略介紹開發後端會碰到的主要幾個檔案，不深入研究。細節可以分到其它地方。
  * [[http://llvm.org/docs/WritingAnLLVMBackend.html|Writing an LLVM Backend]]
    * [[https://github.com/Jonathan2251/lbd.git|lbd: llvm backend document]]
      * [[http://ccckmit.github.io/co/htm/cpu0.html|開放電腦計畫 -- 計算機硬體結構]]
  * 參考 [[https://github.com/lowRISC/riscv-llvm| RISC-V LLVM]] 的 commit 順序。
</note>

先介紹後端主要目錄結構。後端代碼位於 ''include/llvm'' 和 ''lib'' 底下幾個目錄:
  * CodeGen: 通用代碼生成算法。
  * MC: 匯編/反匯編相關。
  * TableGen: ''tblgen'' 代碼。LLVM 利用 ''tblgen'' 讀取目標平台 *.td 檔生成 instruction selector，instruction scheduling，register allocation 和 assembly printing。<code bash>
$ cd lib/Target/X86
$ llvm-tblgen X86.td -print-enums -class=Register -I ../../../include/
$ llvm-tblgen X86.td -print-enums -class=Instruction -I ../../../include/
</code>
    * [[http://llvm.org/docs/TableGen/index.html|TableGen]]
      * ''tblgen'' 語法近似 C++ template。*.td 檔主體為 record，record 分為兩類: class 和 definition。class 是模板 record，definition 是實例 record。
    * [[http://llvm.org/docs/TableGen/LangIntro.html|TableGen Language Introduction]]<code bash>
$ cat insns.td
// 指令基類
class Insn<bits <4> MajOpc, bit MinOpc> {
  bits<32> insnEncoding; // 指令編碼，長度 32 位。
  let insnEncoding{15-12} = MajOpc; // 第 15 至 12 位為 MajOpc
  let insnEncoding{11} = MinOpc;    // 第 11 位為 MinOpc
}

multiclass RegAndImmInsn<bits <4> opcode> {
  def rr : Insn<0x00, 0>; // MajOpc 皆為 0x00, MinOpc 分別為 0 和 1。
  def ri : Insn<0x00, 1>;
}

// SUB 指令
def SUB: Insn<0x00, 0>; // MajOpc = 0x00, MinOpc = 0

// ADD 指令。運算元可以是暫存器 + 暫存器，或是暫存器 + 立即數。
defm ADD : RegAndImmInsn<0x01>;
$ llvm-tblgen -print-records inst.td
</code>
  * Target: 不同後端有各自的子目錄。

這裡介紹開發後端會碰到的主要幾個檔案。參考文檔:
  * [[http://llvm.org/docs/WritingAnLLVMBackend.html|Writing an LLVM Backend]]
  * [[http://llvm.org/devmtg/2012-04-12/Slides/Workshops/Anton_Korobeynikov.pdf|Tutorial: Building a backend in 24 hours (2012) 28 頁]]
  * [[http://people.cs.nctu.edu.tw/~chenwj/slide/LLVM/LLVM-how-to-add-backend.txt]]

以 X86 為例，[[http://llvm.org/doxygen/X86TargetMachine_8h_source.html|X86TargetMachine.h]] 和 [[https://llvm.org/svn/llvm-project/llvm/trunk/lib/Target/X86/X86.td|X86.td]] 是兩個最主要的檔案。上層平台無關代碼生成算法透過 X86TargetMachine.h 的接口得到底層平台相關資訊。X86.td 描述 ISA 擴展和處理器家族，並 include 其它 *.td 檔。

  * [[http://llvm.org/docs/WritingAnLLVMBackend.html#target-machine|Target Machine]]
    * [[http://llvm.org/doxygen/X86TargetMachine_8h_source.html|X86TargetMachine.h]]: 後端的主要類別，負責膠合其它後端類別並負責控制後端管線。
  * [[http://llvm.org/docs/WritingAnLLVMBackend.html#register-set-and-register-classes|Register Set and Register Classes]]
    * [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86RegisterInfo.td|X86RegisterInfo.td]]: 定義目標平台的暫存器和暫存器組，暫存器組指對某一類指令而言，該暫存器組會被以相同方式對待。
    * [[http://llvm.org/doxygen/X86RegisterInfo_8h_source.html|X86RegisterInfo.h]]: 引入 ''X86RegisterInfo.td'' 生成的 ''X86GenRegisterInfo.inc''。提供關於暫存器組的各項資訊，諸如: 被調用方保存暫存器，暫存器分配次序等等。
  * [[http://llvm.org/docs/WritingAnLLVMBackend.html#instruction-set|Instruction Set]]
    * [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86.td|X86.td]]: 描述 ISA 擴展和處理器家族。此檔會 include 其它所有 *.td 檔。
    * [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86InstrFormats.td|X86InstrFormats.td]]: 描述如何從 DAG 對應到目標指令。參閱 ''include/llvm/Target/Target.td''，從 DAG (MI) 轉成目標匯編。
    * [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86InstrInfo.td|X86InstrInfo.td]]: ''include/llvm/Target/TargetSelectionDAG.td'' 定義各種 SDNode 型別。
  * [[http://llvm.org/docs/WritingAnLLVMBackend.html#instruction-selector|Instruction Selector]]
    * [[http://llvm.org/doxygen/X86ISelDAGToDAG_8cpp_source.html|X86ISelDAGToDAG.cpp]]: 負責指令選取。
  * [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86CallingConv.td|X86CallingConv.td]]: 調用約定。
    * [[http://llvm.org/doxygen/X86ISelLowering_8h_source.html|X86ISelLowering.h]]: 負責將 LLVM IR 降轉 (lowering) 成 Selection DAG，以便後續選取指令。
    * ''X86ISelLowering.h'' 繼承 [[http://llvm.org/doxygen/classllvm_1_1TargetLowering.html|TargetLowering]] 實現回調，並定義平台相關的 SDNode。
  * [[http://llvm.org/docs/WritingAnLLVMBackend.html#subtarget-support|Subtarget Support]]
    * [[http://llvm.org/doxygen/X86Subtarget_8h_source.html|X86Subtarget.h]]: 一個 TargetMachine 可能有多個子目標，控制不同子目標的特性。
====== JIT ======
<note important>
  * MCJIT 會被 ORCJIT 取代。
    * [[https://www.youtube.com/watch?v=hILdR8XRvdQ|2016 LLVM Developers’ Meeting: ORC -- LLVM's Next Generation of JIT API]]
</note>

  * [[http://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-1-an-interpreter/|Adventures in JIT compilation: Part 1 - an interpreter]]
  * [[http://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-2-an-x64-jit/|Adventures in JIT compilation: Part 2 - an x64 JIT]]

====== 其它特性 ======
<note>這裡放感興趣的特性。</note>
  * [[Exception handling]]
    * [[https://llvm.org/docs/ExceptionHandling.html#exception-handling-support-on-the-target|Exception Handling support on the target]]
      * [[http://llvm.org/devmtg/2011-09-16/EuroLLVM2011-ExceptionHandling.pdf|The new LLVM exception handling scheme]]
      * [[http://blog.llvm.org/2011/11/llvm-30-exception-handling-redesign.html|LLVM 3.0 Exception Handling Redesign]]
    * 在後端搜尋關鍵字 EH 和 CFI。
      * [[http://llvm.org/doxygen/classllvm_1_1MCCFIInstruction.html|MCCFIInstruction]]
        * MCCFIInstruction::createDefCfaOffset: .cfi_def_cfa_offset
        * MCCFIInstruction::createOffset: .cfi_offset
      * [[http://llvm.org/doxygen/X86FrameLowering_8cpp_source.html|X86FrameLowering.cpp]]
        * 注意 BuildCFI。
        * 分有無 FP 的情況。基本上在保存 callee-saved 暫存器，或是保存 FP 之後，需要生成 CFI。
      * 後端 TargetLowering 必須實現 ''getExceptionPointerRegister'' 和 ''getExceptionSelectorRegister''。參考 [[http://llvm.org/doxygen/X86ISelLowering_8cpp_source.html|X86ISelLowering.cpp]]。例外處理需要傳遞兩個指針，一個指向例外本身，另一個指向例外的型別。前者由 ''getExceptionPointerRegister'' 傳遞，後者由 ''getExceptionSelectorRegister'' 傳遞。
      * 關於 ''EH_RETURN''，以 Mips 為例。參考 [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Mips/MipsInstrInfo.td|MipsInstrInfo.td]] 的註解:<code>
// Exception handling related node and instructions.
// The conversion sequence is:
// ISD::EH_RETURN -> MipsISD::EH_RETURN ->
// MIPSeh_return -> (stack change + indirect branch)
</code>
        * 在 ''MipsTargetLowering::lowerEH_RETURN'' ([[https://llvm.org/doxygen/MipsISelLowering_8cpp_source.html|MipsISelLowering.cpp]]) 將 ''ISD::EH_RETURN'' 替換成 ''MipsISD::EH_RETURN''。''ISD::EH_RETURN'' 的 offset 和 handler 選擇可用的暫存器傳遞，''MipsTargetLowering::lowerEH_RETURN'' 選擇使用暫存器 V0 和 V1 傳遞。
        * ''MipsDAGToDAGISel::Select'' ([[https://llvm.org/doxygen/MipsISelDAGToDAG_8cpp_source.html|MipsDAGToDAGISel.cpp]]) 將 ''MipsISD::EH_RETURN'' 匹配成 ''MIPSeh_return32''/''MIPSeh_return64''。''MIPSeh_return32''/''MIPSeh_return64'' 是在 [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Mips/MipsInstrInfo.td|MipsInstrInfo.td]] 定義的偽指令。<code>
def SDT_MipsEHRET : SDTypeProfile<0, 2, [SDTCisInt<0>, SDTCisPtrTy<1>]>;

def MIPSehret : SDNode<"MipsISD::EH_RETURN", SDT_MipsEHRET,
                      [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;

let Uses = [V0, V1], isTerminator = 1, isReturn = 1, isBarrier = 1, isCTI = 1 in {
  def MIPSeh_return32 : MipsPseudo<(outs), (ins GPR32:$spoff, GPR32:$dst),
                                [(MIPSehret GPR32:$spoff, GPR32:$dst)]>;   // lowerEH_RETURN 生成 (MipsISD::EH_RETURN V0, V1)
                                                                           // 和此處需要一致。
  def MIPSeh_return64 : MipsPseudo<(outs), (ins GPR64:$spoff,
                                                GPR64:$dst),
                                [(MIPSehret GPR64:$spoff, GPR64:$dst)]>;
}
</code>
        * ''MipsSEInstrInfo::expandEhReturn'' ([[https://llvm.org/doxygen/MipsSEInstrInfo_8cpp_source.html|MipsSEInstrInfo.cpp]]) 擴展偽指令 ''MIPSeh_return32''/''MIPSeh_return64''，並生成指令，用 V0 調整棧，並將 V1 作為返回地址。
      * 關於 ''EH_RETURN''，以 XCore 為例。於 [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/XCore/XCoreInstrInfo.td|XCoreInstrInfo.td]] 定義 ''EH_RETURN'' 偽指令。於 [[https://llvm.org/doxygen/XCoreISelLowering_8h_source.html|XCoreISelLowering.h]] 定義 ''XCoreISD::EH_RETURN''。
        * 於 ''XCoreTargetLowering::LowerEH_RETURN'' ([[https://llvm.org/doxygen/XCoreISelLowering_8cpp_source.html|XCoreISelLowering.cpp]]) 將 ''ISD::EH_RETURN'' 轉成 ''XCoreISD::EH_RETURN''，並將 offset 和 handler 保存至暫存器 R2 和 R3。''__builtin_eh_return'' 在 ''_Unwind_RaiseException'' 被調用，其中 caller-saved 暫存器 R0 和 R1 已被 personality function 用於傳參，因此這裡使用暫存器 R2 和 R3 傳遞 offset 和 handler。 ''XCoreDAGToDAGISel::Select'' ([[https://llvm.org/doxygen/XCoreISelDAGToDAG_8cpp_source.html|XCoreISelDAGToDAG.cpp]]) 將 ''XCoreISD::EH_RETURN'' 替換成 ''EH_RETURN'' 偽指令。
        * ''XCoreFrameLowering::emitEpilogue'' ([[https://llvm.org/doxygen/XCoreFrameLowering_8cpp_source.html|XCoreFrameLowering.cpp]]) 擴展偽指令 ''XCore::EH_RETURN''，並生成指令，用 R2 調整棧，並將 R3 作為返回地址。
      * 關於 ''EH_RETURN''，以 Hexagon 為例。
        * 於 [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Hexagon/HexagonPseudo.td|HexagonPseudo.td]] 定義 ''EH_RETURN_JMPR'' 偽指令，於 [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/Hexagon/HexagonPatterns.td|HexagonPatterns.td]] 定義 pattern。
        * ''HexagonTargetLowering::LowerEH_RETURN'' ([[https://llvm.org/doxygen/HexagonISelLowering_8cpp_source.html|HexagonISelLowering.cpp]]) 將 ''ISD::EH_RETURN'' 轉成 ''HexagonISD::EH_RETURN''，並將 offset 存到暫存器 R28，handler 存到暫存器 R30 所指向的內存。
        * ''HexagonDAGToDAGISel::Select'' ([[https://llvm.org/doxygen/HexagonISelDAGToDAG_8cpp_source.html|HexagonISelDAGToDAG.cpp]]) 將 ''HexagonISD::EH_RETURN'' 替換成 ''EH_RETURN_JMPR'' 偽指令。
        * ''HexagonFrameLowering::insertEpilogueInBlock'' ([[https://llvm.org/doxygen/HexagonFrameLowering_8cpp_source.html|HexagonFrameLowering.cpp]]) 替換 ''EH_RETURN_JMPR'' 偽指令，並生成指令，用 R28 調整棧，並將 R30 所存的值作為返回地址。
      * 關於 ''EH_RETURN''，以 X86 為例。
        * 於 [[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86InstrInfo.td|X86InstrInfo.td]] 為 ''X86ISD::EH_RETURN'' 定義 ''X86ehret'' SDNode。[[https://github.com/llvm-mirror/llvm/blob/master/lib/Target/X86/X86InstrCompiler.td|X86InstrCompiler.td]] 定義偽指令 ''EH_RETURN''。
        * ''X86TargetLowering::LowerEH_RETURN'' ([[https://llvm.org/doxygen/X86ISelLowering_8cpp_source.html|X86ISelLowering.cpp]]) 將 ''ISD::EH_RETURN'' 轉成 ''X86ISD::EH_RETURN''，並將 handler 存在 frame + offset 偏移處，且將 frame + offset 存在 ECX。
        * ''X86DAGToDAGISel::Select'' 將 ''X86ISD::EH_RETURN'' 替換成 ''X86::EH_RETURN'' 偽指令。
        * ''X86ExpandPseudo::ExpandMI'' ([[https://llvm.org/doxygen/X86ExpandPseudo_8cpp_source.html|X86ExpandPseudo.cpp]]) 替換 ''X86::EH_RETURN'' 偽指令，並生成指令，將棧指針調整至 ECX 所指的位址。

  * 關於 ''emitPrologue''/''emitEpilogue''，以 Mips 為例。可以透過 [[http://llvm.org/doxygen/MipsMachineFunction_8h.html|MipsFunctionInfo]] 或是 [[http://llvm.org/doxygen/structllvm_1_1MachineFunctionInfo.html|MachineFunctionInfo]] 定義的 ''callsEHReturn'' 得知函式是否有調用 eh_return。
    * ''MipsSEFrameLowering::emitPrologue'' ([[http://llvm.org/doxygen/MipsSEFrameLowering_8cpp_source.html|MipsSEFrameLowering.cpp]])。注意！此處需要為例外處理需要用到的暫存器生成對應的 cfi。<code c++>
   if (MipsFI->callsEhReturn()) {
     // Insert instructions that spill eh data registers.
     for (int I = 0; I < 4; ++I) {
       if (!MBB.isLiveIn(ABI.GetEhDataReg(I)))
         MBB.addLiveIn(ABI.GetEhDataReg(I));
       TII.storeRegToStackSlot(MBB, MBBI, ABI.GetEhDataReg(I), false,
                               MipsFI->getEhDataRegFI(I), RC, &RegInfo);
     }
 
     // Emit .cfi_offset directives for eh data registers.
     for (int I = 0; I < 4; ++I) {
       int64_t Offset = MFI.getObjectOffset(MipsFI->getEhDataRegFI(I));
       unsigned Reg = MRI->getDwarfRegNum(ABI.GetEhDataReg(I), true);
       unsigned CFIIndex = MF.addFrameInst(
           MCCFIInstruction::createOffset(nullptr, Reg, Offset));
       BuildMI(MBB, MBBI, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
           .addCFIIndex(CFIIndex);
     }
   }
</code>
    * ''MipsSEFrameLowering::emitEpilogue'' ([[http://llvm.org/doxygen/MipsSEFrameLowering_8cpp_source.html|MipsSEFrameLowering.cpp]])。<code c++>
   if (MipsFI->callsEhReturn()) {
     const TargetRegisterClass *RC =
         ABI.ArePtrs64bit() ? &Mips::GPR64RegClass : &Mips::GPR32RegClass;
 
     // Find first instruction that restores a callee-saved register.
     MachineBasicBlock::iterator I = MBBI;
     for (unsigned i = 0; i < MFI.getCalleeSavedInfo().size(); ++i)
       --I;
 
     // Insert instructions that restore eh data registers.
     for (int J = 0; J < 4; ++J) {
       TII.loadRegFromStackSlot(MBB, I, ABI.GetEhDataReg(J),
                                MipsFI->getEhDataRegFI(J), RC, &RegInfo);
     }
   }
</code>
    * ''MipsSEFrameLowering::determineCalleeSaves'' ([[http://llvm.org/doxygen/MipsSEFrameLowering_8cpp_source.html|MipsSEFrameLowering.cpp]]) 會調用 ''MipsFunctionInfo::createEhDataRegsFI'' ([[http://llvm.org/doxygen/MipsMachineFunction_8cpp_source.html|MipsMachineFunction.cpp]]) 為例外處理使用到的暫存器，創建 spill 空間。''emitPrologue''/''emitEpilogue'' 依據該位置生成 load/store 代碼。
  * 關於 ''emitPrologue''/''emitEpilogue''，以 XCore 為例。
    * ''XCoreFrameLowering::emitPrologue'' ([[http://llvm.org/doxygen/XCoreFrameLowering_8cpp_source.html|XCoreFrameLowering.cpp]])。<code c++>
     if (XFI->hasEHSpillSlot()) {
       // The unwinder requires stack slot & CFI offsets for the exception info.
       // We do not save/spill these registers.
       const Function *Fn = &MF.getFunction();
       const Constant *PersonalityFn =
           Fn->hasPersonalityFn() ? Fn->getPersonalityFn() : nullptr;
       SmallVector<StackSlotInfo, 2> SpillList;
       GetEHSpillList(SpillList, MFI, XFI, PersonalityFn,
                      MF.getSubtarget().getTargetLowering());
       assert(SpillList.size()==2 && "Unexpected SpillList size");
       EmitCfiOffset(MBB, MBBI, dl, TII,
                     MRI->getDwarfRegNum(SpillList[0].Reg, true),
                     SpillList[0].Offset);
       EmitCfiOffset(MBB, MBBI, dl, TII,
                     MRI->getDwarfRegNum(SpillList[1].Reg, true),
                     SpillList[1].Offset);
     }
</code>
    * ''XCoreFrameLowering::emitPrologue'' ([[http://llvm.org/doxygen/XCoreFrameLowering_8cpp_source.html|XCoreFrameLowering.cpp]])。<code c++>
   if (RetOpcode == XCore::EH_RETURN) {
     // 'Restore' the exception info the unwinder has placed into the stack
     // slots.
     const Function *Fn = &MF.getFunction();
     const Constant *PersonalityFn =
         Fn->hasPersonalityFn() ? Fn->getPersonalityFn() : nullptr;
     SmallVector<StackSlotInfo, 2> SpillList;
     GetEHSpillList(SpillList, MFI, XFI, PersonalityFn,
                    MF.getSubtarget().getTargetLowering());
     RestoreSpillList(MBB, MBBI, dl, TII, RemainingAdj, SpillList);
 
     // Return to the landing pad.
     unsigned EhStackReg = MBBI->getOperand(0).getReg();
     unsigned EhHandlerReg = MBBI->getOperand(1).getReg();
     BuildMI(MBB, MBBI, dl, TII.get(XCore::SETSP_1r)).addReg(EhStackReg);
     BuildMI(MBB, MBBI, dl, TII.get(XCore::BAU_1r)).addReg(EhHandlerReg);
     MBB.erase(MBBI);  // Erase the previous return instruction.
     return;
   }
</code>
    * ''XCoreFrameLowering::determineCalleeSaves'' ([[http://llvm.org/doxygen/XCoreFrameLowering_8cpp_source.html|XCoreFrameLowering.cpp]]) 會調用 ''XCoreFunctionInfo::createEHSpillSlot'' ([[http://llvm.org/doxygen/XCoreMachineFunctionInfo_8cpp_source.html|XCoreMachineFunctionInfo.cpp]]) 為例外處理使用到的暫存器，創建 spill 空間。''emitPrologue''/''emitEpilogue'' 依據該位置生成 load/store 代碼。

  * [[wp>Coroutine]]
    * [[http://llvm.org/docs/Coroutines.html|Coroutines in LLVM]]
      * [[https://www.youtube.com/watch?v=8C8NnE1Dg4A|CppCon 2016: Gor Nishanov “C++ Coroutines: Under the covers"]]
      * [[https://www.youtube.com/watch?v=Ztr8QvMhqmQ|2016 LLVM Developers’ Meeting: G. Nishanov “LLVM Coroutines”]]
        * [[http://llvm.org/devmtg/2016-11/Slides/Nishanov-LLVMCoroutines.pdf|LLVM Coroutines (2016)]]
    * [[http://electronic-blue.herokuapp.com/blog/2012/06/coroutine-an-introduction/|Coroutine: 入門篇]]
      * function 可以被認為是 coroutine 的一個特化版本。coroutine 內可以有多個暫停點 (yield/suspend)，當執行到暫停點時，控制流會回到 caller。caller 可以選擇回復 (resume) coroutine 的執行，coroutine 會從上一次的暫停點開始執行。function 就是沒有暫停點的 coroutine。

  * [[wp>Software pipelining]]
    * 將多個 loop iteration 排進流水線，最大化 ILP。需要考慮 data dependency (data dependency graph) 和硬件資源的限制 (resource reservation table)。
      * [[ftp://gcc.gnu.org/pub/gcc/summit/2004/Swing%20Modulo%20Scheduling.pdf|Swing Modulo Scheduling for GCC]]
      * [[http://llvm.org/pubs/2005-06-17-LattnerMSThesis.pdf|AN IMPLEMENTATION OF SWING MODULO SCHEDULING WITH EXTENSIONS FOR SUPERBLOCKS]]
        * Modulo Scheduling 試圖讓多個 iteration 並行，縮小 iteration 之間的啟動 (initiation) 間距，盡可能填滿 CPU pipeline。
          * 建立 Data Dependence Graph (DDG)，藉此計算兩個連續的 iteration 啟動之間所需的 cycle，此值稱為 MII (Minimum Initiation Interval)。以 MII 作為 II (Initiation Interval)) 的初始值，試著把所有指令放進 II 區間。如果不行，II 放寬一個 cycle，重複前一個步驟。
            * MII 取 Res (Resource constrained) MII 和 Rec (Recurrence constrained) MII 的最大值。
          * 最後將原來的 loop 重構成 prologue，kernel 和 epilogue。
        * Swing Modulo Scheduling (SMS) 另外考慮到 register pressure。如果 scheduling 過於激進，並行的指令使用超過硬件所能提供的 register，spilling 會發生，抵銷 scheduling 所拿到的效能增長。將相依賴的指令盡可能排在鄰近執行，減小 register 的生命週期，增加 register 的使用率。
      * [[http://llvm.org/devmtg/2015-04/slides/Roel-SWP-EuroLLVM2015.pdf|A high-level implementation of software
pipelining in LLVM]] ([[https://www.youtube.com/watch?v=zChslFrJDh4|Video]] [[https://pdfs.semanticscholar.org/61b1/e8ea1a2d0c22917532e2b8788de5ca7f464f.pdf|Paper]])
        * [[http://llvm.org/doxygen/MachinePipeliner_8cpp.html|MachinePipeliner.cpp]][(https://reviews.llvm.org/D16829)]

  * LibCall
    * [[http://llvm.org/doxygen/RuntimeLibcalls_8h.html|RuntimeLibcalls.h]]
    * [[https://gcc.gnu.org/onlinedocs/gccint/Libgcc.html|4 The GCC low-level runtime library]]
    * 用意在於如果產生目標平台不支援的指令 (和是否開啟 O2 沒有關係)，可以調用 libgcc 的實現 [(http://lists.llvm.org/pipermail/llvm-dev/2017-May/113156.html)]。以 [[http://llvm.org/doxygen/LegalizeIntegerTypes_8cpp_source.html|DAGTypeLegalizer::ExpandIntRes_Shift]] 為例 <code c++>
  // 如果之前的代碼都不能處理，看是否能調用 libcall。
  if (LC != RTLIB::UNKNOWN_LIBCALL && TLI.getLibcallName(LC)) {
    SDValue Ops[2] = { N->getOperand(0), N->getOperand(1) };
    SplitInteger(TLI.makeLibCall(DAG, LC, VT, Ops, isSigned, dl).first, Lo, Hi);
    return;
  }

  // 如果 LC 對應的函式名為 nullptr，嘗試 ExpandShiftWithUnknownAmountBit 是否能加以處理。
  if (!ExpandShiftWithUnknownAmountBit(N, Lo, Hi))
    llvm_unreachable("Unsupported shift!");
</code>
      * [[http://llvm.org/doxygen/classllvm_1_1HexagonTargetLowering.html#ab0f312a890f3aa47e480f40c67df30fe|HexagonTargetLowering::HexagonTargetLowering]] 可以看到部分 libcall 被設為 nullptr。

  * [[http://llvm.org/doxygen/MathExtras_8h_source.html|MathExtras.h]] 提供 helper function。例如: 判斷常量的二進制是否是連續的 1-bit。<code c++>
  if (countTrailingOnes(Imm) == countPopulation(Imm))
</code>

  * Peephole Optimizer
    * [[http://llvm.org/doxygen/PeepholeOptimizer_8cpp_source.html|PeepholeOptimizer.cpp]] 是通用的優化，目標平台透過實現 TargetInstrInfo 的函式供其調用。
    * [[http://llvm.org/doxygen/PPCMIPeephole_8cpp_source.html|PPCMIPeephole.cpp]] 是 PowerPC 特定的優化。 ''PPCMIPeephole::simplifyCode()'' 搜尋特定的 MachineInstr 加以優化。

  * [[https://llvm.org/devmtg/2017-02-04/Register-Data-Flow-Framework.pptx|Register Data Flow Framework]]
    * [[http://llvm.org/doxygen/structllvm_1_1rdf_1_1DataFlowGraph_1_1DefStack.html|RDFGraph.h]]
    * [[http://llvm.org/doxygen/RDFLiveness_8h_source.html|RDFLiveness.h]]
    * [[http://llvm.org/doxygen/RDFLiveness_8cpp_source.html|RDFLiveness.cpp]]
    * [[http://llvm.org/doxygen/RDFCopy_8cpp_source.html|RDFCopy.cpp]]
    * [[http://llvm.org/doxygen/RDFDeadCode_8cpp_source.html|RDFDeadCode.cpp]]
====== 交叉編譯 ======
<note>
  * [[GCC]] 也有相關資訊，可以考慮加以合併。
  * [[https://clang.llvm.org/docs/CrossCompilation.html|Cross-compilation using Clang]]
  * 參閱 [[https://www.amazon.com/Getting-Started-LLVM-Core-Libraries/dp/1782166920|Getting Started with LLVM Core Libraries]] 第八章。
</note>

交叉編譯有底下術語:
  * host: 運行交叉工具鏈的平台。
  * build: 編譯交叉工具鏈的平台。
  * target: 運行交叉工具鏈得到的執行檔，其運行的平台。
  * multilib: 在系統上同時安裝支援不同 ABI 的函式庫，讓交叉編譯加以鏈結。如: ''/lib/n32'' 和 ''/lib/n64'' 分別是支持 MIPS n32 和 n64 ABI 的函式庫。
  * sysroot: 交叉編譯時，從指定的根目錄搜尋所需的標頭檔或函式庫。

交叉工具鏈，除了編譯器，基本牽涉到底下幾個元件:
  * C 標準函式庫:
    * [[https://www.gnu.org/software/libc/|The GNU C Library (glibc)]]
  * C++ 標準函式庫:
    * [[https://gcc.gnu.org/onlinedocs/libstdc++/|The GNU C++ Library (libstdc++)]]
    * [[https://libcxx.llvm.org/|"libc++" C++ Standard Library]]
  * 運行時函式庫: 一般用來實現目標平台硬件不支持的操作。
    * [[https://gcc.gnu.org/onlinedocs/gccint/Libgcc.html|4 The GCC low-level runtime library (libgcc)]]
    * [[https://compiler-rt.llvm.org/|"compiler-rt" runtime libraries]]
  * 匯編和鏈結器:
    * [[https://www.gnu.org/software/binutils/|GNU Binutils]]
    * [[http://llvm.org/docs/CodeGenerator.html#the-mc-layer|The “MC” Layer]] 是 LLVM 負責實現內建匯編器的組件，[[https://lld.llvm.org/|LLD - The LLVM Linker]] 是 LLVM 側的鏈結器。

  * ''--target=<triple>'' ([[http://llvm.org/doxygen/Triple_8h_source.html|Triple.h]])
    * triple 格式為 <arch><sub>-<vendor>-<sys>-<abi>。例如: x86_64-apple-macosx10.11.0，<arch>-<vendor>-<sys>。
    * 底下選項用來更加精確的指定目標平台:
      * ''-march=<arch>'': 指定指令集架構版本。如: armv4t。此選項同時會設置預設的 CPU。
      * ''-mcpu=<cpu>'': 指定 CPU。如: cortex-a8。此選項同時會設置預設的 float-abi。
      * ''-mfloat-abi=<abi>'': 指定軟浮點或是硬浮點，以及 abi。
====== 測試 ======
詳細請見 [[http://llvm.org/docs/TestingGuide.html|LLVM Testing Infrastructure Guide]]。
  * [[http://llvm.org/docs/CommandGuide/FileCheck.html|FileCheck]]<code bash>
$ ./build.Ninja/bin/llvm-lit llvm/test/CodeGen/*
$ ./build.Ninja/bin/llvm-lit llvm/tools/clang/test/CodeGen/*
</code>
    * FileCheck，llvm-lit 這一類開發者使用的工具不會被裝在安裝目錄底下，都放在編譯目錄。
      * '';CHECK:'' 匹配指定的字串。多個 '';CHECK:'' 依序匹配。
      * '';CHECK:'' 和 '';CHECK-NEXT:'' 搭配可以匹配連續的字串。
      * '';CHECK-LABEL:''
        * [[http://llvm.org/docs/CommandGuide/FileCheck.html#the-check-label-directive|The “CHECK-LABEL:” directive]]
        * [[http://lists.llvm.org/pipermail/llvm-dev/2017-March/111657.html|[llvm-dev] CHECK-LABLE or CHECK?]]
        * FileCheck 會依據 '';CHECK-LABEL:'' 後的字串，把欲匹配的文本切成數個 block。每個 block 再應用各自的 '';CHECK:''。 
      * 可以使用 regular expression 表示匹配的字符串。
        * [[http://llvm.org/docs/CommandGuide/FileCheck.html#filecheck-pattern-matching-syntax|FileCheck Pattern Matching Syntax]]
      * 可以保存被匹配到字符串，供之後使用。
        * [[http://llvm.org/docs/CommandGuide/FileCheck.html#filecheck-variables|FileCheck Variables]]

發行版測試流程請見 [[http://llvm.org/docs/ReleaseProcess.html|How To Validate a New Release]]。一般在預定發行日期前一個月，會公告徵求測試者 ([[http://lists.llvm.org/pipermail/release-testers/2017-June/000423.html|[Release-testers] [5.0.0 Release] Schedule and call for testers]])。
  * [[http://lists.llvm.org/pipermail/release-testers/2017-June/000443.html|[Release-testers] LLVM 4.0.1 -final has been tagged]]

  * [[https://reviews.llvm.org/D42675#994628]]
    * 測試用例生成盡量使用 ''update_llc_test_checks.py''。
  * [[https://reviews.llvm.org/D42515#994726]]
====== 補丁 & 臭蟲 ======
臭蟲請發送到[[https://bugs.llvm.org/|這裡]] 並先閱讀 [[http://llvm.org/docs/HowToSubmitABug.html|How To Submit A Bug]]。目前發送補丁方式請參照 [[http://llvm.org/docs/Phabricator.html|Code Reviews with Phabricator]]。
  * ''git diff -U999999 master > b.patch''
  * 在 [[https://reviews.llvm.org/|Phabricator]] 開啟帳戶。目前以 GitHub 帳戶登入。
  * 可以選擇訂閱 [[http://lists.llvm.org/mailman/listinfo/llvm-commits|llvm-commits]] 郵件列表。
    * 透過 Phabricator 發送的 code review，最好將 llvm-commits 加入 subscriber list。code review 的主題前面加上被修改的模塊名稱，如: ''[Doc]''。
    * 確定登入帳戶的 email 有訂閱 llvm-commits。如果沒有的話，Phabricator 以此帳戶發送信件至 llvm-commits 會被 moderate (即受到管制，不能被馬上看到)。
    * 可以透過 Phabricator 網頁介面，或是命令行發送 patch。
    * 如果 patch 被 approved，但自己沒有 commit 權限，在 comment 上留言表示自己沒有權限，請他人幫忙 commit。

  * [[http://llvm.org/docs/DeveloperPolicy.html|LLVM Developer Policy]]
    * [[http://llvm.org/devmtg/2014-02/slides/ledru-how-to-contribute-to-llvm.pdf|How to contribute to LLVM, Clang, etc]]
====== 外部連結 ======
  * [[http://llvm.org/|The LLVM Compiler Infrastructure]]
  * [[http://llvm.org/devmtg/|LLVM Developers' Meeting]]