Project pages
* [[http://sites.google.com/site/justinholewinski/projects/gsoc/llvm-ptx-back-end-2011|GSoC 2011- LLVM PTX Back-End]]
* [[http://jholewinski.wordpress.com/2011/12/02/llvm-3-0-ptx-support/|LLVM 3.0: PTX Support]]
* [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044108.html|[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces]]
* [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044380.html|[LLVMdev] ANN: libclc (OpenCL C library implementation)]]
* [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/043794.html|[LLVMdev] TableGen and Greenspun]]
* [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/043818.html|[LLVMdev] Enhancing TableGen]]
* [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044554.html|[LLVMdev] Function pointer parameters in PTX backend]]
* [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-November/045410.html|[LLVMdev] PTX builtin functions.]]
* [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-November/045516.html|Re: [LLVMdev] PTX builtin functions.]]
* [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/045882.html|Re: [LLVMdev] PTX builtin functions.]]
* [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-November/044861.html|[LLVMdev] PTX backend support for atomics]]
* [[CUDA]]
====== Building LLVM ======
[[https://github.com/jholewinski/llvm-ptx-samples|Sample programs for the LLVM PTX back-end]] require the following:
- CMake and the NVIDIA CUDA toolkit
- Clang/LLVM built with the PTX back-end
- Build LLVM and Clang.
$ ssh tesla
$ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
$ cd llvm/tools
$ svn co http://llvm.org/svn/llvm-project/clang/trunk clang
$ cd ../..
$ mkdir build; cd build
$ ../llvm/configure --prefix=$INSTALL --enable-targets=host,ptx
$ make
- Build the PTX samples. Note: the samples have been rewritten in OpenCL, so OpenCL must be installed on the machine.
$ git clone git://github.com/jholewinski/llvm-ptx-samples.git
$ cd llvm-ptx-samples
# Fetch libclc
$ git submodule init && git submodule update
$ mkdir build
$ cd build
$ cmake ..
$ make
- Test. Run each sample from the directory that contains the executable; otherwise it cannot find its corresponding PTX file.
$ cd build/bin
$ ./ocl-blur2d
------------------------------
* Source Kernel
------------------------------
Number of Iterations: 16
Total Time: 7.05218 sec
Average Time: 0.440761 sec
------------------------------
* Binary Kernel
------------------------------
Number of Iterations: 16
Total Time: 7.05264 sec
Average Time: 0.44079 sec
* [[http://www.pcc.me.uk/pipermail/libclc-dev/2011-December/000007.html|[Libclc-dev] libclc/ptx-nvidiacl/lib/ miss source code]]
* [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/045983.html|[LLVMdev] Build PTX samples with LLVM/Clang/libclc]]
* [[http://sites.google.com/site/justinholewinski/projects/llvm-ptx-back-end|LLVM: PTX Back-End]]
* [[http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PTX/|PTX Changelog]]
* [[http://www.pcc.me.uk/~peter/libclc/|libclc]]
* [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044380.html|[LLVMdev] ANN: libclc (OpenCL C library implementation)]]
* [[http://llvm.org/docs/ExtendingLLVM.html|Extending LLVM: Adding instructions, intrinsics, types, etc.]]
* [[Git]]
* [[Clang]]
====== Testing ======
For details, see the [[http://llvm.org/docs/TestingGuide.html|LLVM Testing Infrastructure Guide]].
$ cd build
$ ./Debug+Asserts/bin/llvm-lit ${LLVM_SOURCE}/test/CodeGen/PTX/xxx.ll
====== Developer Guide ======
* Run the following commands to see how the PTX back-end has evolved from revision to revision.
$ cd llvm/lib/Target/PTX
$ svn log
* The Shader Model (SM) values in PTXSubtarget.h correspond to the compute capability defined by CUDA.
SM 1.0 ~ G80
SM 1.3 ~ GT200
SM 2.0 ~ GF100
SM 2.1 ~ GF10x
===== TODO =====
* Add subtarget PTX23
* Add .address_size support
* Add ftz support
* [[wp>Normal number (computing)]]
* [[wp>Denormal number]]
* [[http://llvm.org/viewvc/llvm-project?view=rev&revision=133253]]
* add, sub, mul, fma, mad, div, abs, neg, min, max, rcp, rcp.approx.ftz.f64, sqrt, rsqrt, sin, cos, lg2, ex2
* set, setp, slct
* cvt
* Generate rcp instruction
* [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-May/040230.html]]
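The ''ftz'' item in the list above refers to flush-to-zero handling of denormal (subnormal) numbers. The following is a minimal Python sketch of that idea for single precision. The helper names are hypothetical, and the code models only the flushing of a single value to a sign-preserving zero, not the behavior of any particular PTX instruction.

```python
import math
import struct

# Smallest positive normal single-precision value, 2**-126.
FLT_MIN_NORMAL = 2.0 ** -126


def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def is_denormal_f32(x):
    """True if x is nonzero and smaller in magnitude than the smallest normal."""
    return x != 0.0 and abs(x) < FLT_MIN_NORMAL


def flush_to_zero_f32(x):
    """Model flush-to-zero: a denormal becomes a zero with the same sign."""
    x = to_f32(x)
    if is_denormal_f32(x):
        return math.copysign(0.0, x)
    return x
```

For example, ''flush_to_zero_f32(2.0 ** -130)'' returns ''0.0'', while a normal value such as ''1.5'' is returned unchanged.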
===== lib/Target/PTX =====
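- PTXTargetMachine.* : a subclass of LLVMTargetMachine, which is declared in ''include/llvm/Target/TargetMachine.h''.
- PTXRegisterInfo.* : defines the target's registers.
- PTXInstrInfo.* and PTXInstrFormats.td : define the target's instruction set.
- PTXISelDAGToDAG.* and PTXISelLowering.* : describe how the DAG is converted into target instructions.
- PTXAsmPrinter.* : emits the target's assembly code.
- PTXSubtarget.* : describes the variants (subtargets) of the target.
- PTXMFInfoExtract.cpp and PTXFPRoundingModePass.cpp : additional passes, which are needed in some cases.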
* [[http://llvm.org/docs/WritingAnLLVMBackend.html#RegisterSet|Register Set and Register Classes]]
* [[http://llvm.org/docs/WritingAnLLVMBackend.html#InstructionSet|Instruction Set]]
* [[http://llvm.org/docs/CodeGenerator.html#codegendesc|Machine code description classes]]
* PTXMachineFunctionInfo.*
* [[http://llvm.org/docs/WritingAnLLVMBackend.html|Writing an LLVM Compiler Backend]]
* [[http://llvm.org/docs/CodeGenerator.html|The LLVM Target-Independent Code Generator]]
* [[http://llvm.org/docs/TableGenFundamentals.html|TableGen Fundamentals]]
===== InstrInfo.td =====
See page 21 of [[http://llvm.org/devmtg/2009-10/Korobeynikov_BackendTutorial.pdf|Tutorial: Building a backend in 24 hours]].
def ii64 : InstPTX<(outs RC:$d),
(ins MEMii64:$a),
!strconcat(opstr, !strconcat(typestr, "\t$d, [$a]")),
[(set RC:$d, (pat_load ADDRii64:$a))]>, Requires<[Use64BitAddresses]>;
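For the detailed format, see [[http://llvm.org/docs/TableGenFundamentals.html|TableGen Fundamentals]]. ''outs'' lists where the instruction stores its results, and ''ins'' lists the instruction's input operands. ''!strconcat'' concatenates the strings that follow it ([[http://llvm.org/docs/TableGenFundamentals.html#values|TableGen values and expressions]]).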
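As an illustration of how the nested ''!strconcat'' calls in the definition above build the assembly string, here is a small Python sketch. The ''strconcat'' function is only a stand-in for the TableGen operator, and the values chosen for ''opstr'' and ''typestr'' as well as the register names are hypothetical; the real values come from the enclosing multiclass and from register allocation.

```python
def strconcat(*parts):
    """Stand-in for TableGen's !strconcat: join the strings left to right."""
    return "".join(parts)


# Hypothetical template arguments; the real ones come from the multiclass.
opstr = "ld.global"
typestr = ".u32"

# Same nesting as in the TableGen definition.
asm_string = strconcat(opstr, strconcat(typestr, "\t$d, [$a]"))


def render(template, operands):
    """Substitute $name placeholders with operand text (illustrative only)."""
    for name, text in operands.items():
        template = template.replace("$" + name, text)
    return template


# Hypothetical register names for the destination and the address.
rendered = render(asm_string, {"d": "%r0", "a": "%r1"})
```

Here ''asm_string'' is ''"ld.global.u32\t$d, [$a]"'', and ''rendered'' is ''"ld.global.u32\t%r0, [%r1]"''.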
To add a new instruction, consult ''include/llvm/Target/TargetSelectionDAG.td''. That file determines whether an operation is defined as an SDNode or as a PatFrag[(http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-May/039979.html)].
Features that differ between hardware versions are described through [[http://llvm.org/docs/WritingAnLLVMBackend.html#subtargetSupport|Subtarget Support]]. ''SubtargetFeature'' is defined in ''include/llvm/Target/Target.td''.
def FeaturePTX23 : SubtargetFeature<"ptx23", "PTXVersion", "PTX_VERSION_2_3",
"Use PTX Language Version 2.3",
[FeaturePTX22]>;
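The last argument of ''SubtargetFeature'' is the list of features that the new feature implies, so enabling ''ptx23'' also enables ''ptx22''. The Python sketch below models that expansion. Only the ''ptx23'' to ''ptx22'' link comes from the definition above; the rest of the table and the function name are assumptions made for illustration.

```python
# Hypothetical feature table: feature name -> features it directly implies.
# Only "ptx23" -> "ptx22" is taken from the definition above.
IMPLIES = {
    "ptx20": [],
    "ptx21": ["ptx20"],
    "ptx22": ["ptx21"],
    "ptx23": ["ptx22"],
}


def expand_features(requested, implies=IMPLIES):
    """Return the requested features plus all features they transitively imply."""
    enabled = set()
    stack = list(requested)
    while stack:
        feature = stack.pop()
        if feature not in enabled:
            enabled.add(feature)
            stack.extend(implies[feature])
    return enabled
```

With this table, ''expand_features(["ptx23"])'' yields all four features, while ''expand_features(["ptx21"])'' yields only ''ptx21'' and ''ptx20''.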
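FMA (fused multiply-add) rounds only once, after the addition, so its result can differ from that of a separate multiply followed by an add by a rounding error measured in ulps ([[wp>Unit in the last place]]).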
def FeatureFMA : SubtargetFeature<"fma","SupportsFMA", "true",
"Support Fused-Multiply Add">;
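To see why an FMA result can differ from a separate multiply and add, the following Python sketch compares the two on double-precision values. It is an illustration only: the fused version is emulated with exact rational arithmetic and a single final rounding, and the function names and input values are chosen for this example.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Compute a*b + c exactly, then round once to the nearest double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def multiply_then_add(a, b, c):
    """Compute a*b + c with two roundings: one after the multiply, one after the add."""
    return a * b + c


# Inputs chosen so that the exact product, 1 - 2**-60, is not a double.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

fused = fused_multiply_add(a, b, c)      # -2**-60
separate = multiply_then_add(a, b, c)    # 0.0
```

The separate version rounds the product to ''1.0'' before adding ''c'' and therefore loses the small term entirely, while the fused version keeps it.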
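LLVM has no ''not %x'' instruction; it is expressed as ''xor %x, 1'' instead (for a one-bit value; in general, an ''xor'' with all ones). To avoid mapping ''xor %x, 1'' to a plain ''xor'', custom lowering is needed; see [[http://llvm.org/doxygen/X86ISelLowering_8cpp_source.html|X86ISelLowering.cpp]] for reference.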
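The equivalence between ''not'' and ''xor'' that this note relies on can be checked with a short Python sketch. The function name and the explicit bit width are assumptions made for the example; for a one-bit predicate the all-ones constant is simply ''1''.

```python
def bitwise_not(x, width):
    """NOT of a width-bit unsigned value, expressed as XOR with all ones."""
    all_ones = (1 << width) - 1
    return x ^ all_ones


# One-bit predicate: xor with 1 flips the value.
flipped_false = bitwise_not(0, 1)   # 1
flipped_true = bitwise_not(1, 1)    # 0

# Wider values: xor with all ones matches Python's ~ masked to the width.
wide = bitwise_not(5, 32)
```

A back end that has a native ''not'' instruction can therefore recognize an ''xor'' whose second operand is all ones and select ''not'' for it.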
====== Q & A ======
- [[http://old.nabble.com/predicates-and-conditional-execution-td31687908.html|[LLVMdev] predicates and conditional execution]]
- [[http://old.nabble.com/TargetRegisterInfo-and-%22infinite%22-register-files-td31629383.html|TargetRegisterInfo and "infinite" register files]]
====== Submitted Patch ======
* Add subtarget PTX23
* [[http://llvm.org/viewvc/llvm-project?view=rev&revision=130980]]
* [[http://llvm.org/viewvc/llvm-project?view=rev&revision=131123]]
* Add .address_size directive
* [[http://llvm.org/viewvc/llvm-project?rev=133589&view=rev]]
* [[http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110502/120540.html|[llvm-commits] Patch for review - subtarget ptx23]]
* [[http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110613/122533.html|[llvm-commits] [PATCH][Target/PTX] Add address_size directive to PTX backend]]
====== External Links ======
* [[http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/ptx_isa_3.0.pdf|PTX: Parallel Thread Execution ISA Version 3.0]]
* [[http://developer.nvidia.com/cuda-toolkit-40|CUDA Toolkit 4.0]]
* [[http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/ptx_isa_2.3.pdf|PTX ISA 2.3]]
* [[http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf|NVIDIA CUDA C Programming Guide]]
* [[http://www.khronos.org/opencl/|OpenCL]]
* [[http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf|The OpenCL Specification 1.2]]
* [[http://developer.nvidia.com/opencl|Nvidia - OpenCL]]
* [[http://sourceforge.net/projects/llvmptxbackend/|PTX Backend for LLVM]]
* [[http://ncu.dl.sourceforge.net/project/llvmptxbackend/Rhodin_PTXBachelorThesis.pdf|A PTX Code Generator for LLVM]]
* [[http://code.google.com/p/gpuocelot/|Ocelot]]