Project pages

  * [[http://sites.google.com/site/justinholewinski/projects/gsoc/llvm-ptx-back-end-2011|GSoC 2011- LLVM PTX Back-End]]
  * [[http://jholewinski.wordpress.com/2011/12/02/llvm-3-0-ptx-support/|LLVM 3.0: PTX Support]]
  * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044108.html|[LLVMdev] [cfe-dev] RFC: Representation of OpenCL Memory Spaces]]
  * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044380.html|[LLVMdev] ANN: libclc (OpenCL C library implementation)]]
  * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/043794.html|[LLVMdev] TableGen and Greenspun]]
  * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/043818.html|[LLVMdev] Enhancing TableGen]]
  * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044554.html|[LLVMdev] Function pointer parameters in PTX backend]]
  * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-November/045410.html|[LLVMdev] PTX builtin functions.]]
  * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-November/045516.html|Re: [LLVMdev] PTX builtin functions.]]
  * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/045882.html|Re: [LLVMdev] PTX builtin functions.]]
  * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-November/044861.html|[LLVMdev] PTX backend support for atomics]]
  * [[CUDA]]

====== Building LLVM ======

[[https://github.com/jholewinski/llvm-ptx-samples|Sample programs for the LLVM PTX back-end]]

  - CMake and the NVIDIA CUDA toolkit
  - Clang/LLVM built with the PTX back-end

  - Build LLVM and Clang. <code>
$ ssh tesla
$ svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
$ cd llvm/tools
$ svn co http://llvm.org/svn/llvm-project/clang/trunk clang
# return to the directory that contains llvm/ before creating the build tree
$ cd ../..
$ mkdir build; cd build
# $INSTALL is the install prefix of your choice
$ ../llvm/configure --prefix=$INSTALL --enable-targets=host,ptx
$ make
</code>
  - Build the PTX samples. Note: the samples have been rewritten in OpenCL, so OpenCL must be installed on the machine. <code>
$ git clone git://github.com/jholewinski/llvm-ptx-samples.git
$ cd llvm-ptx-samples
# fetch libclc
$ git submodule init && git submodule update
$ mkdir build
$ cd build
$ cmake ..
$ make
</code>
  - Test. Run the program from the directory that contains the executable; otherwise it cannot find the corresponding PTX file. <code>
$ cd build/bin
$ ./ocl-blur2d
------------------------------
* Source Kernel
------------------------------
Number of Iterations: 16
Total Time: 7.05218 sec
Average Time: 0.440761 sec
------------------------------
* Binary Kernel
------------------------------
Number of Iterations: 16
Total Time: 7.05264 sec
Average Time: 0.44079 sec
</code>

  * [[http://www.pcc.me.uk/pipermail/libclc-dev/2011-December/000007.html|[Libclc-dev] libclc/ptx-nvidiacl/lib/ miss source code]]
  * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/045983.html|[LLVMdev] Build PTX samples with LLVM/Clang/libclc]]
  * [[http://sites.google.com/site/justinholewinski/projects/llvm-ptx-back-end|LLVM: PTX Back-End]]
  * [[http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PTX/|PTX Changelog]]
  * [[http://www.pcc.me.uk/~peter/libclc/|libclc]]
  * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044380.html|[LLVMdev] ANN: libclc (OpenCL C library implementation)]]
  * [[http://llvm.org/docs/ExtendingLLVM.html|Extending LLVM: Adding instructions, intrinsics, types, etc.]]
  * [[Git]]
  * [[Clang]]

====== Testing ======

For details, see the [[http://llvm.org/docs/TestingGuide.html|LLVM Testing Infrastructure Guide]].

<code>
$ cd build
$ ./Debug+Asserts/bin/llvm-lit ${LLVM_SOURCE}/test/CodeGen/PTX/xxx.ll
</code>

====== Developer Guide ======

  * Run the following commands to see the revision history of the PTX back-end. <code>
$ cd llvm/lib/Target/PTX
$ svn log
</code>
  * The Shader Model (SM) values in PTXSubtarget.h correspond to the compute capability defined by CUDA.
    * SM 1.0: G80
    * SM 1.3: G200
    * SM 2.0: GF100
    * SM 2.1: GF10x

===== TODO =====

  * Add subtarget PTX23
  * Add .address_size support
  * Add ftz support
    * [[wp>Normal number (computing)]]
    * [[wp>Denormal number]]
    * [[http://llvm.org/viewvc/llvm-project?view=rev&revision=133253]]
    * add, sub, mul, fma. mad, div, abs, neg, min, max, rcp, rcp.approx.ftz.f64, sqrt, rsqrt, sin, cos, lg2, ex2
    * set, setp, slct
    * cvt
  * Generate rcp instruction
    * [[http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-May/040230.html]]

===== lib/Target/PTX =====

  - PTXTargetMachine.* : a subclass of LLVMTargetMachine, which is declared in ''include/llvm/Target/TargetMachine.h''.
  - PTXRegisterInfo.* : defines the target's registers.
  - PTXInstrInfo.* and PTXInstrFormats.td : define the target's instruction set.
  - PTXISelDAGToDAG.* and PTXISelLowering.* : describe how the DAG is converted into target instructions.
  - PTXAsmPrinter.* : emits the target's assembly code.
  - PTXSubtarget.* : describes variants of the target.
  - PTXMFInfoExtract.cpp and PTXFPRoundingModePass.cpp : some cases require writing additional passes.

  * [[http://llvm.org/docs/WritingAnLLVMBackend.html#RegisterSet|Register Set and Register Classes]]
  * [[http://llvm.org/docs/WritingAnLLVMBackend.html#InstructionSet|Instruction Set]]
  * [[http://llvm.org/docs/CodeGenerator.html#codegendesc|Machine code description classes]]
  * PTXMachineFunctionInfo.*
  * [[http://llvm.org/docs/WritingAnLLVMBackend.html|Writing an LLVM Compiler Backend]]
  * [[http://llvm.org/docs/CodeGenerator.html|The LLVM Target-Independent Code Generator]]
  * [[http://llvm.org/docs/TableGenFundamentals.html|TableGen Fundamentals]]

===== InstrInfo.td =====

See page 21 of [[http://llvm.org/devmtg/2009-10/Korobeynikov_BackendTutorial.pdf|Tutorial: Building a backend in 24 hours]].

<code>
def ii64 : InstPTX<(outs RC:$d), (ins MEMii64:$a),
                   !strconcat(opstr, !strconcat(typestr, "\t$d, [$a]")),
                   [(set RC:$d, (pat_load ADDRii64:$a))]>,
           Requires<[Use64BitAddresses]>;
</code>

For the full format, see [[http://llvm.org/docs/TableGenFundamentals.html|TableGen Fundamentals]]. ''outs'' names where the instruction stores its result, and ''ins'' names the instruction's operands. ''strconcat'' concatenates the strings that follow it ([[http://llvm.org/docs/TableGenFundamentals.html#values|TableGen values and expressions]]).

To add a new instruction, see ''include/llvm/Target/TargetSelectionDAG.td''. That file determines whether an operation is expressed as an SDNode or as a PatFrag[(http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-May/039979.html)].

Features that differ between hardware versions are described through [[http://llvm.org/docs/WritingAnLLVMBackend.html#subtargetSupport|Subtarget Support]]. ''SubtargetFeature'' is defined in ''include/llvm/Target/Target.td''.

<code>
def FeaturePTX23 : SubtargetFeature<"ptx23", "PTXVersion", "PTX_VERSION_2_3",
                                    "Use PTX Language Version 2.3",
                                    [FeaturePTX22]>;
</code>

FMA (fused multiply-add) rounds only once, so its result can differ from a separate multiply and add, each of which introduces its own ulp-level rounding error ([[wp>Unit in the last place]]).

<code>
def FeatureFMA : SubtargetFeature<"fma", "SupportsFMA", "true",
                                  "Support Fused-Multiply Add">;
</code>

LLVM IR has no dedicated ''not'' instruction; ''not %x'' is expressed as ''xor %x, 1'' (for an ''i1'' value). To keep ''xor %x, 1'' from being selected as a plain ''xor'' instruction, custom lowering is needed; see [[http://llvm.org/doxygen/X86ISelLowering_8cpp_source.html|X86ISelLowering.cpp]] for reference.

====== Q & A ======

  - [[http://old.nabble.com/predicates-and-conditional-execution-td31687908.html|[LLVMdev] predicates and conditional execution]]
  - [[http://old.nabble.com/TargetRegisterInfo-and-%22infinite%22-register-files-td31629383.html|TargetRegisterInfo and "infinite" register files]]

====== Submitted Patch ======

  * Add subtarget PTX23
    * [[http://llvm.org/viewvc/llvm-project?view=rev&revision=130980]]
    * [[http://llvm.org/viewvc/llvm-project?view=rev&revision=131123]]
  * Add .address_size directive
    * [[http://llvm.org/viewvc/llvm-project?rev=133589&view=rev]]
  * [[http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110502/120540.html|[llvm-commits] Patch for review - subtarget ptx23]]
  * [[http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110613/122533.html|[llvm-commits] [PATCH][Target/PTX] Add address_size directive to PTX backend]]

====== External Links ======

  * [[http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/ptx_isa_3.0.pdf|PTX: Parallel Thread Execution ISA Version 3.0]]
  * [[http://developer.nvidia.com/cuda-toolkit-40|CUDA Toolkit 4.0]]
  * [[http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/ptx_isa_2.3.pdf|PTX ISA 2.3]]
  * [[http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf|NVIDIA CUDA C Programming Guide]]
  * [[http://www.khronos.org/opencl/|OpenCL]]
  * [[http://www.khronos.org/registry/cl/specs/opencl-1.2.pdf|The OpenCL Specification 1.2]]
  * [[http://developer.nvidia.com/opencl|Nvidia - OpenCL]]
  * [[http://sourceforge.net/projects/llvmptxbackend/|PTX Backend for LLVM]]
  * [[http://ncu.dl.sourceforge.net/project/llvmptxbackend/Rhodin_PTXBachelorThesis.pdf|A PTX Code Generator for LLVM]]
  * [[http://code.google.com/p/gpuocelot/|Ocelot]]
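
A side note on the FMA remark in the InstrInfo.td section: a fused multiply-add rounds once, while a separate multiply and add round twice, so the two results can differ. The sketch below is only an illustration in Python and is not part of the PTX back-end. The helper names ''fused_mul_add'' and ''separate_mul_add'' are hypothetical, and the fused result is emulated with exact rational arithmetic rather than a hardware FMA instruction.

```python
from fractions import Fraction


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulate FMA: compute a*b + c exactly, then round once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # the only rounding step


def separate_mul_add(a: float, b: float, c: float) -> float:
    """Multiply and round to a double, then add and round again."""
    return a * b + c


# Inputs chosen so that the exact product, 1 - 2**-60, is not representable.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print(fused_mul_add(a, b, c))     # keeps the -2**-60 term
print(separate_mul_add(a, b, c))  # product rounds to 1.0 first, so this is 0.0
```

With these inputs the separate computation loses the low-order term entirely, which is the kind of difference a back-end has to account for before turning a multiply and an add into a single FMA.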