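On x86, a byte, a word, and a doubleword are 8, 16, and 32 bits wide, respectively.

  * [[http://www.tektalk.org/2010/03/11/%e5%8f%8c%e8%8a%af%e8%ae%b0a-tale-of-two-chips/|双芯记(A Tale of Two Chips)]]
  * [[wp>Intel iAPX 432]] [(http://people.cs.nctu.edu.tw/~chenwj/log/LLVM/majkl-2011-12-08.txt)]
  * [[http://www.tektalk.org/2010/01/02/%e9%93%81%e8%87%82%e9%98%bf%e7%ab%a5%e6%9c%a8%e2%80%94%e2%80%94intel-atom%e5%a4%84%e7%90%86%e5%99%a8%e5%89%96%e6%9e%90%e4%b8%8e%e7%a0%94%e7%a9%b6/|铁臂阿童木——Intel ATOM处理器剖析与研究]] (Astro Boy: an analysis and study of the Intel Atom processor)
  * [[http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-software-developer-vol-1-manual.pdf|Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture]]
  * [[http://bbs.chinaunix.net/thread-2012474-1-1.html|【x64 指令系统】之 指令编码内幕]] (x64 instruction set: inside instruction encoding)
  * [[http://www.mouseos.com/x64/index.html|x86/x64 指令编码内幕(适用于 AMD/Intel)]] (Inside x86/x64 instruction encoding, for AMD and Intel)
  * [[http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-software-developer-vol-2a-manual.pdf|Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 2A: Instruction Set Reference, A-M]]
  * [[http://blog.csdn.net/misterliwei/article/details/5550452|保护模式、实地址模式及V8086模式下的指令格式(上)]] (Instruction formats in protected, real-address, and V8086 modes, part 1)
  * [[http://blog.csdn.net/misterliwei/article/details/3951103|PAUSE指令]] (The PAUSE instruction)
  * [[wp>x86 assembly language]]
  * Non-temporal instructions
    * [[http://stackoverflow.com/questions/37070/what-is-the-meaning-of-non-temporal-memory-accesses-in-x86|What is the meaning of "non temporal" memory accesses in x86]]
    * [[http://www.rz.uni-karlsruhe.de/rz/docs/VTune/reference/vc198.htm]]
    * [[http://groups.google.com/group/comp.lang.asm.x86/browse_thread/thread/2ae6c66f8e69ae82/1950f09a79cd4056]]

In the original x86-64 implementations, virtual addresses are only 48 bits wide, not the full 64 bits. [[wp>x86-64]]

  * [[http://blog.csdn.net/muxiqingyang/article/details/6791218|《大话处理器》连载——微架构(22) Superscalar处理器实例——Intel P4 CPU]] (serialized "Talking About Processors", microarchitecture part 22: a superscalar example, the Intel P4 CPU)
  * [[http://www.ithome.com.tw/itadm/article.php?c=38381&s=1|深度剖析英特爾旗艦處理器-走出隧道盡頭的Itanium]] (An in-depth look at Intel's flagship processor: Itanium emerging from the end of the tunnel)
  * [[http://www.diybl.com/course/3_program/hb/hbjs/2007124/89933.html|从X86指令RET和CALL的意义看进程的自由切换]] (Free switching between processes, seen through the meaning of the x86 RET and CALL instructions)
  * [[http://people.cs.nctu.edu.tw/~huangmc/works/web/Boot_x86/Boot_x86.html|X86 開機流程小記]] (Notes on the x86 boot process)
  * [[http://henbin.blogspot.com/2009/02/x86-reset-concept.html|X86 Reset concept]]

MMIO accesses go through address translation, unlike port I/O, which uses a separate address space.

  * [[http://www.embexperts.com/viewthread.php?tid=65|X86 IO端口和MMIO]] (x86 I/O ports and MMIO)

The A20 address line exists mainly for backward compatibility: while it is masked, accesses to addresses at or above 1 MB wrap around and start again from 0, as they did on the 8086.

  * [[wp>DOS memory management]]
  * [[http://software.intel.com/en-us/blogs/2012/02/07/transactional-synchronization-in-haswell/|Transactional Synchronization in Haswell]]
  * [[http://www.pagetable.com/?p=364|Why is there no CR1 – and why are control registers such a mess anyway?]]

====== CPU ======

Real mode: neither descriptor-based segmentation nor paging is enabled. The address is formed as segment × 16 + offset and is used directly as the physical address. The only way to tell whether the processor is in real mode is to check the CR0.PE flag.

  * [[http://www.mouseos.com/arch/realmode.html|再论 real mode]] (Real mode revisited)
  * [[wp>Real mode]]

Unreal mode: the processor is already in unreal mode at power-on. Afterwards it switches to real mode, then to protected mode, and then back to real mode. Along the way, the segment limit is set to 4 GB while in protected mode, before switching back to real mode. Because the segment limit is now 4 GB, the resulting state no longer meets the strict definition of real mode. Unreal mode is a variant of real mode, not a separate mode; its purpose is to reach addresses that real mode cannot access (those above 1 MB).

  * [[http://www.mouseos.com/arch/unreal.html|理解 unreal mode]] (Understanding unreal mode)
  * [[wp>Unreal mode]]

Protected mode: descriptor-based segmentation is enabled.

  * [[http://www.mouseos.com/arch/mechanism.html|protected mode 机制概述]] (Overview of protected-mode mechanisms)
  * [[wp>Protected mode]]
  * [[http://www.mouseos.com/arch/processor_mode.html|processor 模式的切换]] (Switching processor modes)

====== Memory ======

Virtual (logical) address -> linear address -> physical address

  - Virtual address -> linear address: translated by the segmentation mechanism, which cannot be turned off. The segment base is usually set to zero, so that the virtual address equals the linear address. This arrangement is called the flat memory model ([[wp>Flat memory model]]). The translation differs between real mode ([[wp>Real mode|real mode]]) and protected mode ([[wp>Protected mode|protected mode]]). In real mode the segment register holds the segment base divided by 16; in protected mode it holds a selector that indexes a segment descriptor table.
  - Linear address -> physical address: translated by the paging mechanism, which can be turned off.

Segment descriptor tables come in two kinds: the global descriptor table (GDT) and the local descriptor table (LDT). The registers that hold the locations of the GDT and the LDT are GDTR and LDTR, respectively. The contents of a descriptor are implicitly cached in the hidden part of the segment register. All tasks share a single GDT, and each task can also have its own LDT. An LDT is itself a segment, so the GDT contains a descriptor that points to it.

  * [[wp>Real mode]]
  * [[http://www.mouseos.com/arch/001.html|再论 real mode]] (Real mode revisited)
  * [[wp>Unreal mode]]
  * [[http://mirror.transact.net.au/sourceforge/h/project/hw/hwtools/Documents/Introduction_to_Big_Real_Mode/Introduction_to_Big_Real_Mode_CHT_.pdf|Big Real Mode]]
  * [[http://duartes.org/gustavo/blog/post/memory-translation-and-segmentation|Memory Translation and Segmentation]]
  * [[http://linux.linti.unlp.edu.ar/images/5/50/Ulk3-cap2.pdf|Chapter 2. Memory Addressing]]

===== Segmentation =====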
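As a rough illustration of real-mode segmentation and the A20 wrap-around, the sketch below computes a real-mode address from a segment and an offset. The function name ''real_mode_address'' and the ''a20_enabled'' parameter are hypothetical names chosen for this example; it models only the arithmetic, not actual hardware behaviour.

```python
def real_mode_address(segment: int, offset: int, a20_enabled: bool = True) -> int:
    """Return the address formed in real mode from segment:offset.

    Hypothetical helper for illustration only.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Real mode: shift the segment left by 4 (multiply by 16) and add the offset.
    address = (segment << 4) + offset
    if not a20_enabled:
        # With A20 masked, address bit 20 is forced to 0, so accesses just
        # above 1 MB wrap around to the bottom of memory.
        address &= ~(1 << 20)
    return address


if __name__ == "__main__":
    print(hex(real_mode_address(0xFFFF, 0x0010)))                     # A20 enabled
    print(hex(real_mode_address(0xFFFF, 0x0010, a20_enabled=False)))  # A20 masked
```

With A20 enabled, ''FFFF:0010'' reaches the first byte above 1 MB; with A20 masked, the same pair wraps to address 0.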
Segmentation provides a mechanism of isolating individual code, data, and stack modules so that multiple programs (or tasks) can run on the same processor without interfering with one another. Paging provides a mechanism for implementing a conventional demand-paged, virtual-memory system where sections of a program’s execution environment are mapped into physical memory as needed. Paging can also be used to provide isolation between multiple tasks. When operating in protected mode, some form of segmentation must be used. There is no mode bit to disable segmentation. The use of paging, however, is optional.
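The protected-mode side of segmentation can be sketched in a few lines. The names ''Selector'', ''decode_selector'', ''data_access_allowed'', and ''linear_address'' are hypothetical and exist only for this example. The privilege check shown is the one for data segments, and the segment limit check is left out for brevity.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # bits 15..3: entry number in the descriptor table
    ti: int     # bit 2: 0 = GDT, 1 = LDT
    rpl: int    # bits 1..0: requested privilege level


def decode_selector(value: int) -> Selector:
    """Split a 16-bit segment selector into its three fields."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("a selector is a 16-bit value")
    return Selector(index=value >> 3, ti=(value >> 2) & 1, rpl=value & 3)


def data_access_allowed(cpl: int, rpl: int, dpl: int) -> bool:
    """Data-segment check: the less privileged of CPL and RPL must not exceed DPL."""
    return max(cpl, rpl) <= dpl


def linear_address(segment_base: int, offset: int) -> int:
    """Add the descriptor's base to the offset, wrapping at 32 bits."""
    return (segment_base + offset) & 0xFFFFFFFF


if __name__ == "__main__":
    print(decode_selector(0x2B))
    print(hex(linear_address(0x0, 0x08048000)))  # flat model: base is zero
```

With a zero segment base, as in the flat memory model, the linear address is simply the offset.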
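A logical address has two parts: a 16-bit segment selector and a 32-bit offset.

  - The segment selector indexes a segment descriptor table to fetch a segment descriptor. The selector contains an index, a TI bit (selecting the GDT or the LDT), and the RPL. To decide whether access to a data segment is permitted, the processor takes the numerically larger (less privileged) of CPL and RPL and requires that it not exceed the descriptor's DPL.
  - The segment base in the descriptor is added to the offset to produce the linear address.

Segment registers: CS, DS, ES, FS, GS, SS.

  * [[http://www.csie.ntu.edu.tw/~wcchen/asm98/asm/proj/b85506061/chap2/segment.html|分段架構]] (Segmented architecture)
  * [[http://www.mouseos.com/arch/segmentation.html|segmentation 情景分析(上篇)--- 数据结构]] (Segmentation scenario analysis, part 1: data structures)
  * [[http://www.mouseos.com/arch/segmentation_protected.html|segmentation 情景分析(下篇)--- protected 机制]] (Segmentation scenario analysis, part 2: protection mechanisms)

===== Paging =====

  * [[http://www.mouseos.com/arch/paging.html|理解 paging]] (Understanding paging)
  * [[http://www.mouseos.com/arch/page-protected.html|page 的保护措施]] (Page protection measures)
  * [[http://www.ibm.com/developerworks/cn/linux/l-lvm64/index.html|X86-64 上的 Linux VM 管理系统]] (Linux VM management on x86-64)

A page-table entry has the following control bits:

  * G: global; the translation is kept in the TLB across a CR3 reload (when CR4.PGE is set).
  * D: dirty; set by the processor when the page is written.
  * A: accessed; set by the processor when the page is read or written.
  * C (PCD): page-level cache disable.
  * W (PWT): page-level write-through.
  * U/S: marks the page as a user or a supervisor (kernel) page; user mode cannot access supervisor pages.
  * R/W: determines whether the page is writable.
  * Present (P): the page is present in physical memory.

  * [[http://stackoverflow.com/questions/10671147/how-do-x86-page-tables-work|How do x86 page tables work?]]
  * [[http://lwn.net/Articles/106177/|Four-level page tables]]
  * [[http://wiki.osdev.org/Paging|Paging]]
  * [[http://download.intel.com/products/processor/manual/253668.pdf|Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide, Part 1]] Chapter 4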
Software enables paging by using the MOV to CR0 instruction to set CR0.PG. Before doing so, software should ensure that control register CR3 contains the physical address of the first paging structure that the processor will use for linear-address translation (see Section 4.2) and that structure is initialized as desired.
Paging modes:

  * 32-bit paging: CR0.PG = 1 and CR4.PAE = 0
  * PAE paging: CR0.PG = 1, CR4.PAE = 1, and IA32_EFER.LME = 0
  * IA-32e paging: CR0.PG = 1, CR4.PAE = 1, and IA32_EFER.LME = 1

Other control bits that affect paging:

  * CR0.WP
  * CR4.PSE
  * CR4.PGE
  * CR4.PCIDE
  * CR4.SMEP
  * IA32_EFER.NXE
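The choice among the three paging modes can be written as a small decision function. This is only a sketch: ''paging_mode'' is a hypothetical name, and the three boolean arguments stand for the CR0.PG, CR4.PAE, and IA32_EFER.LME bits, which real code would have to read from the processor in privileged mode.

```python
def paging_mode(cr0_pg: bool, cr4_pae: bool, efer_lme: bool) -> str:
    """Map the three control bits to the name of the active paging mode."""
    if not cr0_pg:
        return "paging disabled"
    if not cr4_pae:
        # The condition for 32-bit paging does not mention IA32_EFER.LME.
        return "32-bit paging"
    return "IA-32e paging" if efer_lme else "PAE paging"


if __name__ == "__main__":
    print(paging_mode(cr0_pg=True, cr4_pae=True, efer_lme=True))
```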
The first paging structure used for any translation is located at the physical address in CR3.
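To make the walk that starts at CR3 more concrete, the sketch below splits a linear address into the table indices used with 4 KB pages, for both 32-bit paging and IA-32e paging, and computes where the first-level entry lives under 32-bit paging. The function names are hypothetical, and large pages, PAE paging, and PCIDs are deliberately ignored.

```python
def split_32bit(linear: int) -> dict:
    """32-bit paging, 4 KB pages: 10-bit directory index, 10-bit table index, 12-bit offset."""
    return {
        "pde_index": (linear >> 22) & 0x3FF,
        "pte_index": (linear >> 12) & 0x3FF,
        "offset": linear & 0xFFF,
    }


def split_ia32e(linear: int) -> dict:
    """IA-32e paging, 4 KB pages: four 9-bit indices and a 12-bit offset."""
    return {
        "pml4_index": (linear >> 39) & 0x1FF,
        "pdpt_index": (linear >> 30) & 0x1FF,
        "pd_index": (linear >> 21) & 0x1FF,
        "pt_index": (linear >> 12) & 0x1FF,
        "offset": linear & 0xFFF,
    }


def pde_address_32bit(cr3: int, linear: int) -> int:
    """Physical address of the page-directory entry under 32-bit paging.

    CR3 bits 31..12 give the page-directory base; each entry is 4 bytes.
    """
    return (cr3 & 0xFFFFF000) + 4 * split_32bit(linear)["pde_index"]


if __name__ == "__main__":
    print(split_32bit(0xC0001234))
    print(hex(pde_address_32bit(0x00100000, 0xC0001234)))
```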
  * [[http://electronics.stackexchange.com/questions/21469/are-page-table-walks-cached|Are page table walks cached?]]
  * [[http://www.cs.rice.edu/CS/Architecture/docs/barr-isca10.pdf|Translation Caching: Skip, Don't Walk (the Page Table)]]

===== Terminology =====

  * Page Global Directory (PGD)
  * Page Middle Directory (PMD)
  * Page Table Entry (PTE)
  * Page-Map Level 4 (PML4)
  * Page-directory-pointer table (PDPT)
  * [[wp>Physical Address Extension|Physical Address Extension (PAE)]]
  * [[wp>Page attribute table|Page Attribute Table (PAT)]]
  * [[wp>Memory type range register|Memory Type Range Register (MTRR)]]

====== Terminology ======

  * Current Privilege Level (CPL)
  * Descriptor Privilege Level (DPL)
  * Requested Privilege Level (RPL)
  * Global Descriptor Table (GDT)

====== MMX & SSE ======

MMX -> SSE -> AVX

  * [[http://www.mobile01.com/topicdetail.php?f=296&t=367606&last=3254788|1-3.CPU進階技術講解,XD、VT、SSE在幹嘛]] (1-3. Advanced CPU technologies explained: what XD, VT, and SSE do)
  * [[http://en.wikibooks.org/wiki/X86_Assembly/SSE|X86 Assembly/SSE]]
  * [[http://neilkemp.us/src/sse_tutorial/sse_tutorial.html|Intel SSE Tutorial : An Introduction to the SSE Instruction Set]]
  * [[https://developer.apple.com/hardwaredrivers/ve/sse.html]]
  * [[http://saluc.engr.uconn.edu/refs/processors/intel/sse_sse2.pdf|Using SSE and SSE2: Misconcepts and Reality]]
  * [[http://www.formboss.net/blog/2010/10/sse-intrinsics-tutorial/]]
  * [[http://msdn.microsoft.com/en-us/library/y0dh78ez(v=VS.80).aspx]]
  * 2.3
    * Sixteen 128-bit registers (XMM0 to XMM15) in 64-bit mode
  * [[wp>Streaming SIMD Extensions]]
    * PD: Packed Double
    * PS: Packed Single
    * SD: Scalar Double
    * SS: Scalar Single
  * [[wp>x86 instruction listings]]
  * 3.1.1.3 Instruction Column in the Opcode Summary Table
    * r/m32: A doubleword general-purpose register or memory operand used for instructions whose operand-size attribute is 32 bits. The doubleword general-purpose registers are: EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI. The contents of memory are found at the address provided by the effective address computation. Doubleword registers R8D - R15D are available when using REX.R in 64-bit mode.
    * mm: An MMX register. The 64-bit MMX registers are: MM0 through MM7.
  * [[http://stackoverflow.com/questions/7115795/sse-instructions-in-a-buffer|SSE instructions in a buffer]]

  $ gcc -O2 -ftree-vectorize -msse2 -ftree-vectorizer-verbose=5

  * [[http://stackoverflow.com/questions/409300/how-to-vectorize-with-gcc|How to vectorize with gcc?]]
  * [[http://gcc.gnu.org/onlinedocs/gcc-4.8.0/gcc/Optimize-Options.html#Optimize-Options|3.10 Options That Control Optimization]]
    * `-ftree-vectorize`
      * Perform loop vectorization on trees. This flag is enabled by default at -O3.
  * [[http://gcc.gnu.org/onlinedocs/gcc-4.8.0/gcc/i386-and-x86_002d64-Options.html#i386-and-x86_002d64-Options|3.17.16 Intel 386 and AMD x86-64 Options]]
    * `-msse2`
      * These switches enable or disable the use of instructions in the MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, AES, PCLMUL, FSGSBASE, RDRND, F16C, FMA, SSE4A, FMA4, XOP, LWP, ABM, BMI, BMI2, LZCNT, RTM or 3DNow! extended instruction sets.
  * Using GCC Auto-Vectorizer (these flags target ARM NEON rather than x86; use either -mfloat-abi=softfp or -mfloat-abi=hard)

  $ gcc -mfpu=neon -mfloat-abi=softfp

  * [[http://locklessinc.com/articles/vectorize/|Auto-vectorization with gcc 4.7]]
  * [[http://stackoverflow.com/questions/11129159/auto-vectorization-in-llvm|Auto vectorization in llvm]]

====== Miscellaneous ======

  * [[wp>Time Stamp Counter]]
    * [[http://download.intel.com/embedded/software/IA/324264.pdf|How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures]]
    * [[http://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf|Using the RDTSC Instruction for Performance Monitoring]]
    * [[http://blog.csdn.net/solstice/article/details/5196544|多核时代不宜再用 x86 的 RDTSC 指令测试指令周期和时间]] (In the multicore era, the x86 RDTSC instruction should no longer be used to measure instruction cycles and time)
  * [[http://blog.gslin.org/archives/2012/03/09/2847/amd-cpu-bug-%e5%95%8f%e9%a1%8c/|AMD CPU bug 問題…]] (AMD CPU bug issue)
    * [[http://leaf.dragonflybsd.org/mailarchive/kernel/2011-12/msg00025.html|Buildworld loop seg-fault update -- I believe it is hardware]]
    * [[http://leaf.dragonflybsd.org/mailarchive/kernel/2012-03/msg00006.html|AMD cpu bug update -- AMD confirms! (additional info) ]]
  * [[http://newsletter.sigmicro.org/sigmicro-oral-history-transcripts/Bob-Colwell-Transcript.pdf|Oral history of Robert P. Colwell (1954- )]]

====== External Links ======

  * [[http://www.mouseos.com/index.html|mouseOS 技术小站]] (mouseOS tech site)
  * [[http://www.csie.ntu.edu.tw/~wcchen/asm98/asm/proj/b85506061/cover.html|Intel Architecture 保護模式架構]] (Intel Architecture protected-mode architecture)
  * [[http://www.cs.cmu.edu/~fp/courses/15213-s07/misc/asm64-handout.pdf|x86-64 Machine-Level Programming]]
  * [[http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html|Intel® 64 and IA-32 Architectures Software Developer Manuals]]
  * [[http://developer.amd.com/pages/default.aspx|AMD Developer Central]]