  * [[http://www.cs.nctu.edu.tw/~chenwj/Master%20Thesis/IEEEtran_CHENWJ.pdf|Lock-free cache-friendly STL-compliant queue for decoupled software pipelining]]
  * [[http://preshing.com/20120625/memory-ordering-at-compile-time|Memory Ordering at Compile Time]]
  * [[http://www.kernel.org/pub/linux/kernel/people/geoff/cell/ps3-linux-docs/ps3-linux-docs-latest/CellProgrammingPrimer.html|Cell Programming Primer]]

====== Synchronization Primitives ======

  * [[wp>Fetch-and-add]]
  * [[wp>Compare-and-swap]]
  * [[wp>ABA problem]]
  * [[wp>Double compare-and-swap]]
  * [[wp>Load-link/store-conditional]]
In multithreaded computing, the ABA problem occurs during synchronization, when a location is read twice, has the same value for both reads, and "value is the same" is used to indicate "nothing has changed". However, another thread can execute between the two reads and change the value, do other work, then change the value back, thus fooling the first thread into thinking "nothing has changed" even though the second thread did work that violates that assumption. http://en.wikipedia.org/wiki/ABA_problem
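The classic place this bites is a CAS-based stack pop. The sketch below is a hypothetical Treiber-style stack (names such as ''top'', ''pop'', and ''struct node'' are illustrative, not from the linked pages): because the CAS compares only the pointer value, a node that is popped, freed, and recycled at the same address between the read and the CAS makes the CAS succeed with a stale ''new_head''.

<code c>
#include <stdatomic.h>
#include <stdlib.h>

struct node {
    struct node *next;
    int value;
};

/* Top of a Treiber-style lock-free stack. */
static _Atomic(struct node *) top = NULL;

int pop(int *out)
{
    struct node *old_head, *new_head;
    do {
        old_head = atomic_load(&top);
        if (old_head == NULL)
            return 0;                      /* stack empty */
        new_head = old_head->next;         /* read next BEFORE the CAS */
        /* ABA window: another thread may pop old_head, free it, push new
         * nodes, and end up reusing the same address as the new top.  The
         * CAS below compares only pointer values, so it still succeeds and
         * installs the stale new_head, corrupting the stack. */
    } while (!atomic_compare_exchange_weak(&top, &old_head, new_head));
    *out = old_head->value;
    free(old_head);
    return 1;
}
</code>

Typical remedies are version-tagged pointers (double-width CAS), safe memory reclamation such as hazard pointers, or LL/SC on architectures that provide it, since a store-conditional fails after any intervening write to the location regardless of value.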
  * [[http://forums.arm.com/index.php?/topic/14418-atomic-operations/|atomic operations]]
  * [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/14979.html|Can the ARM compiler generate exclusive loads and stores (SWP, LDREX, STREX)?]]
  * [[http://knowthycomputer.blogspot.tw/2008/12/what-arm-instruction-helps-atomic.html|What ARM Instruction Helps Atomic Operations?]]
  * [[http://www.ibm.com/developerworks/library/pa-atom/|Save your code from meltdown using PowerPC atomic instructions]]
  * [[http://developers.sun.com/solaris/articles/atomic_sparc/|Atomic SPARC: Using the SPARC Atomic Instructions]]
  * [[https://blogs.oracle.com/d/entry/atomic_operations|Atomic operations]]
  * RISC machines such as ARM, PowerPC, and SPARC do not provide synchronization primitives like ''fetch-and-add'' that read and modify memory in a single instruction.
    * [[http://stackoverflow.com/questions/12310756/any-architecture-except-x86-support-fetch-and-add-instruction|Any architecture except x86 support fetch-and-add instruction?]]

===== x86 =====

  * [[http://stackoverflow.com/questions/3343589/how-do-i-use-the-lock-asm-prefix-to-read-a-value|How do I use the LOCK ASM prefix to read a value?]]
  * [[http://stackoverflow.com/questions/3339141/x86-lock-question-on-multi-core-cpus|x86 LOCK question on multi-core CPUs]]

====== Lock-Free Data Structures ======

  * [[http://dl.acm.org/citation.cfm?id=248052.248106|Simple, fast, and practical non-blocking and blocking concurrent queue algorithms]]
    * The MS-queue, named after its authors (Michael and Scott). A great many later papers are improvements on this algorithm.
  * [[http://www.springerlink.com/content/h84dfexjftdal4p4/|An Optimistic Approach to Lock-Free FIFO Queues]]
  * [[http://dl.acm.org/citation.cfm?id=1074013|Using elimination to implement scalable and lock-free FIFO queues]]
  * [[http://dl.acm.org/citation.cfm?id=1810479.1810540|Flat Combining and the Synchronization-Parallelism Tradeoff]]
  * [[http://www.siat.ac.cn/xscbw/xsqk/200906/W020091127490148897561.pdf|多线程并发访问无锁队列的算法研究]]
    * "消隱" in this paper is the Chinese term for elimination.

Correctness proofs for lock-free data structures usually show that the algorithm is [[wp>Linearizability|linearizable]] and that it is a [[wp>Non-blocking algorithm|non-blocking algorithm]]; the former notion comes from [[http://dl.acm.org/citation.cfm?id=78972|Linearizability: a correctness condition for concurrent objects]] and the latter from [[http://dl.acm.org/citation.cfm?id=102808|Wait-free synchronization]].
Specifically, linearizability guarantees that the invariants of a system are observed and preserved by all operations: if all operations individually preserve an invariant, the system as a whole will.

A history is a sequence of invocations and responses made of an object by a set of threads. Each invocation of a function will have a subsequent response. This can be used to model any use of an object. Suppose, for example, that two threads, A and B, both attempt to grab a lock, backing off if it's already taken. This would be modeled as both threads invoking the lock operation, then both threads receiving a response, one successful, one not. ... snip ...

A sequential history is one in which all invocations have immediate responses. A history is linearizable if:
  - its invocations and responses can be reordered to yield a sequential history;
  - that sequential history is correct according to the sequential definition of the object;
  - if a response preceded an invocation in the original history, it must still precede it in the sequential reordering.

http://en.wikipedia.org/wiki/Linearizability
A concurrent object is a data object shared by concurrent processes. Linearizability is a correctness condition for concurrent objects that exploits the semantics of abstract data types. Linearizability: a correctness condition for concurrent objects
  * An object here can be a lock or a data structure that multiple threads access concurrently.
In computer science, a non-blocking algorithm ensures that threads competing for a shared resource do not have their execution indefinitely postponed by mutual exclusion. A non-blocking algorithm is lock-free if there is guaranteed system-wide progress; wait-free if there is also guaranteed per-thread progress. ... snip ... Wait-freedom is the strongest non-blocking guarantee of progress, combining guaranteed system-wide throughput with starvation-freedom. ... snip ... Lock-freedom allows individual threads to starve but guarantees system-wide throughput. An algorithm is lock-free if, when the program threads are run for a sufficiently long time, at least one of the threads makes progress (for some sensible definition of progress). All wait-free algorithms are lock-free.
  * The wait-free condition is stronger than lock-free: wait-free additionally guarantees that no thread is ever starved.
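To make the distinction concrete, here is a minimal C11 sketch (illustrative only, not taken from the linked papers; ''counter'' and the function names are made up). The CAS retry loop is lock-free: a given thread may lose the race and retry indefinitely, but every failed CAS means some other thread's CAS succeeded. The ''atomic_fetch_add'' form is wait-free at this level of abstraction, since each call completes in a bounded number of steps; on RISC machines without a fetch-and-add instruction (see the note under Synchronization Primitives), the hardware itself falls back to a retry loop underneath.

<code c>
#include <stdatomic.h>

static atomic_int counter = 0;

/* Lock-free increment: a compare-and-swap retry loop.  Some thread always
 * makes progress, but a particular thread may keep retrying. */
int inc_lock_free(void)
{
    int old = atomic_load(&counter);
    while (!atomic_compare_exchange_weak(&counter, &old, old + 1)) {
        /* CAS failed: 'old' now holds the current value; try again. */
    }
    return old;   /* value seen just before our successful increment */
}

/* Wait-free increment: no retry loop in this code; every caller finishes
 * in a bounded number of steps. */
int inc_wait_free(void)
{
    return atomic_fetch_add(&counter, 1);
}
</code>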
A deadlock is a situation in which two or more competing actions are each waiting for the other to finish, and thus neither ever does. ... snip ... A livelock is similar to a deadlock, except that the states of the processes involved in the livelock constantly change with regard to one another, none progressing. Livelock is a special case of resource starvation; the general definition only states that a specific process is not progressing. http://en.wikipedia.org/wiki/Deadlock
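As a minimal illustration (hypothetical code, not tied to any link here; ''m1'', ''m2'', ''thread_a'', ''thread_b'' are made-up names), the textbook deadlock is two threads acquiring two locks in opposite order:

<code c>
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t m1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t m2 = PTHREAD_MUTEX_INITIALIZER;

/* Thread A takes m1 then m2; thread B takes m2 then m1.  If each grabs its
 * first lock before the other releases, both block forever: a deadlock.
 * Acquiring locks in one global order removes the cycle. */
static void *thread_a(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m1);
    pthread_mutex_lock(&m2);   /* blocks forever if B already holds m2 */
    pthread_mutex_unlock(&m2);
    pthread_mutex_unlock(&m1);
    return NULL;
}

static void *thread_b(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m2);
    pthread_mutex_lock(&m1);   /* blocks forever if A already holds m1 */
    pthread_mutex_unlock(&m1);
    pthread_mutex_unlock(&m2);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, thread_a, NULL);
    pthread_create(&b, NULL, thread_b, NULL);
    pthread_join(a, NULL);     /* may never return */
    pthread_join(b, NULL);
    puts("no deadlock this run");
    return 0;
}
</code>

A livelock variant would have each thread notice that the other's lock is held, release its own lock, and retry; both threads then back off in lockstep forever, constantly changing state but never progressing.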
... an implementation is nonblocking if it guarantees that some thread will complete an operation in a finite number of steps. Wait-free synchronization - M. Herlihy
====== Memory Models ======

  * Memory coherence model
    * Concerns the sequence of writes to a single address: all processors must observe those writes in the same order.
  * Memory consistency model
    * Concerns reads and writes across all addresses.
  * [[wp>Memory ordering]]
  * [[wp>Memory model (computing)]]
  * [[http://www.cl.cam.ac.uk/~pes20/weakmemory/|Relaxed-Memory Concurrency]]

In a multithreaded setting, the model that matters most is the memory consistency model. The simplest is the [[wp>Sequential consistency|sequential consistency model]], the most intuitive and easiest to understand model of the order in which a multithreaded program executes (see the litmus test sketch below).

  * [[http://www.parallellabs.com/2010/03/06/why-should-programmer-care-about-sequential-consistency-rather-than-cache-coherence/|为什么程序员需要关心顺序一致性(Sequential Consistency)而不是Cache一致性(Cache Coherence?)]]
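A standard litmus test for why the consistency model matters, written here as a hedged C11 sketch (not taken from the linked posts; ''x'', ''y'', ''r1'', ''r2'' and the thread functions are illustrative, and the two functions are meant to run concurrently on separate threads):

<code c>
#include <stdatomic.h>

static atomic_int x = 0, y = 0;
static int r1, r2;

/* Store-buffering (Dekker-style) litmus test. */
void thread1(void)
{
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
}

void thread2(void)
{
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
}

/* With memory_order_relaxed, the outcome r1 == 0 && r2 == 0 is allowed:
 * each CPU may buffer its store and perform its load first.  Replacing the
 * relaxed orderings with memory_order_seq_cst (the default for plain
 * atomic_store/atomic_load) forbids that outcome, which is exactly the
 * guarantee sequential consistency provides. */
</code>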
The preceding discussion on atomic reading begs a discussion on the differences between atomicity and ordering.

As discussed, a word-sized read will always occur atomically. It will never interleave with a write to the same word; the read will always return the word in a consistent state: perhaps before the write completed, perhaps after, but never during. For example, if an integer is initially 42 and then set to 365, a read on the integer will always return 42 or 365 and never some commingling of the two values. This is atomicity.

It might be that your code wants something more than this, perhaps for the read to always occur before the pending write. This is not atomicity, but ordering. Atomicity ensures that instructions occur without interruption and that they complete either in their entirety or not at all. Ordering, on the other hand, ensures that the desired order of two or more instructions (even if they are to occur in separate threads of execution or even separate processors) is preserved. The atomic operations discussed in this section guarantee only atomicity. Ordering is enforced via barrier operations, which we will discuss later in this chapter.

http://www.makelinux.net/books/lkd2/ch09lev1sec1
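The atomicity-versus-ordering distinction above can be shown with a message-passing pattern (a minimal C11 sketch; ''data'', ''ready'', ''producer'', and ''consumer'' are illustrative names). Each access is individually atomic, but without ordering the consumer could observe ''ready == 1'' while still reading stale ''data''; a release/acquire pair restores the intended order, and on ARM the compiler enforces it with a barrier such as DMB, which leads into the next section.

<code c>
#include <stdatomic.h>

static int data = 0;           /* ordinary payload */
static atomic_int ready = 0;   /* flag: payload published */

void producer(void)
{
    data = 42;
    /* Release ordering: the write to data cannot be reordered after this
     * store.  On ARM the compiler emits a memory barrier (e.g. DMB). */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

int consumer(void)
{
    /* Acquire ordering: later reads cannot be hoisted above this load. */
    if (atomic_load_explicit(&ready, memory_order_acquire))
        return data;           /* guaranteed to see 42 */
    return -1;                 /* not published yet */
}
</code>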
===== ARM =====

  * DMB (Data Memory Barrier)
  * DSB (Data Synchronization Barrier)
  * [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0013a/index.html|ARM Synchronization Primitives Development Article]]
  * [[https://silver.arm.com/download/download.tm?pv=1143190|Cortex-A Series Programmer's Guide]] - Chapter 9, Memory Ordering
  * [[http://stackoverflow.com/questions/3578587/how-is-the-arm-memory-model-different-from-ia64|How is the arm memory model different from ia64?]]

==== Documentation ====

  * [[http://loda.hala01.com/2011/02/arm%e8%88%87cortex%e7%ad%86%e8%a8%98/|ARM與Cortex筆記]]
  * [[http://loda.hala01.com/2011/06/arm%E8%88%87cortex%E7%AD%86%E8%A8%98-arm-mpcore-multi-processor-core-%E6%9E%B6%E6%A7%8B%E8%A7%A3%E6%9E%90/|ARM與Cortex筆記-ARM MPCore (Multi-Processor Core) 多核心架構解析]]
  * [[http://mpsoc-forum.org/2003/slides/MPSoC_ARM_MP_Architecture.pdf|MPSoC – System Architecture ARM Multiprocessing]]
  * [[http://portal.acm.org/ft_gateway.cfm?id=1353528&type=pdf|Reasoning about the ARM weakly consistent memory model]]
  * [[http://blogs.arm.com/software-enablement/431-memory-access-ordering-an-introduction/|Memory access ordering - an introduction]]
  * [[http://blogs.arm.com/software-enablement/448-memory-access-ordering-part-2-barriers-and-the-linux-kernel/|Memory access ordering part 2 - barriers and the Linux kernel]]
  * [[http://blogs.arm.com/software-enablement/141-caches-and-self-modifying-code/|Caches and Self-Modifying Code]]
  * [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.home/index.html|ARM Infocenter]]

====== External Links ======

  * [[http://www.multicoreinfo.com/|multicoreinfo.com]]
  * [[http://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html|Is Parallel Programming Hard, And, If So, What Can You Do About It?]]
  * [[http://www.parallellabs.com/|并行实验室 | Parallel Labs]]
  * [[http://preshing.com/|Preshing on Programming]]