- X86-32/64
In the end of 1970's, Leslie Lamport proved that, under Sequential Consistency memory model, a Single-Producer/Single-Consumer circular buffer can be implemented without using explicit synchronization mechanisms between the producer and the consumer. (From Massimo Torquati's <Single-Producer/Single-Consumer Queues on Shared Cache Multi-Core Systems>)
I have found two implementations of Lamport's SPSC Queue which are based on circular buffer from the Internet. One is KjellKod.cc's, another is rigtorp's. And as below there are two other implementations, one is based on linked-list, another has made some improvements on Lamport's.
-
Unbounded SPSC Queue
-
Bounded SPSC Queue
Be the same as rigtorp's.
The following numbers are for a 2 socket machine with 2 x Intel(R) Xeon(R) CPU E5-2658 v3 @ 2.20GHz.
Unbounded SPSC Queue
NUMA Node / Core / Hyper-Thread | Throughput (ops/ms) | Latency RTT (ns) |
---|---|---|
#0,#0,#0 & #0,#0,#1 | 242332 | 58 |
#0,#0,#0 & #0,#1,#0 | 137727 | 387 |
#0,#0,#0 & #1,#0,#0 | 22787 | 836 |
SPSC Queue
NUMA Node / Core / Hyper-Thread | Throughput (ops/ms) | Latency RTT (ns) |
---|---|---|
#0,#0,#0 & #0,#0,#1 | 192445 | 63 |
#0,#0,#0 & #0,#1,#0 | 190213 | 279 |
#0,#0,#0 & #1,#0,#0 | 94403 | 752 |
NUMA Node / Core / Hyper-Thread | Throughput (ops/ms) | Latency RTT (ns) |
---|---|---|
#0,#0,#0 & #0,#0,#1 | 41223 | 112 |
#0,#0,#0 & #0,#1,#0 | 20061 | 315 |
#0,#0,#0 & #1,#0,#0 | 15842 | 722 |