Hyperthreading only increases performance by up to 20%, and on our machines excessive parallelism can cause OOM in addition to the traditional "noisy neighborhood" issue. Docker allows you to set CPU affinity (--cpuset-cpus), which pins a container to a specific set of cores. Most tools, such as ninja, will honour this setting.
Below, I'll describe how we can implement this:
Fetch a node's CPU count with the docker info API
Divide the CPU count by a preconfigured split value
Assign --cpuset-cpus to each container round-robin or randomly: see below.
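The first two steps could be sketched roughly as follows. This is a minimal illustration, not a finished implementation: it shells out to the `docker info` CLI (reading the `NCPU` field) rather than calling the HTTP API directly, the helper names `node_cpu_count` and `cpus_per_slice` are hypothetical, and it assumes the "split value" means the number of slices a node is divided into.

```python
import subprocess


def node_cpu_count() -> int:
    """Ask the Docker daemon for the node's logical CPU count.

    Uses the NCPU field reported by `docker info`.
    """
    out = subprocess.run(
        ["docker", "info", "--format", "{{.NCPU}}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return int(out.strip())


def cpus_per_slice(ncpu: int, split: int) -> int:
    """Return how many logical CPUs each slice gets.

    Assumption: `split` is the number of slices per node.
    """
    if split <= 0:
        raise ValueError("split must be positive")
    size = ncpu // split
    if size < 1:
        raise ValueError("split is larger than the number of CPUs")
    return size


if __name__ == "__main__":
    # With 8 logical CPUs and a split of 4, each container gets 2 CPUs.
    print(cpus_per_slice(8, 4))
```

On a real node, `cpus_per_slice(node_cpu_count(), split)` would give the slice size to use when building the `--cpuset-cpus` values.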
Dividing CPU resources correctly
Linux seems to number logical CPUs in the order [HT][NUMA][CORES]. For example, if we have 2 CPUs with 2 cores each, plus HT:
0: CPU 0, Core 0
1: CPU 0, Core 1
2: CPU 1, Core 0
3: CPU 1, Core 1
4: CPU 0, Core 0 (HT sibling of 0)
5: CPU 0, Core 1 (HT sibling of 1)
6: CPU 1, Core 0 (HT sibling of 2)
7: CPU 1, Core 1 (HT sibling of 3)
This means we can simply divide the range sequentially, into 0,1 / 2,3 / 4,5 / 6,7. Each group stays on one CPU socket, which preserves memory locality, and no group contains both virtual cores of the same physical core.
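As a rough sketch of the sequential division, the function below cuts the CPU id range into consecutive groups and formats each one as a `--cpuset-cpus` value. The name `cpuset_slices` is hypothetical, and the code assumes the numbering shown above; it does not inspect the actual topology.

```python
from typing import List


def cpuset_slices(ncpu: int, slice_size: int) -> List[str]:
    """Split CPU ids 0..ncpu-1 into consecutive groups of `slice_size`.

    Each group is returned as a comma-separated string suitable for
    `docker run --cpuset-cpus`. Leftover CPUs that do not fill a whole
    group are left unassigned.
    """
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    slices = []
    for start in range(0, ncpu - slice_size + 1, slice_size):
        ids = range(start, start + slice_size)
        slices.append(",".join(str(i) for i in ids))
    return slices


if __name__ == "__main__":
    print(cpuset_slices(8, 2))  # ['0,1', '2,3', '4,5', '6,7']
```

Since the numbering is only what Linux "seems" to do, it may be worth checking a node before relying on it; the files under `/sys/devices/system/cpu/cpuN/topology/` (for example `thread_siblings_list`) show which logical CPUs share a physical core.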
To spread load across these groups, we can pick a group for each container using either a round-robin approach or a stateless random selection approach.
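The two selection strategies might look like this. It is only a sketch: the `SLICES` list is the example division from above, and the names `RoundRobinPicker` and `pick_random` are made up for illustration. The round-robin variant needs a counter kept by whatever launches the containers, while the random variant needs no shared state.

```python
import random
from typing import List

# Example groups from the 8-CPU layout above (assumed, not detected).
SLICES = ["0,1", "2,3", "4,5", "6,7"]


class RoundRobinPicker:
    """Hands out slices in order, wrapping around at the end."""

    def __init__(self, slices: List[str]) -> None:
        self._slices = list(slices)
        self._next = 0

    def pick(self) -> str:
        chosen = self._slices[self._next]
        self._next = (self._next + 1) % len(self._slices)
        return chosen


def pick_random(slices: List[str]) -> str:
    """Stateless alternative: choose any slice uniformly at random."""
    return random.choice(slices)


if __name__ == "__main__":
    picker = RoundRobinPicker(SLICES)
    for _ in range(5):
        # The chosen value would be passed as: docker run --cpuset-cpus=<value> ...
        print(picker.pick())
    print(pick_random(SLICES))
```

Round-robin gives an even spread but the counter has to live somewhere; random selection is simpler to run from several launchers at once, at the cost of occasional collisions on the same group.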