Lock-free bag #29

bartoszmodelski · 2022-11-02T14:03:16Z

Overview

It would be useful to develop another data structure: a lock-free bag. The sections below describe what it is, note potential uses, and link to a paper with a design.

Bag

A bag (or multiset) is a data structure that stores a collection of values. Elements are inserted or removed one at a time. In contrast with a queue or stack, a bag has no ordering: a remove can return the youngest, the oldest, or any other item currently in the bag. In principle, the lack of ordering eliminates contention points and should lead to better throughput and scalability.
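To make the interface concrete, here is a minimal sketch of what a bag signature could look like in OCaml. The names `BAG`, `Naive_bag`, `add` and `try_remove_any` are hypothetical and are not taken from the paper or from any existing library. The implementation is deliberately naive: it keeps an immutable list in a single `Atomic.t`, so it is lock-free but still has one contention point, which is exactly what a real bag design would try to avoid.

```ocaml
(* Hypothetical bag signature: no ordering guarantee on removal. *)
module type BAG = sig
  type 'a t
  val create : unit -> 'a t
  val add : 'a t -> 'a -> unit
  val try_remove_any : 'a t -> 'a option
end

(* Naive lock-free implementation: a list behind one atomic reference.
   It happens to behave like a stack, which is allowed, since a bag
   may return any element. All operations contend on the same cell. *)
module Naive_bag : BAG = struct
  type 'a t = 'a list Atomic.t

  let create () = Atomic.make []

  let rec add t x =
    let old = Atomic.get t in
    (* Retry if another domain changed the list in the meantime. *)
    if not (Atomic.compare_and_set t old (x :: old)) then add t x

  let rec try_remove_any t =
    match Atomic.get t with
    | [] -> None
    | x :: rest as old ->
      if Atomic.compare_and_set t old rest then Some x
      else try_remove_any t
end
```

Callers may only rely on each added element being removed at most once, not on the order in which elements come back.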

Uses

Better throughput for Domainslib than with stack or queue (for workloads without significant locality between tasks).
Better throughput for Reagents: see "Replace MSQueue with concurrent lock-free bag" (reagents#4).

Design

The paper "A lock-free algorithm for concurrent bags" proposes a compelling approach. See figure 4 for a performance comparison with some classical structures, e.g. the Michael-Scott queue.

Johan511 · 2023-01-23T09:05:00Z

Hi, I am interested in working on this issue and have been trying to implement the threadBlocks as described in the paper. One issue I have faced is with the Backoff module: any advice on how I can import it into my Bag.ml file?

Also I would like some advice on how to allot a threadBlock to each thread.

polytypic · 2023-01-23T10:17:52Z

I totally agree that a scalable lock-free bag would be a useful data structure to have.

> Also I would like some advice on how to allot a threadBlock to each thread.

You probably want to use DLS, i.e. Domain Local Storage.
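As a rough illustration of the DLS suggestion, the sketch below gives each domain its own block through `Domain.DLS` from the OCaml 5 standard library. The `block` record, `block_size`, `my_block` and `add_local` are hypothetical names made up for this example; the paper's threadBlocks have a different layout, and a real bag would also need a way for other domains to find and steal from these blocks.

```ocaml
(* Hypothetical per-domain block; not the paper's actual layout. *)
type 'a block = {
  slots : 'a option array;
  mutable count : int;
}

let block_size = 64

(* The initialiser runs lazily, once per domain, on first access. *)
let key : int block Domain.DLS.key =
  Domain.DLS.new_key (fun () ->
      { slots = Array.make block_size None; count = 0 })

(* Returns the block owned by the calling domain. *)
let my_block () = Domain.DLS.get key

(* Adds to the caller's own block; returns false if the block is full. *)
let add_local x =
  let b = my_block () in
  if b.count < block_size then begin
    b.slots.(b.count) <- Some x;
    b.count <- b.count + 1;
    true
  end
  else false
```

Because each domain only writes to its own block here, `add_local` needs no atomic operations. That stops being true once removal by other domains is added.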

> Better throughput for Domainslib than with stack or queue (for workloads without significant locality between tasks).

I'm curious to see the result. For executing tasks, however, I realised many years ago that stack-like (LIFO) ordering tends to be preferable. The reason isn't contention. The reason is that LIFO ordering results in depth-first-search-like behaviour, while FIFO results in breadth-first-search-like behaviour. The crucial difference is in memory and general resource usage: deviations from DFS behaviour tend to increase resource usage.

In a build tool of sorts that ran large numbers of largely independent tasks in parallel, I initially used a queue. However, that resulted in the program running out of memory: memory usage grew in proportion to the number of tasks that could proceed if given CPU time. I changed the program to use a stack, and memory usage then stayed roughly constant, or proportional to the number of CPU cores.
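The effect can be reproduced with a toy, single-threaded model. This is only a sketch and not the build tool described above: it assumes a binary task tree in which every task above `max_depth` spawns two children, and it records the peak number of pending tasks under each ordering. The function names are made up for the example.

```ocaml
(* A task is represented only by its depth in the task tree.
   [push], [pop] and [length] abstract over the pending-task container. *)
let simulate ~push ~pop ~length ~max_depth =
  push 0;
  let peak = ref 1 in
  let rec loop () =
    match pop () with
    | None -> !peak
    | Some depth ->
      (* Running a non-leaf task makes two new tasks runnable. *)
      if depth < max_depth then begin
        push (depth + 1);
        push (depth + 1)
      end;
      peak := max !peak (length ());
      loop ()
  in
  loop ()

(* LIFO: pending tasks live in a stack, giving DFS-like execution. *)
let peak_lifo ~max_depth =
  let s = Stack.create () in
  simulate ~max_depth
    ~push:(fun x -> Stack.push x s)
    ~pop:(fun () -> Stack.pop_opt s)
    ~length:(fun () -> Stack.length s)

(* FIFO: pending tasks live in a queue, giving BFS-like execution. *)
let peak_fifo ~max_depth =
  let q = Queue.create () in
  simulate ~max_depth
    ~push:(fun x -> Queue.push x q)
    ~pop:(fun () -> Queue.take_opt q)
    ~length:(fun () -> Queue.length q)
```

With `max_depth` set to 10, the stack peaks at 11 pending tasks (one per level of the tree), while the queue peaks at 1024 (the whole bottom level at once). An unordered bag could fall anywhere between these two, depending on which element it happens to return.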

lyrm · 2024-12-06T09:35:15Z

We now have a bag implemented in Saturn, which uses a hash table internally.

However, I’m keeping this issue open, as it would be interesting to explore how a different algorithm or implementation might compare in terms of performance.

Since this is less of a priority compared to other new data structure issues, I am changing the tag from new data structure to enhancement.

bartoszmodelski added the "help wanted" label Nov 2, 2022

bartoszmodelski added the "new data structure" (Proposal for new data and synchronization structures) label Dec 12, 2022

lyrm added the "enhancement" label and removed the "new data structure" (Proposal for new data and synchronization structures) label Dec 6, 2024
