Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock-free bag #29

Open
bartoszmodelski opened this issue Nov 2, 2022 · 3 comments
Open

Lock-free bag #29

bartoszmodelski opened this issue Nov 2, 2022 · 3 comments

Comments

@bartoszmodelski
Copy link
Contributor

bartoszmodelski commented Nov 2, 2022

Overview

It would be useful to develop another data structure: lock-free bag. Below describes what it is, notes potential uses and links a paper with design.

Bag

Bag (or multiset) is a data structure, which stores a collection of values. Elements are inserted or removed one at a time. In contrast with queue or stack, bag has no ordering. That is remove can return the youngest, oldest or any other item currently in the bag. In principle, lack of ordering eliminates contention points and should lead to better throughput and scalability.

Uses

Design

A lock-free algorithm for concurrent bags proposes a compelling approach. See figure 4 for performance comparison with some classical structures, e.g. Michael Scott queue.

@bartoszmodelski bartoszmodelski added the new data structure Proposal for new data and synchronization structures label Dec 12, 2022
@Johan511
Copy link

Hi, I am interested in working on this issue and have been trying to implement the threadBlocks as described in the paper. One issue I have faced was with the Backoff module, any advice on how I can import it into my Bag.ml file.

Also I would like some advice on how to allot a threadBlock to each thread.

@polytypic
Copy link
Contributor

I totally agree that a scalable lock-free bag would be a useful data structure to have.

Also I would like some advice on how to allot a threadBlock to each thread.

You probably want to use DLS or Domain Local Storage.

Better throughput for Domainslib than with stack or queue (for workloads without significant locality between tasks).

I'm curious to see the result. For performing tasks, however, I realised many years ago that stack like or LIFO ordering tends to be generally preferable. The reason isn't contention. The reason is that LIFO ordering results in a Depth-First Search like behaviour, while FIFO results in a Breadth-First Search like behaviour. The crucial difference is in memory and general resource usage. Deviations from the DFS behaviour tend to increase resource usage.

In a kind of build tool that performed large numbers of largely independent tasks in parallel I initially used a queue. However, that resulted in the program running out of memory. Memory usage grew proportional to the number of tasks that could proceed if given CPU time. I changed the program to use a stack. Memory usage then stayed roughly (constant or) proportional to the number of CPU cores.

@lyrm lyrm added enhancement and removed new data structure Proposal for new data and synchronization structures labels Dec 6, 2024
@lyrm
Copy link
Collaborator

lyrm commented Dec 6, 2024

We now have a bag implemented in Saturn, which uses a hash table internally.

However, I’m keeping this issue open, as it would be interesting to explore how a different algorithm or implementation might compare in terms of performance.

Since this is less of a priority compared to other new data structure issues, I am changing the tag from new data structure to enhancement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants