This repository has been archived by the owner on Mar 12, 2024. It is now read-only.

Reduce the space complexity of the HungarianMatcher module. #606

Open · wants to merge 1 commit into main

Conversation

aioaneid

The memory reduction factor of the cost matrix is sum(#target objects) / max(#target objects).

That is achieved by no longer computing or storing matching costs between predictions and targets at different positions within the batch. More precisely, the original cost matrix of shape [batch_size * queries, sum(#target objects)] is shrunk to a tensor of shape [batch_size, queries, max(#target objects)].
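The idea can be sketched as follows. This is a minimal numpy/scipy illustration of the reshaping described above, not the PR's actual PyTorch code; the function name, the L1-only cost, and the shapes are assumptions made for the example:

```python
# Sketch of the per-batch cost layout: instead of one large
# [batch * queries, sum(n_i)] cost matrix, build a padded
# [batch, queries, max(n_i)] tensor and run the Hungarian
# algorithm separately on each batch element's valid slice.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_per_batch(pred_boxes, target_boxes_list):
    """pred_boxes: [batch, queries, 4]; target_boxes_list: list of [n_i, 4]."""
    batch, queries, _ = pred_boxes.shape
    max_targets = max(t.shape[0] for t in target_boxes_list)
    # Padded cost tensor -- costs between predictions and targets at
    # *different* batch positions are never computed or stored.
    cost = np.zeros((batch, queries, max_targets))
    matches = []
    for b, tgt in enumerate(target_boxes_list):
        n = tgt.shape[0]
        # Example cost: L1 distance between each query of image b and
        # each target of image b only (shape [queries, n]).
        cost[b, :, :n] = np.abs(
            pred_boxes[b, :, None, :] - tgt[None, :, :]
        ).sum(-1)
        # Solve the assignment on the valid [queries, n] slice.
        row, col = linear_sum_assignment(cost[b, :, :n])
        matches.append((row, col))
    return matches
```

Since the Hungarian algorithm is run independently per image anyway, the off-diagonal blocks of the original matrix were never used; dropping them reduces the cost-matrix memory by a factor of sum(#target objects) / max(#target objects).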

Besides allowing much larger batch sizes, this change also yields speedups. Tested on the table structure recognition task using the Table Transformer (TATR) (125 queries, 7 classes) with PubMed data, it results in a) a small but meaningful speedup on CUDA at all batch sizes and on CPU at small batch sizes, and b) much larger speedups on CPU at larger batch sizes.

The processing time reduction computed as (1 - new_time / old_time) is shown below in various configurations:

| Batch size | CUDA  | CPU   |
|-----------:|------:|------:|
|          1 |  8.2% |  1.6% |
|          2 |  1.6% |  9.3% |
|          3 |  1.6% |  7.7% |
|          4 |  0.9% | 11.2% |
|          5 |  0.8% | 13.9% |
|          6 |  0.9% | 15.5% |
|          7 |  0.9% | 23.1% |
|          8 |       | 47.1% |
|         16 |       | 70.6% |
|         32 |       | 88.3% |
|         64 |       | 95.0% |

@facebook-github-bot added the CLA Signed label Sep 18, 2023
@aioaneid aioaneid changed the title Reduce HungarianMatcher's space complexity. Reduce space complexity of HungarianMatcher. Sep 18, 2023
@aioaneid aioaneid changed the title Reduce space complexity of HungarianMatcher. Reduce the space complexity of HungarianMatcher. Sep 18, 2023
@aioaneid aioaneid changed the title Reduce the space complexity of HungarianMatcher. Reduce the space complexity of the HungarianMatcher module. Sep 18, 2023
3 participants