Memory Issue When Running Multiple Batches of Prompts
Description
Similar to the known issue regarding memory freeing (PR #3069), I am encountering out-of-memory (OOM) problems when processing multiple batches of prompts.
My goal is to process a large number of prompts (on the order of thousands) while saving outputs every 500 prompts. To achieve this, I split the prompts into batches. However, due to memory not being freed properly, running subsequent batches leads to OOM errors.
Additionally, while the upcoming pipe.close() feature appears to free the model from VRAM, it would require reloading the model for each batch, which may not be optimal in this scenario.
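For context, here is a minimal sketch of the reload-per-batch workaround that pipe.close() would enable (assuming close() releases the engine as described in PR #3069, and using the same model and batches as in the reproduction below); reloading the weights for every batch of 500 prompts is the overhead I would like to avoid:

```python
from lmdeploy import pipeline

for batch in batches:
    pipe = pipeline(model)   # weights are reloaded for every batch
    response = pipe(batch)
    # ... save the outputs of this batch ...
    pipe.close()             # frees the engine and its VRAM (per PR #3069)
```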
Reproduction
```python
from lmdeploy import pipeline

model = 'the/path/of/internlm2/model'
batch_size = 500
batches = [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]  # Split into batches of 500

pipe = pipeline(model)
for batch in batches:
    response = pipe(batch)
    # Output the result of the batch
```
Expected Behavior
Memory should be properly freed after processing each batch to allow continuous execution.
Ideally, there should be a way to clear intermediate memory usage without fully unloading the model (as pipe.close() might do).
Related resources
No response
Additional context
No response
Memory allocated during inference is reused by the engine; no additional deallocation is needed. However, if a later batch requires more memory, the engine will try to allocate more, which may trigger the OOM exception.
There are two cases:
1. A later batch makes the engine try to allocate more memory than is currently available on the system. In this case, try decreasing memory-related parameters such as cache_max_entry_count and max_prefill_token_num.
2. After generation of a batch completes, you call other PyTorch functions that allocate GPU memory. That memory is cached by PyTorch and is not reusable by the engine. In this case, call torch.cuda.empty_cache() to empty the cache before the next batch starts.
Both remedies are combined in the sketch below.
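For example, a rough sketch applying both suggestions (assuming TurbomindEngineConfig exposes cache_max_entry_count and max_prefill_token_num in your LMDeploy version, and reusing the model path and prompts from the reproduction above):

```python
import torch
from lmdeploy import pipeline, TurbomindEngineConfig

model = 'the/path/of/internlm2/model'
batch_size = 500
batches = [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]

# Case 1: reserve less memory for the engine so a larger later batch
# does not push allocations past the available VRAM
backend_config = TurbomindEngineConfig(
    cache_max_entry_count=0.5,    # ratio of free GPU memory used for the k/v cache
    max_prefill_token_num=4096,   # max tokens processed per prefill iteration
)
pipe = pipeline(model, backend_config=backend_config)

for batch in batches:
    response = pipe(batch)
    # ... any post-processing that allocates GPU memory with PyTorch ...
    # Case 2: release PyTorch's cached blocks so the engine can grow
    # its own allocations for the next batch
    torch.cuda.empty_cache()
```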