Is your feature request related to a problem?
I have a long-running process defined as an asynchronous pipeline of multiple steps. You may think of it as an ETL pipeline. These steps can involve activities such as modifying pandas dataframes.
Not all steps have memory issues, but I would like to use memray to understand how memory is used and passed around in my pipeline over time. There is a memory explosion that occurs in one of the final steps of the pipeline.
Since the current flamegraph report only displays a snapshot of memory use at the time of peak memory, it is hard for me to investigate what or where in prior steps of the pipeline might be contributing to the explosion. While analyzing the point of peak memory is very insightful, it does not capture enough information about the parts of my pipeline that could be optimized, as opposed to the areas that cannot (some known causes of the explosion, such as a join between pandas dataframes, are always memory expensive). Focusing instead on reducing memory in prior steps would lessen the impact of the memory explosion in later steps and still lead to an overall reduction in memory.
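For reference, this is roughly how I profile the pipeline today; the step names and the asyncio driver below are simplified placeholders for my real code:

```python
import asyncio
import memray

# Simplified placeholders for my real pipeline steps (the real ones
# mostly build and transform pandas dataframes).
async def extract(): ...
async def transform(): ...
async def load(): ...

async def run_pipeline():
    # The whole pipeline runs under a single Tracker, so the resulting
    # flamegraph only reflects the single point of peak memory usage.
    with memray.Tracker("pipeline.bin"):
        await extract()
        await transform()
        await load()  # <- the memory explosion happens around here

asyncio.run(run_pipeline())
```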
Describe the solution you'd like
Is there a way to place some sort of marker in my code (e.g. a function decorator or an API call) that signals to memray to take a snapshot of the memory usage at that point?
Ideally I'd like to look at a plot of heap memory usage over time and see markers where my code called into memray to record a point in time and code. This functionality would let users make custom calls to memray from their own code (e.g. my pipeline code) to generate timestamped snapshots of memory and compare them over time, not just at peak memory usage.
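To make the idea concrete, something along these lines is what I have in mind. memray.mark and memray.snapshot are invented names used purely to illustrate the proposal; they are not existing memray API:

```python
import memray

# Hypothetical API -- memray.snapshot / memray.mark do not exist today;
# they only sketch the kind of marker hooks I'd like to have.
# (extract/load reuse the placeholder steps from the sketch above.)

@memray.snapshot("transform")  # decorator form: snapshot around a whole step
async def transform(df):
    ...

async def run_pipeline():
    with memray.Tracker("pipeline.bin"):
        df = await extract()
        memray.mark("after extract")    # explicit form: a labelled,
                                        # timestamped point in the capture file
        df = await transform(df)
        memray.mark("after transform")
        await load(df)
```

Either form would then appear as a labelled marker on the memory-over-time plot, so I could compare memory at each step boundary rather than only at the peak.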
Alternatives you considered
No response