Add support for custom timestamped snapshots of memory #678

Open
1 task done
jgbradley1 opened this issue Sep 9, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@jgbradley1

jgbradley1 commented Sep 9, 2024

Is there an existing proposal for this?

  • I have searched the existing proposals

Is your feature request related to a problem?

I have a long-running process defined as an asynchronous pipeline of multiple steps. You may think of it as an ETL pipeline. The steps can include work such as modifying pandas dataframes.

Not all steps have memory issues, but I would like to use memray to understand how memory is used and passed around in my pipeline over time. A memory explosion occurs in one of the final steps of the pipeline.

Since the current flamegraph report only displays a snapshot of memory use at the moment of peak memory, it is hard for me to investigate what in the prior steps of the pipeline might be contributing to the explosion. Analyzing the point of peak memory is very insightful, but it does not capture enough information to tell which parts of my pipeline could be optimized and which cannot (some causes of the explosion are known and unavoidable; a join between pandas dataframes, for example, is always memory expensive). Reducing memory in the prior steps would lessen the impact of the explosion in later steps and still lead to an overall reduction in memory.

Describe the solution you'd like

Is there a way to plant some sort of marker in my code (e.g. a function decorator or an API call) that signals memray to take a snapshot of the memory usage?

Ideally I'd like to look at a plot of heap memory usage over time and see markers where my code called into memray to record a place in time and code. This functionality would enable users to make custom calls to memray in their code (my pipeline code for example) to generate timestamped snapshots of memory and compare them over time (not just at peak memory usage).
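To make the request concrete, here is a rough sketch of the kind of marker I have in mind. It is not memray code and does not use any memray API: it uses the standard library's `tracemalloc` as a stand-in, and the names `mark`, `snapshot_after`, and `MARKERS` are hypothetical ones made up for illustration. The pipeline steps are toy synchronous functions rather than my real async steps.

```python
import functools
import time
import tracemalloc

# Hypothetical helper names: `mark`, `snapshot_after`, and `MARKERS` are
# illustrative only and are not part of memray.
MARKERS = []  # entries: (label, seconds since start, current bytes, peak bytes)
_START = time.monotonic()


def mark(label):
    """Record a timestamped reading of traced heap usage under `label`."""
    current, peak = tracemalloc.get_traced_memory()
    entry = (label, time.monotonic() - _START, current, peak)
    MARKERS.append(entry)
    return entry


def snapshot_after(label):
    """Decorator form: record a marker each time the wrapped step returns."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            mark(label)
            return result
        return wrapper
    return decorator


@snapshot_after("load")
def load():
    # Toy step: roughly 1 MiB of payload.
    return [bytes(1024) for _ in range(1000)]


@snapshot_after("transform")
def transform(rows):
    # Toy step: builds a second list with rows twice as large.
    return [row * 2 for row in rows]


if __name__ == "__main__":
    tracemalloc.start()
    rows = transform(load())
    mark("end")
    tracemalloc.stop()
    for label, elapsed, current, peak in MARKERS:
        print(f"{label:10s} t={elapsed:.3f}s current={current} peak={peak}")
```

Each marker ties a label and a time to a heap reading, which is the pairing I would like to see overlaid on a plot of memory over time. A real version would presumably need to handle coroutines and record allocation call stacks as well, which this sketch does not attempt.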

Alternatives you considered

No response

@jgbradley1 jgbradley1 added the enhancement New feature or request label Sep 9, 2024
@jgbradley1 jgbradley1 changed the title Add annotations Add support for custom timestamped snapshots of memory Sep 9, 2024
@godlygeek
Contributor

Have you seen https://bloomberg.github.io/memray/flamegraph.html#temporal-flame-graphs ? The --temporal mode lets you compare the memory usage between two moments in your program's execution.
