Analytics query streaming #535

Open
petrkozorezov opened this issue Oct 11, 2024 · 4 comments

@petrkozorezov
Contributor

I have an analytics dataset with more than 100 million rows. I want to run a query over it and dump the results to Parquet or CSV for further analysis. Is it possible to do this without storing the intermediate results in memory?

@Westwooo
Contributor

Westwooo commented Oct 14, 2024

Hello @petrkozorezov, the easiest way to do this would be:

analytics "your analytics query here" | save results.csv

I believe that save will stream the data to disk without collecting it in memory first. Let me know if this addresses your issue.

@petrkozorezov
Contributor Author

That doesn't look like streaming. It downloads everything first, using a lot of memory, and only then writes it all to disk.
In my case it works with 5 columns and 60 million rows (using around 60 GB of RAM), but fails with 80 million rows because the OOM killer terminates it. The query is the same in both cases; only the 'limit' differs.

@Westwooo
Contributor

You're correct. I think it could be possible to add streaming support to the analytics command, so I'll leave this issue open as a request for that feature.
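
To illustrate the difference being discussed, here is a minimal Python sketch that contrasts collecting a result set before writing with writing each row as it arrives. It is not Couchbase Shell code and says nothing about how the analytics command is implemented; `fetch_rows`, `save_collected`, and `save_streamed` are hypothetical names made up for this example, and `fetch_rows` only stands in for a query result that is delivered row by row.

```python
import csv
import io
from typing import Iterable, Iterator


def fetch_rows(total: int) -> Iterator[dict]:
    """Hypothetical stand-in for a query result that arrives row by row."""
    for i in range(total):
        yield {"id": i, "value": i * 2}


def save_collected(rows: Iterable[dict], out) -> int:
    """Collect everything first, then write: memory grows with the row count."""
    buffered = list(rows)  # the whole result set is held in memory here
    if not buffered:
        return 0
    writer = csv.DictWriter(out, fieldnames=list(buffered[0].keys()))
    writer.writeheader()
    writer.writerows(buffered)
    return len(buffered)


def save_streamed(rows: Iterable[dict], out) -> int:
    """Write each row as it arrives: only the current row is held in memory."""
    writer = None
    count = 0
    for row in rows:
        if writer is None:
            # The header is taken from the first row (an assumption of this sketch).
            writer = csv.DictWriter(out, fieldnames=list(row.keys()))
            writer.writeheader()
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    sink = io.StringIO()  # an in-memory sink; a real run would open a file
    print(save_streamed(fetch_rows(3), sink), "rows written")
```

Both functions produce the same CSV. The only difference is that `save_collected` materializes the full list before the first byte is written, which matches the behaviour reported above, while `save_streamed` keeps memory use roughly constant regardless of the row count.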

@petrkozorezov
Contributor Author

Related: #162

@Westwooo self-assigned this on Oct 18, 2024