Ballista Python Update #1091
Comments
@andygrove, @timsaucer, @Michael-J-Ward your opinion would be greatly appreciated
I’ve skimmed through the PR you linked, but I’ll need to take some time to review in more detail. I can think of one holdup regarding the python interface but hopefully it won’t be too difficult to overcome.
Thanks @timsaucer I did a quick poc on top of #1088 to demonstrate functionality at dc21f9e
@milenkovicm interested in supporting on this once Tim and team review your PR for #1069
referencing @timsaucer comment from discord https://discord.com/channels/885562378132000778/1297108588183027753/1298220209609642034
Sorry I haven’t been able lately to give this more attention, but I hope next week my time clears up some.
No worries @timsaucer, I just wanted to note the important point you brought up. Thanks a lot
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
With the changes made in #1088 and the introduction of `SessionContextExt`, we could change `pyballista` to support the datafusion-python context directly instead of `BallistaContext`. This would unify the datafusion and ballista Python interfaces, enabling users to switch from a single-node deployment to a cluster deployment with a single-line change.
Describe the solution you'd like
I don't think we need to reinvent the wheel here; we just need to copy what https://github.com/apache/datafusion-ray is doing and do the same for ballista. This PR should provide support for the methods provided by `SessionContextExt`. Something similar to:
Propose a proper, Pythonic way to initialize `datafusion::PySessionContext`
I'm not a Python expert, so I can't really propose an ergonomic Python interface. I'm not sure whether we should use objects like `StandaloneBallista` or static methods like `Ballista.standalone()`. The latter would be trivial to implement, although it may not be the most ergonomic option in Python.
It would be great if we aligned with the datafusion-ray approach.
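To make the two options easier to compare, here is a rough sketch of both shapes. Everything in it is an assumption for illustration: `SessionContext` is a local stub standing in for the datafusion-python context (not the real class), `Ballista.remote()` and `StandaloneBallista.context()` are hypothetical names that are not defined anywhere in this issue, and the URL is only a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionContext:
    """Stub standing in for the datafusion-python context (illustration only)."""
    mode: str
    url: Optional[str] = None


class Ballista:
    """Option A: static factory methods that hand back a context."""

    @staticmethod
    def standalone() -> SessionContext:
        return SessionContext(mode="standalone")

    @staticmethod
    def remote(url: str) -> SessionContext:
        # Hypothetical counterpart to standalone(); not named in the issue.
        return SessionContext(mode="remote", url=url)


class StandaloneBallista:
    """Option B: one object per deployment mode."""

    def context(self) -> SessionContext:
        # Hypothetical method name, chosen only for this sketch.
        return SessionContext(mode="standalone")


# Switching deployments would then be a single-line change:
ctx = Ballista.standalone()
# ctx = Ballista.remote("df://localhost:50050")  # placeholder URL
```

Either way, the rest of the user's code would only ever see a plain context object, which is what would make the switch between single-node and cluster deployment cheap.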
Make `standalone` an optional dependency
Can we make Ballista `standalone` an optional dependency?
A plain install should install remote mode only.
`pip install pyballista['standalone']` should install both remote and standalone modes, providing an easy way to test ballista applications.
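On the Python side, an optional component is often guarded with an availability check that fails with a helpful message. The sketch below shows that general pattern only. It assumes the standalone support would live in a separately importable module, and the module name `pyballista_standalone` is hypothetical. How the extra would actually be packaged is not decided in this issue.

```python
import importlib.util

# Hypothetical module name, assumed to be installed only by the "standalone" extra.
STANDALONE_MODULE = "pyballista_standalone"


def standalone_available(module_name: str = STANDALONE_MODULE) -> bool:
    """Return True if the optional standalone module can be imported."""
    return importlib.util.find_spec(module_name) is not None


def require_standalone(module_name: str = STANDALONE_MODULE) -> None:
    """Raise a descriptive ImportError if standalone support is missing."""
    if not standalone_available(module_name):
        raise ImportError(
            "Standalone mode is not installed; "
            "try: pip install pyballista['standalone']"
        )
```

A standalone entry point could call `require_standalone()` first, so that users with a remote-only install get a clear hint instead of an obscure import failure.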
Consider renaming the Python package
As `pyballista` is not published, can we consider renaming the package to something like `datafusion-distributed` or `datafusion-ballista` to align with other packages?
To me, `datafusion-distributed` makes sense, and we could consider renaming the `ballista` (client) crate to the same name, keeping the `ballista-` prefix for the executor and scheduler.
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Do we need to keep the current `pyballista` implementation, or can we remove it with this PR?
Relates to #1069, #1088