Build DAG from functions directly #1268
Comments
Yes, there could be a big refactor for Hamilton 2.0. Otherwise, a small reason for making things look like a module was so that we wouldn't have to rewire the internals of Hamilton, which assumed a module would be passed in. The larger reason why it wasn't enabled from the beginning and why
Question:
Otherwise some API things to think through / check:
Current main code path (roughly):
In other words, the module abstraction is currently irrelevant for building the DAG. It only matters for downstream usage.
Propositions:
Answers
Current API options around "get functions from the current namespace and build a DAG" look hacky and have poor ergonomics. Solving this problem would be the same as providing a smoother "if __name__ == '__main__', run this as a DAG" workflow.
The current propositions don't need to involve a user-facing API change. They would make developers' lives easier and would provide a first-class way of passing functions to create a function graph / driver.
Simple user-facing API options:
Could be from anywhere. Doesn't need to come from a file.
Our options to maintain compatibility:
No change to current behavior because the module metadata plays no role in graph building and core Hamilton features.
Don't know the details of this. If you have an instantiated function, you must have the pickleable bytes of the instantiated function, and you probably have available source code (from a
Don't know the assumptions of the Hamilton UI. If there is a blocking assumption, it's better to change it? The main potential limitation is that we have some UI components that expect a non-empty metadata field.
Lambdas and static methods are not currently supported and are not within the scope of what I intended here. That said, building graphs from functions would introduce a simple way to create nodes and graphs from lambdas and static methods.
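To illustrate the claim above that module metadata plays no role in graph building, here is a minimal sketch that derives node dependencies purely from function objects, using only the standard library. The helper name `dependencies_from_functions` and the example functions are hypothetical; this is not Hamilton's actual implementation, only the general idea that a function's name identifies a node and its parameter names identify upstream nodes.

```python
import inspect
from typing import Callable, Dict, List


def dependencies_from_functions(*fns: Callable) -> Dict[str, List[str]]:
    """Map each function name (a node) to its parameter names (its upstream nodes).

    Hypothetical helper for illustration; no module object is involved.
    """
    return {fn.__name__: list(inspect.signature(fn).parameters) for fn in fns}


# Example functions; they could be defined anywhere, not necessarily in one file.
def a(raw: int) -> int:
    return raw + 1


def b(a: int) -> int:
    return a * 2


def c(a: int, b: int) -> int:
    return a + b


graph = dependencies_from_functions(a, b, c)
```

Here `raw` appears only as a parameter and never as a function name, so it would be an external input to the graph.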
I asked a question in Slack about this feature or something similar, and was asked to post the context on the issue directly. Granted, I am brand new to Hamilton and am merely scoping it out for possible adoption, so my perspective and context might not be as relevant.
I am interested in using Hamilton to help maintain data science workflows and envision creating a shared node/subgraph library and an analytic store with individual modules corresponding to each data science workflow. Thus I imagine I would create a workflow
I also think that, for rapid development, importing a list of functions to be built can be useful when you want to cannibalize another workflow someone has written for a few useful functions. In the long run, this probably calls for a refactor and integration of shared components into a shared node library, but a random data scientist might not have maintainer permissions for that library. If you import someone else's module wholesale, especially a complicated one, you run the risk of accidental overrides and collisions, or possibly whole extra subgraphs from the other workflow that get executed unnecessarily. You would have to carefully check that you are overriding the correct functions and not invoking extra subgraphs, whereas if you could just import the few functions or subgraphs you wanted, you would only need to make sure your own DAG is compatible with those, allowing for safe and rapid ad hoc development.
Again, I have only been playing with toy problems and these might be non-issues with more experience, but I am interested in this feature.
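The collision concern described above can be made concrete with a small check over an explicit list of functions. This is only a sketch under stated assumptions: `find_name_collisions` and the two example "workflows" are hypothetical names, and it assumes that node identity is determined by the function name.

```python
from collections import Counter
from typing import Callable, List


def find_name_collisions(*fns: Callable) -> List[str]:
    """Return the function names that appear more than once (hypothetical helper)."""
    counts = Counter(fn.__name__ for fn in fns)
    return sorted(name for name, n in counts.items() if n > 1)


def someone_elses_workflow():
    # Stand-in for functions imported from another person's module.
    def cleaned_data(raw_data: list) -> list:
        return [x for x in raw_data if x is not None]

    def row_count(cleaned_data: list) -> int:
        return len(cleaned_data)

    return cleaned_data, row_count


def my_workflow():
    # Stand-in for functions defined in my own workflow.
    def cleaned_data(raw_data: list) -> list:
        return sorted(x for x in raw_data if x is not None)

    return (cleaned_data,)


theirs = someone_elses_workflow()
mine = my_workflow()

# Importing everything: both workflows define a node with the same name.
collisions_all = find_name_collisions(*theirs, *mine)

# Importing only the function I actually want: no collision.
collisions_selected = find_name_collisions(theirs[1], *mine)
```

Passing functions individually keeps the set of nodes explicit, so a check like this only has to cover the handful of functions actually selected.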
Summary
Add a first-class way to build a DAG from function objects (in contrast to DIY / hacky approaches). The user API could be either:
- Builder().with_functions()
- my_module = module_from_functions(fns*), then passed to .with_modules()
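As a rough sketch of what the second option might look like, the snippet below wraps function objects in a standard-library `types.ModuleType`. The `module_from_functions` name comes from the proposal above, but this body, the `module_name` parameter, and the example functions are assumptions made for illustration; it is not an existing Hamilton API, and whether `.with_modules()` would accept such an object unchanged is not verified here.

```python
import inspect
import types
from typing import Callable


def module_from_functions(*fns: Callable, module_name: str = "adhoc_module") -> types.ModuleType:
    """Bundle function objects into a new module object (hypothetical sketch)."""
    module = types.ModuleType(module_name)
    for fn in fns:
        # Expose each function as a module attribute under its own name.
        setattr(module, fn.__name__, fn)
    return module


# Example functions, invented for this sketch.
def spend(raw_spend: float) -> float:
    return raw_spend * 1.0


def spend_doubled(spend: float) -> float:
    return spend * 2


my_module = module_from_functions(spend, spend_doubled)

# The functions are now discoverable the same way module-level functions are.
function_names = sorted(name for name, _ in inspect.getmembers(my_module, inspect.isfunction))
```

The first option, `Builder().with_functions()`, would skip this wrapping step entirely and hand the function objects to the builder directly.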
Goals:
Current
At the core of Hamilton, users:
Problem
ad_hoc_utils means exactly the opposite
Benefits
Hamilton 2.0 / Broader perspective
There's no well-defined structure or purpose to Hamilton top-level modules (e.g., nodes, graph_types, graph_utils, graph, ad_hoc_utils, base, hamilton.common, models). I propose a structure that matches the Hamilton lifecycle:
- hamilton.parser: everything that deals with source code: how functions are written, whether type annotations are present (not type matching), collecting functions from modules, converting a notebook cell string to a module, removing comments and docstrings before hashing source code
- hamilton.compiler: converting code to a DAG: structuring the DAG from functions, applying function modifiers, validating types, etc.
ad_hoc_utils