[rfc] enable direct configuration in quantize_ #1585
Conversation
Force-pushed 14a21bb to d7cdbb4
Is there someone on the quant/core side of things that can chime in on this API? cc @andrew Or. Tbh this feels almost too similar to the original, since this isn't a pure dataclass and still has the `_transform` func. I imagine people want: `quantize_(model, THE_CONFIG_TYPE(weight_scheme="f38", a_scheme="a12", ...))`. FWIW: Lines 68 to 147 in 1240b19
I think this code is a good example of the downsides of the current approach. It has high cognitive load IMO, and it is hard to tell what exactly is being chosen when. I think a good sign that the new API is better would be if it could also clean up this type of code.
So, I'm actually highly in favor of moving the

I personally care a lot about "don't wrap config in callable" and consider that blocking for moving training use cases here. I'm mildly annoyed by keeping the transform here, but it's not the hill I'd die on personally.
Yeah, the API they want is something like this:
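A minimal self-contained sketch of that call shape (the config class name, its fields, and the toy `quantize_` below are illustrative stand-ins, not the actual torchao implementation):

```python
from dataclasses import dataclass


# Hypothetical config class; name and fields are assumptions for illustration.
@dataclass
class Int4WeightOnlyConfig:
    group_size: int = 128


class FakeModel:
    """Toy stand-in for a torch.nn.Module."""


def quantize_(model, config):
    # In the real API this would transform the model's modules in place
    # according to the config; here we just record it to show the shape.
    model.applied_config = config
    return model


model = quantize_(FakeModel(), Int4WeightOnlyConfig(group_size=32))
```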
which I think is simple and pretty reasonable. Given a model and a config or recipe, quantize the model according to the settings configured in it. I think this kind of API is also pretty common. E.g.
As far as I know, passing in a function as the main API isn't common at all. I would also argue it's better practice in general to pass in clearly defined types instead of arbitrary functions that let the user do anything they want. They can easily write the latter themselves if they want; just search for all
I think another way of addressing this is to just register a handler for all the configs we want to handle in

This will keep
summary of offline discussion:
Summary:

POC for:
* decoupling configuration from transformation
* stop passing obscure stateful callables around
* enable printing of configuration
* reduce amount of context switching to navigate the logic from `quantize_` to quantizing a single module

TODO: more polish before wider discussion.

Test Plan:
```
pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics
pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone
pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path
```

Reviewers:
Subscribers:
Tasks:
Tags:
Force-pushed d7cdbb4 to 997f715
New proposal looks great to me!
weight_config: Optional[FakeQuantizeConfig] = None,
) -> Callable:
@dataclass
class IntXQuantizationAwareTrainingWorkflowConfig(AOBaseWorkflowConfig):
nit: maybe drop Workflow from the name? Seems a bit long
I think keeping the base class as AOBaseWorkflowConfig and removing Workflow from child classes could work here
apply_tensor_subclass: Callable[[torch.nn.Module], torch.nn.Module],
# apply_tensor_subclass: Callable[[torch.nn.Module], torch.nn.Module],
apply_tensor_subclass: Union[
    Callable[[torch.nn.Module], torch.nn.Module], AOBaseWorkflowConfig
I think there are two BC-breaking steps:
1. Change the arg name to "config"
2. Change the arg type to AOBaseWorkflowConfig only
Based on my quick search I feel we can just do (1) now. I haven't found any use cases yet that call this with a keyword argument, so this seems relatively safe. For (2) maybe we should deprecate with warning first and remove after a couple releases? I feel this use case (users passing in arbitrary functions) is harder to grep for.
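Step (2) could be staged with a deprecation warning along these lines (a sketch assuming the Union signature in this diff; the class here is a stand-in, not the real torchao code):

```python
import warnings
from typing import Callable, Union


class AOBaseWorkflowConfig:
    """Stand-in for the base workflow config class in this PR."""


def quantize_(model, config: Union[Callable, AOBaseWorkflowConfig]):
    if isinstance(config, AOBaseWorkflowConfig):
        # New path: dispatch on the config type (dispatch logic elided).
        return model
    # Legacy path: warn now, remove after a couple of releases.
    warnings.warn(
        "Passing a Callable to quantize_ is deprecated; pass an "
        "AOBaseWorkflowConfig subclass instead.",
        DeprecationWarning,
    )
    return config(model)
```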
Curious what others think as well @drisspg @HDCharles
zero_point_domain=None,
):
@dataclass
class Int4WeightOnlyWorkflowConfig(AOBaseWorkflowConfig):
Looks great. I feel eventually we do want an intermediate AffineQuantizationConfig that lets users set the different dtypes, symmetric, dynamic etc., but I think we can add that later
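Such an intermediate config might look roughly like this (every field name below is a guess for illustration; the actual design is explicitly deferred in this PR):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AffineQuantizationConfig:
    """Hypothetical intermediate config; all fields are illustrative."""
    weight_dtype: str = "int4"
    activation_dtype: Optional[str] = None  # None -> weight-only
    symmetric: bool = True
    dynamic: bool = False
    group_size: int = 128


# A user would tweak dtypes/symmetry directly instead of picking a
# per-workflow function:
cfg = AffineQuantizationConfig(weight_dtype="int8", dynamic=True)
```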
def _int4_weight_only_transform(
    module: torch.nn.Module, config: Int4WeightOnlyWorkflowConfig
) -> torch.nn.Module:
# TODO(future PR): perhaps move this logic to a different file, to keep the API
Agree with this. We should move all configs and associated transforms to a separate file.
Moving development to #1595 to move to stacked PRs.
summary
This PR is a POC for passing per-workflow arguments to `quantize_` directly, without wrapping them in a `Callable`.

High level motivation: passing direct configuration is intuitive and widely used in similar contexts across various projects. Passing configuration wrapped in a callable is IMO not intuitive, hard to understand and debug, and we have evidence that it pushes a significant portion of users away from building on top of torchao.
user facing API proposed changes
signature of quantize_
usage example
An example for
int4_weight_only
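A hedged sketch of the old callable style next to the proposed config style (`Int4WeightOnlyConfig`, the field names, and the toy `quantize_` below are stand-ins, not the actual torchao implementation):

```python
from dataclasses import dataclass


@dataclass
class Int4WeightOnlyConfig:
    """Hypothetical direct config for the proposed style."""
    group_size: int = 128


def int4_weight_only(group_size=128):
    """Old style: returns a callable that closes over the settings."""
    def apply(model):
        return ("int4", group_size, model)
    return apply


def quantize_(model, config_or_callable):
    if callable(config_or_callable):
        # Old path: opaque callable is applied to the model.
        return config_or_callable(model)
    # Proposed path: plain config; settings are inspectable and printable.
    return ("int4", config_or_callable.group_size, model)


old = quantize_("model", int4_weight_only(group_size=32))
new = quantize_("model", Int4WeightOnlyConfig(group_size=32))
```

Both calls produce the same result, but in the proposed style the settings can be printed and compared before anything is applied.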
developer facing proposed changes
See the PR details for examples, but they can be summarized as:
current status
At this point this is a POC, and I've shown how it can work on three user-facing workflows:
- int4_weight_only
- intx_quantization_aware_training
- from_intx_quantization_aware_training
next steps
discuss more broadly