Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: a lot #167

Merged
merged 16 commits into from
Jan 15, 2025
Merged

refactor: a lot #167

merged 16 commits into from
Jan 15, 2025

Conversation

eddiebergman
Copy link
Contributor

@eddiebergman eddiebergman commented Jan 8, 2025

Overview, there was a lot of things which bled in to each-other. TLDR; the code-base is vastly simpler at this point which given the amount of actual work done on the code-base means it should be less to worry about maintaining.

Yaml

Some previous PR's aimed to add a lot of yaml support into neps through `neps.run(..., run_args="path/to/config.yaml"). These yamls could inside themselves, even then provide paths to other yamls and so on.

After talking to some people who used the feature it ended up being more confusing than useful, where just loading in yamls themselves and passing arguments into NePS would have been the preferred option, i.e. neps does not dictate experiment setup.

To actually make this easier, the preferred flow is now:

with open("config.yaml") as f:
    config = yaml.load(f)
    
neps.run(**config)

The only change to actually make this easily possible is that the neps.run can take in things like a search space definition in it's serialized dict version as an argument to neps.run(). For example, you could also specify neps.run(evaluate_pipeline="/path/to/file.py:my_eval_func"), or just import the function and pass it in yourself.

This prevents the issue where we try to see what was set in run_args=, but also merge that with defaults, but also merge that with nested yaml things.

Estimated this removed around ~1000 lines of code and about 3000 lines of test code/test yamls.

Optimizer Heirarchy

There is none anymore. At the end of the day, this is all the runtime needs from an optimizer. It can be a class, a dataclass, a function, a method, neps shouldn't care.

def __call__(
    self,
    trials: Mapping[str, Trial],     # Contains all the information we know about runs
    budget_info: BudgetInfo | None,  # If you need to be cost aware
    n: int | None = None,            # Don't support batching? raise an error
) -> SampledConfig | list[SampledConfig]: ...

What does this give us?

It's easier to actually develop an optimizer now and pass it in to neps to be used. The core logic of what's going on with each optimizer is also a lot more linear and legible. For example, you can see the core logic in the __call__ of the BracketOptimizer in 90 lines of code which supports all of the bracket/rung based optimizer we had before.

If you like heirarchies and such, you're still free to do so, neps runtime and logic loop, does not care.

The benefit of moving away from heirarchies is it actually forces us to have re-usable components, rather than re-using methods. We've all been there where the attributes in inheritance don't really make sense but you just want to change out one method.

Here's the simplest template we could give to implement your own optimizer:

@dataclass
class MyOpt:
    pipeline_space: SearchSpace
    # ... other attributes
    
    def __call__(
        self,
        trials: Mapping[str, Trial],
        budget_info: BudgetInfo | None,
        n: int | None = None,
    ) -> SampledConfig | list[SampledConfig]:
        # A high level overview of what the optimizer is actually doing logically,
        # calling out to functions where needed.
        # Calls out to other functions where needed
        

Yup, that's it. You can now do neps.run(..., optimizer=MyOpt)

Re-usable components

TODO

Graphs

TODO

Replaced with a partial of `SuccessiveHalvingBase`
The only thing inheriting from it was `PriorBand`, which ended up
replacing all the defaults of it's `__init__()`. The only other thing
the `HyperbandCustomDefault.__init__()` did was change the sampling args
of the SH brackets, which then `PriorBand` would overwrite it its own
`__init__()`.
The only user of the was `PriorBand`, in the only
thing it did was explicitly pass `use_priors=`True`.
Everything else it set was overwritten by `PriorBand`
passing down its args.
@eddiebergman eddiebergman mentioned this pull request Jan 15, 2025
@eddiebergman eddiebergman changed the title refactor: optimizers refactor: a lot Jan 15, 2025
@eddiebergman eddiebergman merged commit fd6bf3b into master Jan 15, 2025
13 checks passed
@eddiebergman eddiebergman deleted the cleanup-mf-heirarchies branch January 15, 2025 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

1 participant