log: add histogram metrics for gradients #424
Conversation
@hasan-yaman Thank you for another PR! This feature is really nice. Please let me share my thoughts here:
- Having two different methods for the same purpose is a little redundant. `write_histogram` is currently only used to monitor gradients.

Here is my proposal. We can remove `write_histogram` and just keep `watch_model` alone. `watch_model` is called at every `update` step. For wandb, it calls `self.run.watch` the first time and does nothing after that. For `FileAdapter` and `TensorboardAdapter`, it computes gradient histograms there every `gradient_logging_steps` steps. What do you think?
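A rough sketch of how that single `watch_model` hook could look across adapters; the class and attribute names below are assumptions pieced together from the diffs quoted later in this thread, not the merged d3rlpy API:

```python
# Sketch only: class/attribute names and the exact signature are assumptions.
class WanDBAdapter:
    def __init__(self, run) -> None:
        self.run = run  # a wandb.Run
        self._is_model_watched = False

    def watch_model(self, epoch, step, logging_steps, algo) -> None:
        # wandb registers the model once; subsequent calls are no-ops.
        if not self._is_model_watched:
            self.run.watch(
                tuple(algo.impl.modules.get_torch_modules().values()),
                log="gradients",
                log_freq=logging_steps,
            )
            self._is_model_watched = True


class FileAdapter:
    def __init__(self, logdir: str) -> None:
        self._logdir = logdir

    def watch_model(self, epoch, step, logging_steps, algo) -> None:
        # file/tensorboard adapters recompute gradient stats every
        # `logging_steps` update steps instead of delegating to wandb.
        if logging_steps is not None and step % logging_steps == 0:
            for name, grad in algo.impl.modules.get_gradients():
                ...  # write a histogram/summary for `grad` under `name`
```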
@takuseno Thanks for the comments!
d3rlpy/algos/qlearning/base.py
Outdated
@@ -520,6 +525,8 @@ def fitter(
    # save hyperparameters
    save_config(self, logger)

    logger.watch_model(0, 0, gradient_logging_steps, self)
I don't like it, but it is required for wandb's watch. Without this line, wandb doesn't track the first epoch.
Can be fixed via #424 (comment)
Thanks for the change! I noticed that, currently, parameter names conflict because they don't include their parent module names. I left some suggestions to resolve this.
d3rlpy/logging/file_adapter.py
Outdated
) -> None:
    if logging_steps is not None and step % logging_steps == 0:
        for name, grad in algo.impl.modules.get_gradients():
            path = os.path.join(self._logdir, f"{name}.csv")
path = os.path.join(self._logdir, f"{name}_grad.csv")
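For illustration, one way the adapter could append a row for each gradient; the actual CSV columns are not shown in this thread, so the summary statistics below are only an assumption:

```python
import os
import numpy as np

# Hypothetical helper; the real FileAdapter's CSV schema may differ.
def write_gradient_row(logdir: str, name: str, step: int, grad: np.ndarray) -> None:
    # the "_grad" suffix keeps gradient files apart from metric CSVs
    path = os.path.join(logdir, f"{name}_grad.csv")
    with open(path, "a") as f:
        f.write(f"{step},{grad.min()},{grad.max()},{grad.mean()},{grad.std()}\n")
```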
d3rlpy/torch_utility.py
Outdated
@@ -388,6 +391,19 @@ def reset_optimizer_states(self) -> None:
            if isinstance(v, torch.optim.Optimizer):
                v.state = collections.defaultdict(dict)

    def get_torch_modules(self) -> List[nn.Module]:
        torch_modules: List[nn.Module] = []
        for v in asdict_without_copy(self).values():
It's nice to return both key and value for `get_gradients`:

torch_modules = {}
for k, v in asdict_without_copy(self).items():
    if isinstance(v, nn.Module):
        torch_modules[k] = v
return torch_modules
d3rlpy/torch_utility.py
Outdated
        return torch_modules

    def get_gradients(self) -> Iterator[Tuple[str, Float32NDArray]]:
        for module in self.get_torch_modules():
Concatenate the module name and the parameter names, otherwise the names conflict:

for module_name, module in self.get_torch_modules().items():
    ...
    yield f"{module_name}.{name}", parameter.grad.cpu().detach().numpy()
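Putting the two suggestions together, a complete `get_gradients` might look roughly like this sketch; it assumes `get_torch_modules()` now returns a `Dict[str, nn.Module]` and that `Float32NDArray` is d3rlpy's ndarray alias, and is not necessarily the merged code:

```python
from typing import Iterator, Tuple

def get_gradients(self) -> Iterator[Tuple[str, Float32NDArray]]:
    for module_name, module in self.get_torch_modules().items():
        for name, parameter in module.named_parameters():
            # skip parameters that do not have a gradient yet
            if parameter.requires_grad and parameter.grad is not None:
                # prefix with the parent module name so keys never conflict
                yield (
                    f"{module_name}.{name}",
                    parameter.grad.cpu().detach().numpy(),
                )
```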
self.run.watch(
    tuple(algo.impl.modules.get_torch_modules().values()),
    log="gradients",
    log_freq=logging_steps,
)
self._is_model_watched = True
Instead of `watch`, we can use `self.run.log({"name": wandb.Histogram(...)})`. Not sure which direction I should choose.
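For comparison, the Histogram-based alternative could look something like this sketch; the helper name and its arguments are hypothetical, while `wandb.Histogram` and `Run.log` are the real wandb APIs:

```python
import wandb

# Hypothetical helper; reuses the get_gradients() iterator discussed above.
def log_gradient_histograms(run, step, logging_steps, algo) -> None:
    # log one histogram per parameter instead of relying on run.watch
    if logging_steps is not None and step % logging_steps == 0:
        run.log(
            {
                f"gradients/{name}": wandb.Histogram(grad)
                for name, grad in algo.impl.modules.get_gradients()
            },
            step=step,
        )
```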
For now, I'm okay with either way. The current implementation also looks good to me.
@hasan-yaman Thanks for the update! The implementation looks good to me. It seems that CI complains about some typing issues. Once they're fixed, let's merge your PR 😄
@takuseno I couldn't understand and fix the typing issues. Am I missing something?
@hasan-yaman Can you simply add
@takuseno done!
LGTM. Thank you for your contribution!
Actually, I'm thinking that it might be good to record gradients every epoch by default. Is there any concern if we do that?
Inspired by wandb.watch.
Add support for exporting histogram metrics.
Export gradients to see vanishing / exploding gradients.