
log: add histogram metrics for gradients #424

Merged: 9 commits merged into takuseno:master from wandb-log-improvements on Oct 15, 2024

Conversation

@hasan-yaman (Contributor) commented:

Inspired by wandb.watch.
Add support for exporting histogram metrics.
Export gradients to see vanishing / exploding gradients.
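
To make the idea concrete, here is a minimal standalone sketch in plain PyTorch/NumPy (not the d3rlpy adapter API touched by this PR) of what a per-parameter gradient histogram is:

import numpy as np
import torch
import torch.nn as nn

# toy network standing in for an algorithm's torch modules
model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()

# one histogram per parameter, summarizing the gradient distribution
for name, parameter in model.named_parameters():
    grad = parameter.grad.detach().cpu().numpy().ravel()
    counts, edges = np.histogram(grad, bins=10)
    print(name, counts, (edges[0], edges[-1]))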

@takuseno (Owner) left a comment:

@hasan-yaman Thank you for another PR! This feature is really nice. Let me share my thoughts here:

  • Having two different methods for the same purpose is a little redundant.
  • write_histogram is currently only used to monitor gradients.

Here is my proposal. We can remove write_histogram and keep only watch_model. watch_model is called at every update step. For wandb, it calls self.run.watch the first time and does nothing after that. For FileAdapter and TensorboardAdapter, it computes gradient histograms there every gradient_logging_steps steps. What do you think?
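
A rough sketch of that contract (illustrative only; helper names like get_torch_modules and get_gradients follow the snippets quoted below, and the CSV payload is not necessarily what the merged adapters write):

import os
import numpy as np

class WanDBAdapter:
    # wandb: register the modules once; wandb records gradient histograms itself
    def __init__(self, run) -> None:
        self.run = run
        self._is_model_watched = False

    def watch_model(self, epoch, step, logging_steps, algo) -> None:
        if logging_steps is not None and not self._is_model_watched:
            self.run.watch(
                tuple(algo.impl.modules.get_torch_modules().values()),
                log="gradients",
                log_freq=logging_steps,
            )
            self._is_model_watched = True

class FileAdapter:
    # file/tensorboard: recompute gradient histograms every logging_steps updates
    def __init__(self, logdir) -> None:
        self._logdir = logdir

    def watch_model(self, epoch, step, logging_steps, algo) -> None:
        if logging_steps is not None and step % logging_steps == 0:
            for name, grad in algo.impl.modules.get_gradients():
                counts, _ = np.histogram(grad, bins=10)
                path = os.path.join(self._logdir, f"{name}_grad.csv")
                with open(path, "a") as f:
                    f.write(",".join(map(str, [epoch, step, *counts])) + "\n")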

@hasan-yaman (Contributor, Author) commented:

@takuseno Thanks for the comments!
This way the code looks simpler.

@@ -520,6 +525,8 @@ def fitter(
        # save hyperparameters
        save_config(self, logger)

        logger.watch_model(0, 0, gradient_logging_steps, self)
@hasan-yaman (Contributor, Author) commented:

I don't like it, but it is required for wandb's watch. Without this line, wandb doesn't track the first epoch.

@hasan-yaman (Contributor, Author) commented:

Can be fixed via #424 (comment)

@takuseno (Owner) left a comment:

Thanks for the change! I noticed that, currently, parameter names conflict without their parent module names. I left some suggestions to resolve this.

    ) -> None:
        if logging_steps is not None and step % logging_steps == 0:
            for name, grad in algo.impl.modules.get_gradients():
                path = os.path.join(self._logdir, f"{name}.csv")
@takuseno (Owner) suggested:

path = os.path.join(self._logdir, f"{name}_grad.csv")


@@ -388,6 +391,19 @@ def reset_optimizer_states(self) -> None:
            if isinstance(v, torch.optim.Optimizer):
                v.state = collections.defaultdict(dict)

    def get_torch_modules(self) -> List[nn.Module]:
        torch_modules: List[nn.Module] = []
        for v in asdict_without_copy(self).values():
@takuseno (Owner) suggested:

It's nice to return both key and value for get_gradients:

torch_modules = {}
for k, v in asdict_without_copy(self).items():
    if isinstance(v, nn.Module):
        torch_modules[k] = v
return torch_modules


        return torch_modules

    def get_gradients(self) -> Iterator[Tuple[str, Float32NDArray]]:
        for module in self.get_torch_modules():
@takuseno (Owner) suggested:

Concatenate the module name and the parameter names, otherwise the names conflict:

for module_name, module in self.get_torch_modules().items():
...
        yield f"{module_name}.{name}", parameter.grad.cpu().detach().numpy()
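
Putting the two suggestions together, get_gradients would end up roughly like this (a sketch, assuming get_torch_modules now returns a Dict[str, nn.Module]):

def get_gradients(self) -> Iterator[Tuple[str, Float32NDArray]]:
    for module_name, module in self.get_torch_modules().items():
        for name, parameter in module.named_parameters():
            # skip parameters that have not received a gradient yet
            if parameter.grad is None:
                continue
            # prefix with the parent module name so identically named
            # parameters from different modules no longer collide
            yield f"{module_name}.{name}", parameter.grad.cpu().detach().numpy()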


Comment on lines +69 to +74:
            self.run.watch(
                tuple(algo.impl.modules.get_torch_modules().values()),
                log="gradients",
                log_freq=logging_steps,
            )
            self._is_model_watched = True
@hasan-yaman (Contributor, Author) commented:

Instead of watch, we can use

self.run.log({"name": wandb.Histogram(...) })

Not sure which direction I should choose.
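
For reference, that alternative would look roughly like this inside watch_model (a sketch; the grads/ key prefix is just an example):

if logging_steps is not None and step % logging_steps == 0:
    self.run.log(
        {
            f"grads/{name}": wandb.Histogram(grad)
            for name, grad in algo.impl.modules.get_gradients()
        },
        step=step,
    )

The trade-off: run.watch delegates the histogram bookkeeping to wandb, while explicit wandb.Histogram logging keeps every adapter on the same get_gradients code path.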

@takuseno (Owner) commented:

For now, I'm okay with either way. The current implementation also looks good to me.

@takuseno (Owner) commented:

@hasan-yaman Thanks for the update! The implementation looks good to me. It seems that CI complains about some typing issues. Once they're fixed, let's merge your PR 😄

@hasan-yaman (Contributor, Author) commented:

@takuseno I couldn't understand and fix the typing issues. Am I missing something?

@takuseno (Owner) commented:

@hasan-yaman Can you simply add type: ignore just to skip errors? I can fix them later once we merge this. In this PR, I'd like to suppress errors.

@hasan-yaman (Contributor, Author) commented:

@takuseno done!

@takuseno (Owner) left a comment:

LGTM. Thank you for your contribution!

@takuseno merged commit 3d51ee7 into takuseno:master on Oct 15, 2024
4 checks passed
@hasan-yaman deleted the wandb-log-improvements branch on October 15, 2024 at 12:07
@takuseno (Owner) commented:

Actually, I'm thinking that it might be good to record gradients every epoch by default. Is there any concern if we do that?
