Invalid value "timing(s)/gen" for parameter 'metrics[39].name' supplied: Names may only contain alphanumerics, underscores (_), dashes (-), periods (.), spaces ( ) and slashes (/). #119

Closed
fzyzcjy opened this issue Jan 20, 2025 · 2 comments · Fixed by #120

Comments

@fzyzcjy
Contributor

fzyzcjy commented Jan 20, 2025

After upgrading to the latest verl master, it seems #111 breaks MLflow logging. The run fails with the following error:

[14:23:23.711]: [14:23:23.711]:   File "/host_home/research/code/third_party/verl/verl/trainer/main_ppo.py", line 185, in main_task
[14:23:23.711]: [14:23:23.711]:     trainer.fit()
[14:23:23.711]: [14:23:23.711]:   File "/host_home/research/code/third_party/verl/verl/trainer/ppo/ray_trainer.py", line 633, in fit
[14:23:23.711]: [14:23:23.711]:     logger.log(data=metrics, step=self.global_steps)
[14:23:23.711]: [14:23:23.711]:   File "/host_home/research/code/third_party/verl/verl/utils/tracking.py", line 58, in log
[14:23:23.711]: [14:23:23.711]:     logger_instance.log(data=data, step=step)
[14:23:23.711]: [14:23:23.711]:   File "/host_home/research/code/third_party/verl/verl/utils/tracking.py", line 65, in log
[14:23:23.711]: [14:23:23.711]:     mlflow.log_metrics(metrics=data, step=step)
[14:23:23.711]: [14:23:23.711]:   File "/root/miniconda3/lib/python3.10/site-packages/mlflow/tracking/fluent.py", line 984, in log_metrics
[14:23:23.711]: [14:23:23.711]:     return MlflowClient().log_batch(
[14:23:23.711]: [14:23:23.711]:   File "/root/miniconda3/lib/python3.10/site-packages/mlflow/tracking/client.py", line 1868, in log_batch
[14:23:23.711]: [14:23:23.711]:     return self._tracking_client.log_batch(
[14:23:23.711]: [14:23:23.711]:   File "/root/miniconda3/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/client.py", line 762, in log_batch
[14:23:23.711]: [14:23:23.711]:     self.store.log_batch(run_id=run_id, metrics=metrics_batch, params=[], tags=[])
[14:23:23.711]: [14:23:23.711]:   File "/root/miniconda3/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py", line 539, in log_batch
[14:23:23.711]: [14:23:23.711]:     self._call_endpoint(LogBatch, req_body)
[14:23:23.711]: [14:23:23.711]:   File "/root/miniconda3/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py", line 82, in _call_endpoint
[14:23:23.711]: [14:23:23.711]:     return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
[14:23:23.711]: [14:23:23.711]:   File "/root/miniconda3/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 370, in call_endpoint
[14:23:23.711]: [14:23:23.711]:     response = verify_rest_response(response, endpoint)
[14:23:23.711]: [14:23:23.711]:   File "/root/miniconda3/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 240, in verify_rest_response
[14:23:23.711]: [14:23:23.711]:     raise RestException(json.loads(response.text))
[14:23:23.711]: [14:23:23.711]: mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: Invalid value "timing(s)/gen" for parameter 'metrics[39].name' supplied: Names may only contain alphanumerics, underscores (_), dashes (-), periods (.), spaces ( ) and slashes (/).
[14:23:23.711]: [14:23:23.711]: 
[14:23:23.711]: [14:23:23.711]: Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[14:23:23.711]: [14:23:23.711]: 2025-01-20 14:23:23,710 ERR cli.py:68 -- ---------------------------------------
[14:23:23.711]: [14:23:23.711]: 2025-01-20 14:23:23,710 ERR cli.py:69 -- Job 'raysubmit_nTBZ3qfwBgHRna1k' failed
[14:23:23.711]: [14:23:23.711]: 2025-01-20 14:23:23,710 ERR cli.py:70 -- ---------------------------------------
[14:23:23.711]: [14:23:23.711]: 2025-01-20 14:23:23,710 INFO cli.py:83 -- Status message: Job entrypoint command failed with exit code 1, last available logs (truncated to 20,000 chars):
[14:23:23.711]: [14:23:23.711]:   File "/root/miniconda3/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py", line 82, in _call_endpoint
[14:23:23.711]: [14:23:23.711]:     return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
[14:23:23.711]: [14:23:23.711]:   File "/root/miniconda3/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 370, in call_endpoint
[14:23:23.711]: [14:23:23.711]:     response = verify_rest_response(response, endpoint)
[14:23:23.711]: [14:23:23.711]:   File "/root/miniconda3/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 240, in verify_rest_response
[14:23:23.711]: [14:23:23.711]:     raise RestException(json.loads(response.text))
[14:23:23.711]: [14:23:23.711]: mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: Invalid value "timing(s)/gen" for parameter 'metrics[39].name' supplied: Names may only contain alphanumerics, underscores (_), dashes (-), periods (.), spaces ( ) and slashes (/).

MLflow rejects the parentheses in the metric name, so it would be great to rename the prefix, e.g. to timing_s instead of timing(s). I am happy to open a PR if this looks OK.
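As a stopgap until the metric is renamed, the keys could also be cleaned up just before they reach `mlflow.log_metrics`. The following is only a rough sketch: `sanitize_metric_name` and `sanitize_metrics` are hypothetical helpers, not part of verl or MLflow, and the allowed character set is taken from the error message above (restricted here to ASCII letters and digits, which may be stricter than what MLflow actually accepts).

```python
import re

# Anything outside the set quoted in the error message: alphanumerics,
# underscores, dashes, periods, spaces and slashes (ASCII-only assumption).
_DISALLOWED = re.compile(r"[^A-Za-z0-9_\-. /]")


def sanitize_metric_name(name: str) -> str:
    """Hypothetical helper: map e.g. 'timing(s)/gen' to 'timing_s/gen'."""
    # Turn a parenthesised unit such as "(s)" into a "_s" suffix first,
    # so the result stays readable.
    name = re.sub(r"\(([^()]*)\)", r"_\1", name)
    # Replace whatever disallowed characters remain with underscores.
    return _DISALLOWED.sub("_", name)


def sanitize_metrics(data: dict) -> dict:
    """Hypothetical helper: apply the renaming to every key of a metrics dict."""
    return {sanitize_metric_name(key): value for key, value in data.items()}
```

With something like this, the logging call in the traceback could pass `sanitize_metrics(data)` instead of `data`. Note that two different original names can collapse to the same sanitized name, in which case the later value silently wins, so renaming the metric at the source is still the cleaner fix.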

@vermouth1992
Collaborator

Sure. Could you please submit a PR to fix this?

@fzyzcjy
Contributor Author

fzyzcjy commented Jan 21, 2025

Sure!
