Invalid value "timing(s)/gen" for parameter 'metrics[39].name' supplied: Names may only contain alphanumerics, underscores (_), dashes (-), periods (.), spaces ( ) and slashes (/).
#119
Closed
fzyzcjy opened this issue
Jan 20, 2025
· 2 comments
· Fixed by #120
After upgrading to the latest verl master, it seems #111 breaks the mlflow logging. The run fails with the following error:
[14:23:23.711]: [14:23:23.711]: File "/host_home/research/code/third_party/verl/verl/trainer/main_ppo.py", line 185, in main_task
[14:23:23.711]: [14:23:23.711]: trainer.fit()
[14:23:23.711]: [14:23:23.711]: File "/host_home/research/code/third_party/verl/verl/trainer/ppo/ray_trainer.py", line 633, in fit
[14:23:23.711]: [14:23:23.711]: logger.log(data=metrics, step=self.global_steps)
[14:23:23.711]: [14:23:23.711]: File "/host_home/research/code/third_party/verl/verl/utils/tracking.py", line 58, in log
[14:23:23.711]: [14:23:23.711]: logger_instance.log(data=data, step=step)
[14:23:23.711]: [14:23:23.711]: File "/host_home/research/code/third_party/verl/verl/utils/tracking.py", line 65, in log
[14:23:23.711]: [14:23:23.711]: mlflow.log_metrics(metrics=data, step=step)
[14:23:23.711]: [14:23:23.711]: File "/root/miniconda3/lib/python3.10/site-packages/mlflow/tracking/fluent.py", line 984, in log_metrics
[14:23:23.711]: [14:23:23.711]: return MlflowClient().log_batch(
[14:23:23.711]: [14:23:23.711]: File "/root/miniconda3/lib/python3.10/site-packages/mlflow/tracking/client.py", line 1868, in log_batch
[14:23:23.711]: [14:23:23.711]: return self._tracking_client.log_batch(
[14:23:23.711]: [14:23:23.711]: File "/root/miniconda3/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/client.py", line 762, in log_batch
[14:23:23.711]: [14:23:23.711]: self.store.log_batch(run_id=run_id, metrics=metrics_batch, params=[], tags=[])
[14:23:23.711]: [14:23:23.711]: File "/root/miniconda3/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py", line 539, in log_batch
[14:23:23.711]: [14:23:23.711]: self._call_endpoint(LogBatch, req_body)
[14:23:23.711]: [14:23:23.711]: File "/root/miniconda3/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py", line 82, in _call_endpoint
[14:23:23.711]: [14:23:23.711]: return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
[14:23:23.711]: [14:23:23.711]: File "/root/miniconda3/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 370, in call_endpoint
[14:23:23.711]: [14:23:23.711]: response = verify_rest_response(response, endpoint)
[14:23:23.711]: [14:23:23.711]: File "/root/miniconda3/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 240, in verify_rest_response
[14:23:23.711]: [14:23:23.711]: raise RestException(json.loads(response.text))
[14:23:23.711]: [14:23:23.711]: mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: Invalid value "timing(s)/gen" for parameter 'metrics[39].name' supplied: Names may only contain alphanumerics, underscores (_), dashes (-), periods (.), spaces ( ) and slashes (/).
[14:23:23.711]: [14:23:23.711]:
[14:23:23.711]: [14:23:23.711]: Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
[14:23:23.711]: [14:23:23.711]: 2025-01-20 14:23:23,710 ERR cli.py:68 -- ---------------------------------------
[14:23:23.711]: [14:23:23.711]: 2025-01-20 14:23:23,710 ERR cli.py:69 -- Job 'raysubmit_nTBZ3qfwBgHRna1k' failed
[14:23:23.711]: [14:23:23.711]: 2025-01-20 14:23:23,710 ERR cli.py:70 -- ---------------------------------------
[14:23:23.711]: [14:23:23.711]: 2025-01-20 14:23:23,710 INFO cli.py:83 -- Status message: Job entrypoint command failed with exit code 1, last available logs (truncated to 20,000 chars):
[14:23:23.711]: [14:23:23.711]: File "/root/miniconda3/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py", line 82, in _call_endpoint
[14:23:23.711]: [14:23:23.711]: return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
[14:23:23.711]: [14:23:23.711]: File "/root/miniconda3/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 370, in call_endpoint
[14:23:23.711]: [14:23:23.711]: response = verify_rest_response(response, endpoint)
[14:23:23.711]: [14:23:23.711]: File "/root/miniconda3/lib/python3.10/site-packages/mlflow/utils/rest_utils.py", line 240, in verify_rest_response
[14:23:23.711]: [14:23:23.711]: raise RestException(json.loads(response.text))
[14:23:23.711]: [14:23:23.711]: mlflow.exceptions.RestException: INVALID_PARAMETER_VALUE: Invalid value "timing(s)/gen" for parameter 'metrics[39].name' supplied: Names may only contain alphanumerics, underscores (_), dashes (-), periods (.), spaces ( ) and slashes (/).
The parentheses in the metric name are what mlflow rejects. Therefore, it would be great to rename the metric to e.g. timing_ms instead of timing(ms). I am happy to open a PR if this looks OK.
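As an alternative to renaming at the source, the names could also be cleaned up just before they reach mlflow. The following is only a rough sketch, not what verl or mlflow actually does: `sanitize_metric_name` and `sanitize_metrics` are hypothetical helpers, and the allowed character set is taken from the error message above (assuming ASCII alphanumerics only).

```python
import re

# Anything outside the set listed in the error message:
# alphanumerics, underscores, dashes, periods, spaces and slashes.
# Assumption: only ASCII letters and digits are treated as alphanumerics here.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_\-. /]")


def sanitize_metric_name(name: str) -> str:
    """Replace every disallowed character with an underscore (hypothetical helper)."""
    return _DISALLOWED.sub("_", name)


def sanitize_metrics(data: dict) -> dict:
    """Return a copy of the metrics dict with sanitized keys (hypothetical helper).

    Note: two different names could map to the same sanitized key,
    in which case the later one silently wins.
    """
    return {sanitize_metric_name(key): value for key, value in data.items()}


if __name__ == "__main__":
    metrics = {"timing(s)/gen": 1.5, "critic/score": 0.2}
    print(sanitize_metrics(metrics))
```

The cleaned dict could then be passed to `mlflow.log_metrics(metrics=..., step=step)` in place of the raw one. Renaming the metric at the source is still the simpler fix, since it keeps names identical across all logging backends.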
fzyzcjy changed the title from "nvalid value "timing(s)/gen" for parameter 'metrics[39].name' supplied: Names may only contain alphanumerics, underscores (_), dashes (-), periods (.), spaces ( ) and slashes (/)." to "Invalid value "timing(s)/gen" for parameter 'metrics[39].name' supplied: Names may only contain alphanumerics, underscores (_), dashes (-), periods (.), spaces ( ) and slashes (/)." on Jan 20, 2025