Using array_record on Dataflow leads to "No such file" errors #10981

Open
carlthome opened this issue Jan 17, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@carlthome
Contributor

I'm running `tfds build --file_format=array_record --beam_pipeline_options=runner=DataflowRunner` and noticed that the file writer doesn't seem to understand Google Cloud Storage (gs://) paths. The job fails while writing the final shards:

INFO[dataflow_runner.py]: ...: JOB_MESSAGE_BASIC: Executing operation train_write/WriteFinalShards+train_write/CollectShardInfo/CollectShardInfo/KeyWithVoid+train_write/CollectShardInfo/CollectShardInfo/CombinePerKey/GroupByKey+train_write/CollectShardInfo/CollectShardInfo/CombinePerKey/Combine/Partial+train_write/CollectShardInfo/CollectShardInfo/CombinePerKey/GroupByKey/Write

ERROR[dataflow_runner.py]: ...: JOB_MESSAGE_ERROR: Traceback (most recent call last):

return file_adapters.ADAPTER_FOR_FORMAT[file_format].write_examples(...)

writer = array_record_module.ArrayRecordWriter(...)

RuntimeError: open() failed: No such file or directory; opening gs://.../1.0.0.incomplete8FTSMC/...-train.array_record-00003-of-00004.incomplete
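
A possible workaround pattern, sketched under the assumption that the failure comes from the writer calling a plain open() that cannot resolve gs:// paths (the traceback suggests this, but it is not confirmed here): write each shard to a local temporary file first, then copy it to the destination with a filesystem-aware copy. The names `write_via_local_temp`, `write_fn`, and `copy_fn` are hypothetical and not part of TFDS or array_record.

```python
import os
import shutil
import tempfile
from typing import Callable


def write_via_local_temp(
    dest_path: str,
    write_fn: Callable[[str], None],
    copy_fn: Callable[[str, str], object] = shutil.copyfile,
) -> None:
    """Writes a file locally, then copies it to dest_path.

    Hypothetical helper, not a TFDS API. write_fn receives a local path and
    is expected to create the file there (for example by wrapping the
    array_record writer). copy_fn moves the finished file to dest_path and
    must understand the destination filesystem.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Reuse the shard's file name so the local file is easy to identify.
        local_path = os.path.join(tmp_dir, os.path.basename(dest_path))
        write_fn(local_path)
        copy_fn(local_path, dest_path)
```

With the default `shutil.copyfile` this only works for local destinations. For a gs:// destination, a copy function such as `tf.io.gfile.copy` could be passed as `copy_fn` instead. Since the writer is created inside TFDS, this illustrates the pattern rather than something that can be applied directly from the `tfds build` command line.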
carlthome added the bug label on Jan 17, 2025