Unable to load MNIST dataset on Colab, running TPU v2.8 #10976

Open
jeromemassot opened this issue Jan 14, 2025 · 1 comment
@jeromemassot
Short description
Unable to load MNIST dataset on Colab, running TPU v2.8

Environment information

  • Operating System: Colab

  • Python version: 3.10.12

  • tensorflow-datasets/tfds-nightly version: 4.9.7

  • tensorflow/tf-nightly version: 2.15.0

  • JAX version: 0.4.33

  • Does the issue still exist with the latest tfds-nightly package (pip install --upgrade tfds-nightly)?

Reproduction instructions

import tensorflow_datasets as tfds

data_dir = '/tmp/tfds'
# batch_size=-1 loads each split as a single batch
mnist_data, info = tfds.load(name="mnist", batch_size=-1, data_dir=data_dir, with_info=True)
mnist_data = tfds.as_numpy(mnist_data)
data_train, data_test = mnist_data['train'], mnist_data['test']

Logs

TypeError                                 Traceback (most recent call last)
[/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/util/structure.py](https://localhost:8080/#) in normalize_element(element, element_signature)
    104         if spec is None:
--> 105           spec = type_spec_from_value(t, use_fallback=False)
    106       except TypeError:

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/util/structure.py](https://localhost:8080/#) in type_spec_from_value(element, use_fallback)
    513 
--> 514   raise TypeError("Could not build a `TypeSpec` for {} with type {}".format(
    515       element,

TypeError: Could not build a `TypeSpec` for ['gs://tfds-data/datasets/mnist/3.0.1/mnist-test.tfrecord-00000-of-00001'] with type list

During handling of the above exception, another exception occurred:

FailedPreconditionError                   Traceback (most recent call last)
[<ipython-input-16-6a5f245a8c89>](https://localhost:8080/#) in <cell line: 1>()
----> 1 mnist_data, info = tfds.load(name="mnist", batch_size=-1, with_info=True, try_gcs=True)
      2 mnist_data = tfds.as_numpy(mnist_data)
      3 data_train, data_test = mnist_data['train'], mnist_data['test']

[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/logging/__init__.py](https://localhost:8080/#) in __call__(self, function, instance, args, kwargs)
    174     metadata = self._start_call()
    175     try:
--> 176       return function(*args, **kwargs)
    177     except Exception:
    178       metadata.mark_error()

[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/load.py](https://localhost:8080/#) in load(name, split, data_dir, batch_size, shuffle_files, download, as_supervised, decoders, read_config, with_info, builder_kwargs, download_and_prepare_kwargs, as_dataset_kwargs, try_gcs)
    671   as_dataset_kwargs.setdefault('read_config', read_config)
    672 
--> 673   ds = dbuilder.as_dataset(**as_dataset_kwargs)
    674   if with_info:
    675     return ds, dbuilder.info

[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/logging/__init__.py](https://localhost:8080/#) in __call__(self, function, instance, args, kwargs)
    174     metadata = self._start_call()
    175     try:
--> 176       return function(*args, **kwargs)
    177     except Exception:
    178       metadata.mark_error()

[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/dataset_builder.py](https://localhost:8080/#) in as_dataset(self, split, batch_size, shuffle_files, decoders, read_config, as_supervised)
   1024         as_supervised=as_supervised,
   1025     )
-> 1026     all_ds = tree.map_structure(build_single_dataset, split)
   1027     return all_ds
   1028 

[/usr/local/lib/python3.10/dist-packages/tree/__init__.py](https://localhost:8080/#) in map_structure(func, *structures, **kwargs)
    433     assert_same_structure(structures[0], other, check_types=check_types)
    434   return unflatten_as(structures[0],
--> 435                       [func(*args) for args in zip(*map(flatten, structures))])
    436 
    437 

[/usr/local/lib/python3.10/dist-packages/tree/__init__.py](https://localhost:8080/#) in <listcomp>(.0)
    433     assert_same_structure(structures[0], other, check_types=check_types)
    434   return unflatten_as(structures[0],
--> 435                       [func(*args) for args in zip(*map(flatten, structures))])
    436 
    437 

[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/dataset_builder.py](https://localhost:8080/#) in _build_single_dataset(self, split, batch_size, shuffle_files, decoders, read_config, as_supervised)
   1042 
   1043     # Build base dataset
-> 1044     ds = self._as_dataset(
   1045         split=split,
   1046         shuffle_files=shuffle_files,

[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/dataset_builder.py](https://localhost:8080/#) in _as_dataset(self, split, decoders, read_config, shuffle_files)
   1496     )
   1497     decode_fn = functools.partial(features.decode_example, decoders=decoders)
-> 1498     return reader.read(
   1499         instructions=split,
   1500         split_infos=self.info.splits.values(),

[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/reader.py](https://localhost:8080/#) in read(self, instructions, split_infos, read_config, shuffle_files, disable_shuffling, decode_fn)
    428       )
    429 
--> 430     return tree.map_structure(_read_instruction_to_ds, instructions)
    431 
    432   def read_files(

[/usr/local/lib/python3.10/dist-packages/tree/__init__.py](https://localhost:8080/#) in map_structure(func, *structures, **kwargs)
    433     assert_same_structure(structures[0], other, check_types=check_types)
    434   return unflatten_as(structures[0],
--> 435                       [func(*args) for args in zip(*map(flatten, structures))])
    436 
    437 

[/usr/local/lib/python3.10/dist-packages/tree/__init__.py](https://localhost:8080/#) in <listcomp>(.0)
    433     assert_same_structure(structures[0], other, check_types=check_types)
    434   return unflatten_as(structures[0],
--> 435                       [func(*args) for args in zip(*map(flatten, structures))])
    436 
    437 

[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/reader.py](https://localhost:8080/#) in _read_instruction_to_ds(instruction)
    420     def _read_instruction_to_ds(instruction):
    421       file_instructions = splits_dict[instruction].file_instructions
--> 422       return self.read_files(
    423           file_instructions,
    424           read_config=read_config,

[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/reader.py](https://localhost:8080/#) in read_files(self, file_instructions, read_config, shuffle_files, disable_shuffling, decode_fn)
    460 
    461     # Read serialized example (eventually with `tfds_id`)
--> 462     ds = _read_files(
    463         file_instructions=file_instructions,
    464         read_config=read_config,

[/usr/local/lib/python3.10/dist-packages/tensorflow_datasets/core/reader.py](https://localhost:8080/#) in _read_files(file_instructions, read_config, shuffle_files, disable_shuffling, file_format)
    265   )
    266 
--> 267   instruction_ds = tf.data.Dataset.from_tensor_slices(tensor_inputs)
    268 
    269   # On distributed environments, we can shard per-file if a

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/ops/dataset_ops.py](https://localhost:8080/#) in from_tensor_slices(tensors, name)
    823     # pylint: disable=g-import-not-at-top,protected-access
    824     from tensorflow.python.data.ops import from_tensor_slices_op
--> 825     return from_tensor_slices_op._from_tensor_slices(tensors, name)
    826     # pylint: enable=g-import-not-at-top,protected-access
    827 

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/ops/from_tensor_slices_op.py](https://localhost:8080/#) in _from_tensor_slices(tensors, name)
     23 
     24 def _from_tensor_slices(tensors, name=None):
---> 25   return _TensorSliceDataset(tensors, name=name)
     26 
     27 

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/ops/from_tensor_slices_op.py](https://localhost:8080/#) in __init__(self, element, is_files, name)
     31   def __init__(self, element, is_files=False, name=None):
     32     """See `Dataset.from_tensor_slices` for details."""
---> 33     element = structure.normalize_element(element)
     34     batched_spec = structure.type_spec_from_value(element)
     35     self._tensors = structure.to_batched_tensor_list(batched_spec, element)

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/data/util/structure.py](https://localhost:8080/#) in normalize_element(element, element_signature)
    108         # the value. As a fallback try converting the value to a tensor.
    109         normalized_components.append(
--> 110             ops.convert_to_tensor(t, name="component_%d" % i))
    111       else:
    112         # To avoid a circular dependency between dataset_ops and structure,

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/profiler/trace.py](https://localhost:8080/#) in wrapped(*args, **kwargs)
    181         with Trace(trace_name, **trace_kwargs):
    182           return func(*args, **kwargs)
--> 183       return func(*args, **kwargs)
    184 
    185     return wrapped

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/ops.py](https://localhost:8080/#) in convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, dtype_hint, ctx, accepted_result_types)
    694   # TODO(b/142518781): Fix all call-sites and remove redundant arg
    695   preferred_dtype = preferred_dtype or dtype_hint
--> 696   return tensor_conversion_registry.convert(
    697       value, dtype, name, as_ref, preferred_dtype, accepted_result_types
    698   )

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/tensor_conversion_registry.py](https://localhost:8080/#) in convert(value, dtype, name, as_ref, preferred_dtype, accepted_result_types)
    232 
    233     if ret is None:
--> 234       ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
    235 
    236     if ret is NotImplemented:

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/constant_op.py](https://localhost:8080/#) in _constant_tensor_conversion_function(v, dtype, name, as_ref)
    333                                          as_ref=False):
    334   _ = as_ref
--> 335   return constant(v, dtype=dtype, name=name)
    336 
    337 # Register the conversion function for the "unconvertible" types

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/ops/weak_tensor_ops.py](https://localhost:8080/#) in wrapper(*args, **kwargs)
    140   def wrapper(*args, **kwargs):
    141     if not ops.is_auto_dtype_conversion_enabled():
--> 142       return op(*args, **kwargs)
    143     bound_arguments = signature.bind(*args, **kwargs)
    144     bound_arguments.apply_defaults()

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/constant_op.py](https://localhost:8080/#) in constant(value, dtype, shape, name)
    269     ValueError: if called on a symbolic tensor.
    270   """
--> 271   return _constant_impl(value, dtype, shape, name, verify_shape=False,
    272                         allow_broadcast=True)
    273 

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/constant_op.py](https://localhost:8080/#) in _constant_impl(value, dtype, shape, name, verify_shape, allow_broadcast)
    282       with trace.Trace("tf.constant"):
    283         return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
--> 284     return _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    285 
    286   const_tensor = ops._create_graph_constant(  # pylint: disable=protected-access

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/constant_op.py](https://localhost:8080/#) in _constant_eager_impl(ctx, value, dtype, shape, verify_shape)
    294 ) -> ops._EagerTensorBase:
    295   """Creates a constant on the current device."""
--> 296   t = convert_to_eager_tensor(value, ctx, dtype)
    297   if shape is None:
    298     return t

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/constant_op.py](https://localhost:8080/#) in convert_to_eager_tensor(value, ctx, dtype)
    100     except AttributeError:
    101       dtype = dtypes.as_dtype(dtype).as_datatype_enum
--> 102   ctx.ensure_initialized()
    103   return ops.EagerTensor(value, ctx.device_name, dtype)
    104 

[/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/context.py](https://localhost:8080/#) in ensure_initialized(self)
    601         pywrap_tfe.TFE_ContextOptionsSetJitCompileRewrite(
    602             opts, self._jit_compile_rewrite)
--> 603         context_handle = pywrap_tfe.TFE_NewContext(opts)
    604       finally:
    605         pywrap_tfe.TFE_DeleteContextOptions(opts)

FailedPreconditionError: ioctl failed; [0000:00:06.0 PE0 C2 MC-1 TN0] Failed to set number of simple DMA addresses

Expected behavior
MNIST dataset to be loaded.

jeromemassot added the bug label on Jan 14, 2025
fineguy self-assigned this on Jan 20, 2025

fineguy (Collaborator) commented on Jan 20, 2025

The error may be linked to batch_size=-1, which loads each split in full as a single batch. Could you try a smaller value?
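
One way to try the suggestion above is to load MNIST in fixed-size batches and iterate over them, rather than materializing a whole split at once. The following is a minimal sketch, not a confirmed fix: the helper names `iter_mnist_batches` and `expected_num_batches` and the batch size of 128 are assumptions made for illustration, and only `tfds.load` and `tfds.as_numpy` come from the report. Since the traceback ends inside TensorFlow's eager context initialization (`ctx.ensure_initialized()`), it is untested whether a smaller batch size avoids the `FailedPreconditionError`.

```python
import math


def expected_num_batches(num_examples: int, batch_size: int) -> int:
    """Number of batches produced when the final partial batch is kept."""
    return math.ceil(num_examples / batch_size)


def iter_mnist_batches(split: str = "train", batch_size: int = 128,
                       data_dir: str = "/tmp/tfds"):
    """Yield MNIST batches as dicts of NumPy arrays (hypothetical helper)."""
    # Imported lazily so expected_num_batches works without TFDS installed.
    import tensorflow_datasets as tfds

    # A positive batch_size adds a batch dimension instead of loading
    # the whole split as one tensor (which is what batch_size=-1 does).
    ds = tfds.load(name="mnist", split=split, batch_size=batch_size,
                   data_dir=data_dir)

    # tfds.as_numpy converts the tf.data.Dataset into an iterable of
    # NumPy batches with the keys "image" and "label".
    for batch in tfds.as_numpy(ds):
        yield batch
```

For example, `next(iter_mnist_batches())` would return the first training batch, and `expected_num_batches(60_000, 128)` gives the number of batches to expect for the 60,000-example training split.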
