description: Saves the content of the given dataset.
Saves the content of the given dataset.
tf.data.experimental.save(
dataset, path, compression=None, shard_func=None
)
>>> import os
>>> import tempfile
>>> path = os.path.join(tempfile.gettempdir(), "saved_data")
>>> # Save a dataset
>>> dataset = tf.data.Dataset.range(2)
>>> tf.data.experimental.save(dataset, path)
>>> new_dataset = tf.data.experimental.load(path,
...     tf.TensorSpec(shape=(), dtype=tf.int64))
>>> for elem in new_dataset:
...   print(elem)
tf.Tensor(0, shape=(), dtype=int64)
tf.Tensor(1, shape=(), dtype=int64)
The dataset is saved in multiple file "shards". By default, the dataset
output is divided among shards in a round-robin fashion, but custom sharding
can be specified via the shard_func function. For example, you can save the
dataset using a single shard as follows:
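import numpy as np

dataset = make_dataset()
def custom_shard_func(element):
  # Map every element to shard 0; shard IDs are expected to be int64.
  return np.int64(0)
# save() does not return a dataset, so its result is not assigned.
tf.data.experimental.save(
    path="/path/to/data", ..., shard_func=custom_shard_func)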
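
As a further illustration of shard_func, the sketch below spreads elements over a fixed number of shards by taking each element modulo a shard count. The names NUM_SHARDS and modulo_shard_func and the temporary directory are hypothetical choices made for this example, not part of the API. The element order across shards after loading is not assumed here, so the values are sorted before they are inspected.

```python
import os
import tempfile

import tensorflow as tf

NUM_SHARDS = 3  # hypothetical shard count, chosen only for illustration


def modulo_shard_func(element):
    # Elements of Dataset.range are int64 scalars, so the remainder is
    # also int64, which matches the expected shard ID type.
    return element % NUM_SHARDS


# Save ten integers into a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sharded_data")
dataset = tf.data.Dataset.range(10)
tf.data.experimental.save(dataset, path, shard_func=modulo_shard_func)

# Load the data back with the matching element spec.
restored = tf.data.experimental.load(
    path, element_spec=tf.TensorSpec(shape=(), dtype=tf.int64))

# Sort because this sketch makes no assumption about read order.
values = sorted(int(x.numpy()) for x in restored)
print(values)
```

Because the function is traced and run as graph computation, it should rely on tensor operations such as the modulo above rather than on Python-side state.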
NOTE: The directory layout and file format used for saving the dataset are
considered an implementation detail and may change. For this reason, datasets
saved through tf.data.experimental.save should only be consumed through
tf.data.experimental.load, which is guaranteed to be backwards compatible.
Args | |
---|---|
dataset | The dataset to save. |
path | Required. A directory to use for saving the dataset. |
compression | Optional. The algorithm to use to compress data when writing it. Supported options are GZIP and NONE. Defaults to NONE. |
shard_func | Optional. A function to control the mapping of dataset elements to file shards. The function is expected to map elements of the input dataset to int64 shard IDs. If present, the function will be traced and executed as graph computation. |
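
The compression argument can be exercised with a short round trip. This is a minimal sketch, assuming that tf.data.experimental.load accepts a matching compression argument for reading the data back; the temporary directory name is a hypothetical choice for the example.

```python
import os
import tempfile

import tensorflow as tf

# Write five integers using GZIP compression.
path = os.path.join(tempfile.mkdtemp(), "gzip_data")
dataset = tf.data.Dataset.range(5)
tf.data.experimental.save(dataset, path, compression="GZIP")

# Read them back, passing the same compression setting (assumed to be
# required so that the reader can decompress the files).
restored = tf.data.experimental.load(
    path,
    element_spec=tf.TensorSpec(shape=(), dtype=tf.int64),
    compression="GZIP")

# Sort because this sketch makes no assumption about read order.
restored_values = sorted(int(x.numpy()) for x in restored)
print(restored_values)
```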