description: Library for running a computation across multiple devices.

Module: tf.compat.v1.distribute

Library for running a computation across multiple devices.

The intent of this library is that you can write an algorithm in a stylized way and it will be usable with a variety of different tf.distribute.Strategy implementations. Each descendant will implement a different strategy for distributing the algorithm across multiple devices/machines. Furthermore, these changes can be hidden inside the specific layers and other library classes that need special treatment to run in a distributed setting, so that most users' model definition code can run unchanged. The tf.distribute.Strategy API works the same way with eager and graph execution.
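For illustration, a minimal sketch of the pattern described above, written against the current tf.distribute namespace (the classes on this compat page mirror it) and assuming an eager TF 2.x runtime; MirroredStrategy is left to discover whatever devices are available:

import tensorflow as tf

# Minimal sketch: MirroredStrategy mirrors variables created inside its scope
# across all discovered devices (GPUs if present, otherwise the CPU).
strategy = tf.distribute.MirroredStrategy()

with strstrategy.scope() if False else strategy.scope():
    # Layers, models, and variables built here become distribution-aware
    # without changes to the model-definition code itself.
    v = tf.Variable(1.0)

print("replicas in sync:", strategy.num_replicas_in_sync)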

See the tf.distribute guides, tutorials, and glossary for an overview and examples.

Note that we provide a default version of tf.distribute.Strategy that is used when no other strategy is in scope; it provides the same API with reasonable default behavior.
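As a small illustration of the default strategy (again using the tf.distribute namespace): outside of any strategy scope, get_strategy() returns that default, and has_strategy() reports that no non-default strategy is set.

import tensorflow as tf

# No strategy scope is active here, so the default strategy is returned.
default = tf.distribute.get_strategy()
print(tf.distribute.has_strategy())    # False: only the default strategy is set
print(default.num_replicas_in_sync)    # 1: single replica, default behavior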

Modules

cluster_resolver module: Library imports for ClusterResolvers.

experimental module: Public API for tf.distribute.experimental namespace.
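A brief, assumed example for the cluster_resolver module: TFConfigClusterResolver reads the TF_CONFIG environment variable that multi-worker jobs conventionally set; when TF_CONFIG is absent it simply yields an empty cluster spec.

import tensorflow as tf

# Sketch only: resolves cluster membership from the TF_CONFIG environment
# variable (empty when TF_CONFIG is not set).
resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
print(resolver.cluster_spec())
print(resolver.task_type, resolver.task_id)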

Classes

class CrossDeviceOps: Base class for cross-device reduction and broadcasting algorithms.

class HierarchicalCopyAllReduce: Hierarchical copy all-reduce implementation of CrossDeviceOps.

class InputContext: A class wrapping information needed by an input function.

class InputReplicationMode: Replication mode for the input function.

class MirroredStrategy: Synchronous training across multiple replicas on one machine.

class NcclAllReduce: NCCL all-reduce implementation of CrossDeviceOps.

class OneDeviceStrategy: A distribution strategy for running on a single device.

class ReduceOp: Indicates how a set of values should be reduced.

class ReductionToOneDevice: A CrossDeviceOps implementation that copies values to one device to reduce.

class ReplicaContext: A class with a collection of APIs that can be called in a replica context.

class RunOptions: Run options for strategy.run.

class Server: An in-process TensorFlow server, for use in distributed training.

class Strategy: A list of devices with a state & compute distribution policy.

class StrategyExtended: Additional APIs for algorithms that need to be distribution-aware.
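To show how several of the classes above fit together, here is a hedged sketch (tf.distribute namespace, eager TF 2.x assumed) that plugs a HierarchicalCopyAllReduce instance into MirroredStrategy as its CrossDeviceOps, runs a per-replica function, and reduces the per-replica results with ReduceOp.SUM:

import tensorflow as tf

# Sketch: choose a CrossDeviceOps implementation for the strategy's reductions.
strategy = tf.distribute.MirroredStrategy(
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

def replica_fn():
    # Runs once per replica; the ReplicaContext identifies each replica.
    ctx = tf.distribute.get_replica_context()
    return tf.cast(ctx.replica_id_in_sync_group, tf.float32)

per_replica = strategy.run(replica_fn)
# Combine the per-replica values into one value on a single device.
total = strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica, axis=None)
print(total)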

Functions

experimental_set_strategy(...): Set a tf.distribute.Strategy as the current strategy without using with strategy.scope().

get_loss_reduction(...): tf.distribute.ReduceOp corresponding to the last loss reduction.

get_replica_context(...): Returns the current tf.distribute.ReplicaContext or None.

get_strategy(...): Returns the current tf.distribute.Strategy object.

has_strategy(...): Returns whether there is a current non-default tf.distribute.Strategy.

in_cross_replica_context(...): Returns True if in a cross-replica context.
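Finally, a hedged sketch of the context-query helpers listed above: directly under strategy.scope() the code is in a cross-replica context, so get_replica_context() returns None, while inside strategy.run() each replica sees its own ReplicaContext.

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Cross-replica context: the strategy is current, but no replica is running.
    print(tf.distribute.in_cross_replica_context())    # True
    print(tf.distribute.get_replica_context())         # None
    print(tf.distribute.get_strategy() is strategy)    # True
    print(tf.distribute.has_strategy())                # True

def replica_fn():
    # Replica context: one call per replica.
    print(tf.distribute.in_cross_replica_context())    # False
    return tf.distribute.get_replica_context().replica_id_in_sync_group

strategy.run(replica_fn)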