Class SGD
Stochastic gradient descent and momentum optimizer.
Inherits From: Optimizer
Aliases:
- Class tf.compat.v1.keras.optimizers.SGD
- Class tf.compat.v2.keras.optimizers.SGD
- Class tf.compat.v2.optimizers.SGD
Computes (when momentum is 0):
theta(t+1) = theta(t) - learning_rate * gradient
gradient is evaluated at theta(t).
or computes (when momentum is greater than 0):
v(t+1) = momentum * v(t) - learning_rate * gradient
theta(t+1) = theta(t) + v(t+1)
If `nesterov` is False, gradient is evaluated at theta(t).
If `nesterov` is True, gradient is evaluated at theta(t) + momentum * v(t),
and the variables always store theta + momentum * v instead of theta.
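As a quick illustration of the two update rules, here is a minimal NumPy sketch (the values, the quadratic loss, and its gradient are made up for illustration; this is not the TensorFlow implementation):

```python
import numpy as np

learning_rate, momentum = 0.01, 0.9
theta = np.array([1.0, 2.0])
v = np.zeros_like(theta)

def gradient(theta):
    # Hypothetical gradient of the quadratic loss 0.5 * ||theta||^2.
    return theta

# Plain SGD step (momentum == 0): gradient evaluated at theta(t).
theta_plain = theta - learning_rate * gradient(theta)

# Momentum step (nesterov=False): gradient also evaluated at theta(t).
v = momentum * v - learning_rate * gradient(theta)
theta = theta + v
```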
Some of the args below are hyperparameters, where a hyperparameter is defined as a scalar Tensor, a regular Python value, or a callable (which will be evaluated when apply_gradients is called) returning a scalar Tensor or a Python value.
References
For nesterov = True, see [Sutskever et al., 2013](http://jmlr.org/proceedings/papers/v28/sutskever13.pdf).
Eager Compatibility
When eager execution is enabled, learning_rate can be a callable that takes no arguments and returns the actual value to use. This can be useful for changing these values across different invocations of optimizer functions.
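A minimal sketch of a callable learning rate in eager mode (the schedule and the step variable are assumptions for illustration):

```python
import tensorflow as tf

step = tf.Variable(0, trainable=False)

def lr_schedule():
    # Hypothetical schedule: lower the rate after 1000 steps.
    return 0.01 if int(step) < 1000 else 0.001

# The callable is re-evaluated whenever the optimizer reads the learning rate.
opt = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)
```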
__init__
__init__(
learning_rate=0.01,
momentum=0.0,
nesterov=False,
name='SGD',
**kwargs
)
Construct a new Stochastic Gradient Descent or Momentum optimizer.
Arguments:
- learning_rate: float hyperparameter >= 0. Learning rate.
- momentum: float hyperparameter >= 0 that accelerates SGD in the relevant direction and dampens oscillations.
- nesterov: boolean. Whether to apply Nesterov momentum.
- name: Optional name prefix for the operations created when applying gradients. Defaults to 'SGD'.
- **kwargs: keyword arguments. Allowed to be {clipnorm, clipvalue, lr, decay}. clipnorm is clip gradients by norm; clipvalue is clip gradients by value; decay is included for backward compatibility to allow time inverse decay of learning rate; lr is included for backward compatibility, recommended to use learning_rate instead.
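A minimal construction-and-usage sketch, assuming a trivial Keras model and hypothetical training data:

```python
import tensorflow as tf

# Nesterov momentum plus gradient clipping by norm via the clipnorm kwarg.
opt = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9,
                              nesterov=True, clipnorm=1.0)

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer=opt, loss='mse')
# model.fit(features, labels, epochs=5)  # hypothetical data
```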
Properties
iterations
Variable. The number of training steps this Optimizer has run.
weights
Returns variables of this Optimizer based on the order created.
Methods
tf.keras.optimizers.SGD.add_slot
add_slot(
var,
slot_name,
initializer='zeros'
)
Add a new slot variable for var.
tf.keras.optimizers.SGD.add_weight
add_weight(
name,
shape,
dtype=None,
initializer='zeros',
trainable=None,
synchronization=tf.VariableSynchronization.AUTO,
aggregation=tf.VariableAggregation.NONE
)
tf.keras.optimizers.SGD.apply_gradients
apply_gradients(
grads_and_vars,
name=None
)
Apply gradients to variables.
This is the second part of minimize(). It returns an Operation that applies gradients.
Args:
- grads_and_vars: List of (gradient, variable) pairs.
- name: Optional name for the returned operation. Defaults to the name passed to the Optimizer constructor.
Returns:
An Operation that applies the specified gradients. The iterations counter will be automatically increased by 1.
Raises:
- TypeError: If grads_and_vars is malformed.
- ValueError: If none of the variables have gradients.
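A minimal sketch of the manual gradient path, assuming a single scalar variable and a made-up loss:

```python
import tensorflow as tf

opt = tf.keras.optimizers.SGD(learning_rate=0.1)
w = tf.Variable(2.0)

with tf.GradientTape() as tape:
    loss = w * w  # hypothetical loss
grads = tape.gradient(loss, [w])

# apply_gradients expects (gradient, variable) pairs and bumps opt.iterations by 1.
opt.apply_gradients(zip(grads, [w]))
```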
tf.keras.optimizers.SGD.from_config
from_config(
cls,
config,
custom_objects=None
)
Creates an optimizer from its config.
This method is the reverse of get_config, capable of instantiating the same optimizer from the config dictionary.
Arguments:
- config: A Python dictionary, typically the output of get_config.
- custom_objects: A Python dictionary mapping names to additional Python objects used to create this optimizer, such as a function used for a hyperparameter.
Returns:
An optimizer instance.
tf.keras.optimizers.SGD.get_config
get_config()
Returns the config of the optimizer.
An optimizer config is a Python dictionary (serializable) containing the configuration of an optimizer. The same optimizer can be reinstantiated later (without any saved state) from this configuration.
Returns:
Python dictionary.
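A minimal round-trip sketch with get_config and from_config:

```python
import tensorflow as tf

opt = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)

config = opt.get_config()                                # plain Python dictionary
restored = tf.keras.optimizers.SGD.from_config(config)   # fresh, stateless copy
```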
tf.keras.optimizers.SGD.get_gradients
get_gradients(
loss,
params
)
Returns gradients of loss with respect to params.
Arguments:
- loss: Loss tensor.
- params: List of variables.
Returns:
List of gradient tensors.
Raises:
- ValueError: In case any gradient cannot be computed (e.g. if the gradient function is not implemented).
tf.keras.optimizers.SGD.get_slot
get_slot(
var,
slot_name
)
tf.keras.optimizers.SGD.get_slot_names
get_slot_names()
A list of names for this optimizer's slots.
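A minimal sketch of slot inspection; with momentum > 0 the optimizer is expected to keep one velocity slot per trainable variable (the slot name 'momentum' and the toy loss are assumptions here):

```python
import tensorflow as tf

opt = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)
w = tf.Variable([1.0, 2.0])

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(w * w)  # toy loss
opt.apply_gradients(zip(tape.gradient(loss, [w]), [w]))

print(opt.get_slot_names())         # expected to include 'momentum'
print(opt.get_slot(w, 'momentum'))  # the accumulated velocity for w
```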
tf.keras.optimizers.SGD.get_updates
get_updates(
loss,
params
)
tf.keras.optimizers.SGD.get_weights
get_weights()
tf.keras.optimizers.SGD.minimize
minimize(
loss,
var_list,
grad_loss=None,
name=None
)
Minimize loss by updating var_list.
This method simply computes gradients using tf.GradientTape and calls apply_gradients(). If you want to process the gradients before applying them, use tf.GradientTape and call apply_gradients() explicitly instead of using this function.
Args:
- loss: A callable taking no arguments which returns the value to minimize.
- var_list: list or tuple of Variable objects to update to minimize loss, or a callable returning the list or tuple of Variable objects. Use a callable when the variable list would otherwise be incomplete before minimize, since the variables are created the first time loss is called.
- grad_loss: Optional. A Tensor holding the gradient computed for loss.
- name: Optional name for the returned operation.
Returns:
An Operation that updates the variables in var_list. If global_step was not None, that operation also increments global_step.
Raises:
- ValueError: If some of the variables are not Variable objects.
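A minimal sketch of minimize with a zero-argument loss callable (the quadratic loss is made up for illustration):

```python
import tensorflow as tf

opt = tf.keras.optimizers.SGD(learning_rate=0.1)
w = tf.Variable(3.0)

loss = lambda: (w - 1.0) ** 2  # hypothetical scalar loss

for _ in range(10):
    opt.minimize(loss, var_list=[w])

print(w.numpy())  # moves toward the minimizer at 1.0
```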
tf.keras.optimizers.SGD.set_weights
set_weights(weights)
tf.keras.optimizers.SGD.variables
variables()
Returns variables of this Optimizer based on the order created.