Functional interface for the group normalization layer.
tf.contrib.layers.group_norm(
    inputs,
    groups=32,
    channels_axis=-1,
    reduction_axes=(-3, -2),
    center=True,
    scale=True,
    epsilon=1e-06,
    activation_fn=None,
    param_initializers=None,
    reuse=None,
    variables_collections=None,
    outputs_collections=None,
    trainable=True,
    scope=None,
    mean_close_to_zero=False
)
Reference: https://arxiv.org/abs/1803.08494.
"Group Normalization", Yuxin Wu, Kaiming He
Args:
inputs: A Tensor with at least 2 dimensions, one of which is channels. All shape dimensions except for batch must be fully defined.
groups: Integer. Divide the channels into this number of groups over which normalization statistics are computed. This number must be commensurate with the number of channels in inputs.
channels_axis: An integer. Specifies the index of the channels axis, which will be broken into groups; the statistics of each group are computed across reduction_axes. Must be mutually exclusive with reduction_axes. Preferred usage is to specify negative integers to be agnostic as to whether a batch dimension is included.
reduction_axes: Tuple of integers. Specifies the dimensions over which statistics will be accumulated. Must be mutually exclusive with channels_axis. Statistics will not be accumulated across axes not specified in reduction_axes nor channels_axis. Preferred usage is to specify negative integers to be agnostic as to whether a batch dimension is included. Some sample usage cases (see also the usage sketch above):
  NHWC format: channels_axis=-1, reduction_axes=[-3, -2]
  NCHW format: channels_axis=-3, reduction_axes=[-2, -1]
center: If True, add an offset of beta to the normalized tensor. If False, beta is ignored.
scale: If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling can be done by the next layer.
epsilon: Small float added to the variance to avoid dividing by zero.
activation_fn: Activation function, default set to None to skip it and maintain a linear activation.
param_initializers: Optional initializers for beta, gamma, moving mean and moving variance.
reuse: Whether or not the layer and its variables should be reused. To be able to reuse the layer, scope must be given.
variables_collections: Optional collections for the variables.
outputs_collections: Collections to add the outputs to.
trainable: If True, also add variables to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
scope: Optional scope for variable_scope.
mean_close_to_zero: The mean of input before ReLU will be close to zero when the batch size >= 4k for ResNet-50 on TPU. If True, use nn.sufficient_statistics and nn.normalize_moments to calculate the variance. This is the same behavior as fused equals True in batch normalization. If False, use nn.moments to calculate the variance. When the mean is close to zero, e.g. 1e-4, using the mean to calculate the variance may give poor results due to repeated round-off error and denormalization. When the mean is large, e.g. 1e2, sum(input^2) is so large that only the high-order digits of the elements are accumulated. Thus, using sum((input - mean)^2)/n to calculate the variance has better accuracy than sum(input^2)/n - mean^2 when the mean is large; the sketch after the Raises section below illustrates this.
Returns:
A Tensor representing the output of the operation.
Raises:
ValueError: If the rank of inputs is undefined.
ValueError: If rank or channels dimension of inputs is undefined.
ValueError: If number of groups is not commensurate with number of channels.
ValueError: If reduction_axes or channels_axis are out of bounds.
ValueError: If reduction_axes are not mutually exclusive with channels_axis.
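The numerical note under mean_close_to_zero can be illustrated with a short NumPy sketch (not part of this API; the values are hypothetical). When the mean is large relative to the standard deviation, the sum(input^2)/n - mean^2 form cancels away the variance in float32, while the shifted sum((input - mean)^2)/n form preserves it:

import numpy as np

# Hypothetical data with a large mean (~1e2) and a small variance (~1e-4).
rng = np.random.RandomState(0)
x = (100.0 + 0.01 * rng.randn(1_000_000)).astype(np.float32)

mean = x.mean(dtype=np.float32)

# E[x^2] - mean^2: each x**2 is ~1e4, so float32 cannot resolve the ~1e-4
# variance left after subtracting two nearly equal large numbers.
var_naive = np.mean(np.square(x), dtype=np.float32) - np.square(mean)

# E[(x - mean)^2]: subtracting the mean first keeps the summands small,
# so the result stays close to the true variance.
var_shifted = np.mean(np.square(x - mean), dtype=np.float32)

print(var_naive)    # roughly 0, or even negative: the variance is lost
print(var_shifted)  # close to the true value of about 1e-4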