op {
  graph_op_name: "StopGradient"
  summary: "Stops gradient computation."
  description: <<END
When executed in a graph, this op outputs its input tensor as-is.

When building ops to compute gradients, this op prevents the contribution of
its inputs from being taken into account. Normally, the gradient generator
adds ops to a graph to compute the derivatives of a specified 'loss' by
recursively finding the inputs that contributed to its computation. If you
insert this op in the graph, its inputs are masked from the gradient
generator. They are not taken into account for computing gradients.

This is useful any time you want to compute a value with TensorFlow but need
to pretend that the value was a constant. For example, the softmax function
for a vector x can be written as

```python

  def softmax(x):
    numerator = tf.exp(x)
    denominator = tf.reduce_sum(numerator)
    return numerator / denominator
```

This, however, is susceptible to overflow if the values in x are large. An
alternative, more stable way is to subtract the maximum of x from each of the
values.

```python

  def stable_softmax(x):
    z = x - tf.reduce_max(x)
    numerator = tf.exp(z)
    denominator = tf.reduce_sum(numerator)
    return numerator / denominator
```

However, when we backprop through the softmax to x, we don't want to backprop
through the `tf.reduce_max(x)` calculation (if the max values are not unique,
the gradient could flow to the wrong input); instead, we want to treat it as a
constant. Therefore, we should write this out as

```python

  def stable_softmax(x):
    z = x - tf.stop_gradient(tf.reduce_max(x))
    numerator = tf.exp(z)
    denominator = tf.reduce_sum(numerator)
    return numerator / denominator
```

Some other examples include:

* The *EM* algorithm where the *M-step* should not involve backpropagation
  through the output of the *E-step*.
* Contrastive divergence training of Boltzmann machines where, when
  differentiating the energy function, the training must not backpropagate
  through the graph that generated the samples from the model.
* Adversarial training, where no backprop should happen through the
  adversarial example generation process.
END
}