Why tanh activation functions lead to exploding, not vanishing, gradients and other things your deep learning textbook probably got wrong
Myths about exploding gradients