tensorflow control logic notes #1
It’s easier to think about an LSTM as a function that maps a pair of vectors (state, input) to a pair of vectors (state, output), where those vectors generally have the same dimensionality. What actually happens inside depends on the parameters learned by the cell. When you stack several cells, it is the same as a sequential application of several functions of the same type but with different parameters, not so different from a multi-layer perceptron.

The purpose of using multi-layer RNN cells is to learn more sophisticated conditional distributions, as in neural machine translation (Bahdanau et al. 2014). In a single-layer RNN, the output is produced by passing the input through a single hidden state, which fails to capture the hierarchical (think temporal) structure of a sequence. A multi-layer RNN captures such structure, which results in better performance.

Compare this with a deep neural network (such as a CNN) for image recognition. Visualization research shows that each layer in the network captures structure: the initial layers find edges in an image or identify its colors, the later layers build on this to find more complex structure such as intersections of edges or shades of color, and the final layer brings all of this together to identify the object in the image.

In a single-layer RNN, one hidden state does all the work, so it is overwhelmed. If you are modeling a sequence such as text, the parameters only learn things like “‘a’ is more likely to follow ‘c’ than ‘o’.” By introducing multiple layers, you give the RNN the capacity to capture structure: the first layer might learn that some characters are vowels and others are consonants, and the second layer would build on this to learn that a vowel is more likely to follow a consonant.
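As a rough illustration of the “(state, input) → (state, output)” view and of stacking as composing functions with different parameters, here is a minimal NumPy sketch. It is not the actual TensorFlow implementation referred to in these notes; all names and sizes are made up for the example.

```python
import numpy as np

def lstm_step(params, state, x):
    # One LSTM cell viewed as a function: (state, input) -> (state, output).
    # state = (c, h); params = (W, b) for the [x, h] -> 4*hidden projection.
    W, b = params
    c, h = state
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    z = np.concatenate([x, h], axis=-1) @ W + b
    i, f, g, o = np.split(z, 4, axis=-1)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return (c_new, h_new), h_new                  # (new state, output)

def stacked_step(layers, states, x):
    # Stacking cells = sequentially applying functions of the same type
    # with different parameters: layer k's output is layer k+1's input.
    new_states = []
    for params, state in zip(layers, states):
        state, x = lstm_step(params, state, x)
        new_states.append(state)
    return new_states, x

# Toy usage with made-up sizes.
batch, input_size, hidden = 2, 5, 8
def init(in_dim):
    return (0.1 * np.random.randn(in_dim + hidden, 4 * hidden),
            np.zeros(4 * hidden))
layers = [init(input_size), init(hidden)]         # two stacked cells
states = [(np.zeros((batch, hidden)), np.zeros((batch, hidden)))
          for _ in layers]
states, out = stacked_step(layers, states, np.random.randn(batch, input_size))
print(out.shape)                                  # (batch, hidden)
```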
The above explains why `outputs` carries the trailing “s”: note the repeated `append`. Also note that muupan’s GitHub repo is referenced in the paper “Adversarial Attacks on Neural Network Policies”: github.com/muupan/async-rl

Note: the above spells out how the RNN is actually implemented. Good!

Pay attention to the [BATCH, HIDDEN_SIZE] shape in the code above; with that in mind, the rest becomes clear.
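The TensorFlow code these notes annotate is not included in the issue. As a hedged guess at the pattern being described (a recurrent state of shape [BATCH, HIDDEN_SIZE] and an `outputs` list that gets one `append` per time step), here is a minimal sketch in the TF 1.x `rnn_cell` style; all sizes and names are placeholders, not the original code.

```python
import tensorflow as tf   # TF 1.x style API, as in the era of these notes

# Hypothetical sizes; the real values come from the code the notes refer to.
BATCH, NUM_STEPS, INPUT_SIZE, HIDDEN_SIZE = 32, 20, 10, 128

inputs = tf.placeholder(tf.float32, [BATCH, NUM_STEPS, INPUT_SIZE])
cell = tf.nn.rnn_cell.BasicLSTMCell(HIDDEN_SIZE)
state = cell.zero_state(BATCH, tf.float32)   # each state tensor: [BATCH, HIDDEN_SIZE]

outputs = []                                 # hence the plural name "outputs"
with tf.variable_scope("RNN"):
    for t in range(NUM_STEPS):
        if t > 0:
            tf.get_variable_scope().reuse_variables()   # share weights across steps
        output, state = cell(inputs[:, t, :], state)    # output: [BATCH, HIDDEN_SIZE]
        outputs.append(output)               # one output per time step is appended
```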