Adversarial auxiliary signals #13

Open
deontologician opened this issue Dec 19, 2017 · 0 comments

Create two separate networks that compete to explore the environment (together they form a single agent).
The idea is to have a reinforcement learning setup where:

  • The prediction network learns an unsupervised representation of the environment, and predicts what will happen next
    • We could use adversarial techniques for unsupervised learning, or we could use something less fancy like denoising autoencoders
  • The exploration network controls the actions of the agent, and gets a reward proportional to the MSE between the prediction network's prediction and what actually happened
    • This is an artificial reward signal, not tied to the true environment reward
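A minimal sketch of the artificial reward described above, assuming observations are flat lists of floats. The function name `intrinsic_reward` and the `scale` parameter are hypothetical, chosen only for illustration:

```python
from typing import Sequence


def intrinsic_reward(
    predicted: Sequence[float],
    observed: Sequence[float],
    scale: float = 1.0,
) -> float:
    """Reward for the exploration network: scaled MSE of the prediction error.

    `scale` is an assumed proportionality constant; the issue only says the
    reward is "proportional to" the MSE.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if len(predicted) == 0:
        raise ValueError("observations must not be empty")
    # Mean squared error between the predicted and the actual next observation.
    mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)
    return scale * mse
```

A perfect prediction yields zero reward, so the exploration network is only paid for steering the agent into situations the prediction network has not yet learned.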

The exploration network has no backprop into the weights of the prediction network, so it can't push the prediction network toward degenerate representations (e.g. learning to output random noise to maximize surprise).

Influence is solely through the actions of the exploration network causing mispredictions, i.e. reality always sits in between the exploration network and the prediction network.
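The separation between the two learners can be sketched with a toy loop. This is only an illustration under strong assumptions: the "networks" are replaced by lookup tables, the environment returns a fixed scalar observation per action, and every name here (`TablePredictor`, `BanditExplorer`, `run`) is hypothetical. What it does show is that the explorer only ever sees a scalar reward and never touches the predictor's parameters:

```python
import random


class TablePredictor:
    """Stand-in for the prediction network: one scalar prediction per action."""

    def __init__(self, n_actions, lr=0.5):
        self.pred = [0.0] * n_actions
        self.lr = lr

    def predict(self, action):
        return self.pred[action]

    def update(self, action, observed):
        # Step toward the observation, driven only by the predictor's own error.
        self.pred[action] += self.lr * (observed - self.pred[action])


class BanditExplorer:
    """Stand-in for the exploration network: epsilon-greedy on intrinsic reward."""

    def __init__(self, n_actions, rng, step=0.5, epsilon=0.1):
        self.value = [1.0] * n_actions  # optimistic start so every action gets tried
        self.rng = rng
        self.step = step
        self.epsilon = epsilon

    def act(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.value))
        return max(range(len(self.value)), key=lambda a: self.value[a])

    def update(self, action, reward):
        # Constant step size, so old surprise is forgotten over time.
        self.value[action] += self.step * (reward - self.value[action])


def run(true_obs, steps=200, seed=0):
    """Toy loop: each action always produces the same observation."""
    rng = random.Random(seed)
    predictor = TablePredictor(len(true_obs))
    explorer = BanditExplorer(len(true_obs), rng)
    for _ in range(steps):
        action = explorer.act()
        observed = true_obs[action]  # "reality" sits between the two learners
        reward = (predictor.predict(action) - observed) ** 2
        explorer.update(action, reward)      # explorer sees only the scalar reward
        predictor.update(action, observed)   # predictor learns from its own error
    return predictor.pred
```

In this toy setting the surprise for an action shrinks each time it is visited, so the explorer keeps moving on to whichever action is still poorly predicted.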

Considerations:

  • The exploration network needs to quickly adapt to changing dynamics (model this like a multi-armed bandit that periodically changes the payout probabilities of its arms). Things like RL^2 are probably a good idea here.
  • The inputs to the exploration network might need to be the raw input, and maybe some memory like an LSTM
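The bandit analogy in the first consideration can be made concrete with a small test environment. This is a sketch under assumptions: the class name `DriftingBandit`, the `change_every` parameter, and the choice to resample all payout probabilities uniformly at random are illustrative, not something the issue specifies:

```python
import random


class DriftingBandit:
    """Bernoulli multi-armed bandit whose payout probabilities are resampled
    every `change_every` pulls (an assumed schedule)."""

    def __init__(self, n_arms, change_every, seed=0):
        self.rng = random.Random(seed)
        self.n_arms = n_arms
        self.change_every = change_every
        self.t = 0  # number of pulls so far
        self.probs = self._resample()

    def _resample(self):
        return [self.rng.random() for _ in range(self.n_arms)]

    def pull(self, arm):
        # Change the dynamics at the start of every `change_every`-th pull.
        if self.t > 0 and self.t % self.change_every == 0:
            self.probs = self._resample()
        self.t += 1
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0
```

An exploration policy that averages rewards over its whole history would lag behind each change here, which is the motivation for fast-adapting approaches such as the RL^2 idea mentioned above.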