Learning Montezuma's Revenge from a single demonstration
We've trained an agent to achieve a high score of 74,500 on Montezuma's Revenge from a single human demonstration, better than any previously published result. Our algorithm is simple: the agent plays a sequence of games starting from carefully chosen states from the demonstration, and learns from t...
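To make the idea of starting games from demonstration states concrete, here is a minimal sketch of one way the start state could be scheduled. The summary above does not say how the states are chosen, so the rule used here is an assumption: begin at the last saved demonstration state and move the start point earlier once the agent succeeds often enough. The function names, the `threshold` and `step` parameters, and the `success_rates` input are all hypothetical.

```python
from typing import List, Sequence


def next_start_index(current: int, success_rate: float,
                     threshold: float = 0.2, step: int = 1) -> int:
    """Pick the demonstration index to start the next round from.

    Assumed rule (not taken from the article): once the agent succeeds
    at least `threshold` of the time from `current`, move the start
    point `step` states earlier, never going below index 0.
    """
    if success_rate >= threshold:
        return max(0, current - step)
    return current


def schedule_start_indices(demo_length: int,
                           success_rates: Sequence[float],
                           threshold: float = 0.2,
                           step: int = 1) -> List[int]:
    """Return the start index used in each training round.

    demo_length: number of saved states in the demonstration.
    success_rates: success rate measured after each round
                   (hypothetical input standing in for real training).
    """
    index = demo_length - 1  # begin at the last demonstration state
    used = []
    for rate in success_rates:
        used.append(index)
        index = next_start_index(index, rate, threshold, step)
    return used
```

In a real setup each index would map to a saved emulator state that the environment is reset to before the episode begins; the sketch only covers the bookkeeping of which state to use.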