Facebook researchers believe the game NetHack is well-tailored to training, testing, and evaluating AI models. Today, they released the NetHack Learning Environment, a research tool for benchmarking the robustness and generalization of reinforcement learning agents.

NetHack, first released in 1987, is more sophisticated than might be assumed. It tasks players with descending more than 50 dungeon levels to retrieve a magical amulet, using hundreds of items and fighting monsters along the way while contending with the rich interactions between the two. Levels in NetHack are procedurally generated and every game is different, which the Facebook researchers note tests the generalization limits of current state-of-the-art AI.
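To give a sense of how agents plug into the tool, here is a minimal sketch of an interaction loop through its Gym interface; it assumes the open-source `nle` Python package is installed and uses the `NetHackScore-v0` task ID that the package registers, with a random policy standing in for a learned agent.

```python
# Minimal sketch of an agent loop against the NetHack Learning Environment.
# Assumes `pip install nle` and the Gym task ID "NetHackScore-v0" registered by the package,
# along with the classic Gym step API (obs, reward, done, info).
import gym
import nle  # noqa: F401  # importing nle registers the NetHack tasks with Gym

env = gym.make("NetHackScore-v0")
obs = env.reset()  # observation is a dict of arrays (dungeon glyphs, agent stats, messages, ...)

done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # placeholder for a trained reinforcement learning agent
    obs, reward, done, info = env.step(action)
    total_reward += reward

print("Episode return:", total_reward)
env.close()
```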
For decades, games have served as benchmarks for AI. But things really kicked into gear in 2013, the year DeepMind (acquired by Google the following year) demonstrated an AI system that learned to play Pong, Breakout, Space Invaders, Seaquest, Beam Rider, Enduro, and Q*bert, surpassing human experts on several of them. The advancements aren’t merely improving game design, according to folks like DeepMind cofounder Demis Hassabis. Rather, they’re informing the development of systems that might one day diagnose illnesses, predict complicated protein structures, and segment CT scans.