Memory-based meta-learning is a powerful technique to build agents that adapt fast to any task within a target distribution. A previous theoretical study has argued that this remarkable performance is because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical computer science, we show that meta-learned and Bayes-optimal agents not only behave alike, but they even share a similar computational structure, in the sense that one agent system can approximately simulate the other. Furthermore, we show that Bayes-optimal agents are fixed points of the meta-learning dynamics. Our results suggest that memory-based meta-learning might serve as a general technique for numerically approximating Bayes-optimal agents—that is, even for task distributions for which we currently don’t possess tractable models.

This work from DeepMind empirically validates an earlier theoretical idea: the meta-training protocol incentivises agents to behave Bayes-optimally.

Using ideas from theoretical computer science, the authors show that meta-trained and Bayes-optimal agents not only behave alike but also share a similar computational structure, in the sense that one agent system can approximately simulate the other.

Moreover, Bayes-optimal agents are fixed points of the meta-learning dynamics.

Conclusion: memory-based meta-learning can serve as a general technique for numerically approximating Bayes-optimal agents, even for task distributions for which we currently have no tractable model.
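To make the notion of a "Bayes-optimal agent for a task distribution" concrete, here is a minimal sketch (not from the paper; the task setup and function names are my own illustration). Tasks are biased coins with bias theta drawn from a Uniform(0, 1) prior; the Bayes-optimal prediction of the next flip given the history is the posterior predictive, which for this prior reduces to Laplace's rule. The paper's claim is that a memory-based meta-learner trained across such tasks is incentivised to converge to exactly this kind of mapping from histories to predictions.

```python
import random

def bayes_optimal_prediction(history):
    """Posterior predictive P(next flip = 1 | history) under a
    Uniform(0, 1) prior on the coin bias: Laplace's rule (h+1)/(n+2)."""
    h, n = sum(history), len(history)
    return (h + 1) / (n + 2)

def sample_task(rng):
    """Draw one task from the task distribution: a coin bias in [0, 1]."""
    return rng.random()

def run_episode(theta, length, rng):
    """Generate one episode of flips (1 = heads) from a single task."""
    return [1 if rng.random() < theta else 0 for _ in range(length)]

rng = random.Random(0)
theta = sample_task(rng)
history = run_episode(theta, 50, rng)
# With no data the prediction is the prior mean, 0.5; as the history
# grows, it adapts toward the true (unknown) bias theta of this task.
print(bayes_optimal_prediction(history))
```

A meta-learned agent (e.g. an RNN fed the flip history) trained to minimise log loss on episodes sampled this way would, per the paper's argument, approximate `bayes_optimal_prediction` without ever being told the prior in closed form.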
