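Agents can learn to stimulate their reward mechanism directly instead of solving their intended task. In the off-switch/shutdown problem, the agent interferes with its supervisor's ability to stop the agent's operation. Both problems share a common feature: the agent subverts the supervisor's feedback about the task. We call this the tampering problem: how can we design agents that pursue a given goal when every feedback mechanism used to describe that goal can be influenced by the agent?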