A Baseline for Any Order Gradient Estimation in Stochastic Computation Graphs
Authors: Jingkai Mao, Jakob Foerster, Tim Rocktaschel, Maruan Al-Shedivat, Gregory Farquhar, Shimon Whiteson
Abstract: By enabling correct differentiation in stochastic computation graphs (SCGs), the infinitely differentiable Monte-Carlo estimator (DiCE) can generate correct estimates for the higher order gradients that arise in, e.g., multi-agent reinforcement learning and meta-learning.
However, the baseline term in DiCE that serves as a control variate for reducing variance applies only to first order gradient estimation, limiting the utility of higher order gradient estimates.
To improve the sample efficiency of DiCE, we propose a new baseline term for higher order gradient estimation.
This term may be easily included in the objective, and produces unbiased variance-reduced estimators under (automatic) differentiation, without affecting the estimate of the objective itself or of the first order gradient estimate.
It reuses the same baseline function (e.g., the state-value function in reinforcement learning) already used for the first order baseline.
We provide theoretical analysis and numerical evaluations of this new baseline, which demonstrate that it can dramatically reduce the variance of DiCE’s second order gradient estimators and also show empirically that it reduces the variance of third and fourth order gradients.
This computational tool can be easily used to estimate higher order gradients with unprecedented efficiency and simplicity wherever automatic differentiation is utilised, and it has the potential to unlock applications of higher order gradients in reinforcement learning and meta-learning.
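Illustrative note (not part of the abstract): the sketch below shows only the familiar first order idea that the abstract builds on, namely that subtracting a baseline from a score function estimator leaves its mean unchanged while reducing its variance. It does not implement DiCE or the proposed higher order baseline. The toy objective f(x) = x**2, the Gaussian sampling distribution, the function name `score_function_grads`, and the baseline value mu**2 + 1 (the expected value of f) are all assumptions chosen for illustration.

```python
import random
import statistics


def score_function_grads(mu, baseline, n_samples, seed=0):
    """Per-sample score function estimates of d/dmu E[f(x)] for x ~ N(mu, 1).

    Toy objective (an assumption for this sketch): f(x) = x**2,
    so the exact gradient is 2 * mu.
    """
    rng = random.Random(seed)
    grads = []
    for _ in range(n_samples):
        x = rng.gauss(mu, 1.0)
        score = x - mu  # d/dmu log N(x; mu, 1)
        # Subtracting a constant baseline keeps the estimator unbiased,
        # because the score has zero mean under the sampling distribution.
        grads.append((x * x - baseline) * score)
    return grads


mu = 3.0
plain = score_function_grads(mu, baseline=0.0, n_samples=20000)
# Hypothetical baseline: E[f(x)] = mu**2 + 1, standing in for a value function.
with_baseline = score_function_grads(mu, baseline=mu ** 2 + 1.0, n_samples=20000)

print("exact gradient:       ", 2 * mu)
print("mean, no baseline:    ", statistics.mean(plain))
print("mean, with baseline:  ", statistics.mean(with_baseline))
print("variance, no baseline:", statistics.variance(plain))
print("variance, baseline:   ", statistics.variance(with_baseline))
```

Both sample means should land near the exact gradient, while the per-sample variance is noticeably smaller with the baseline. Both runs use the same seed, so the comparison is made on identical samples.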
Monte Carlo Gradient Estimation in Machine Learning
Authors: Shakir Mohamed, Mihaela Rosca, Michael Figurnov, Andriy Mnih
Abstract: This paper is a broad and accessible survey of the methods we have at our disposal for Monte Carlo gradient estimation in machine learning and across the statistical sciences: the problem of computing the gradient of an expectation of a function with respect to parameters defining the distribution that is integrated; the problem of sensitivity analysis.
In machine learning research, this gradient problem lies at the core of many learning problems, in supervised, unsupervised and reinforcement learning.
We will generally seek to rewrite such gradients in a form that allows for Monte Carlo estimation, allowing them to be easily and efficiently used and analysed.
We explore three strategies—the pathwise, score function, and measure-valued gradient estimators—exploring their historical developments, derivation, and underlying assumptions.
We describe their use in other fields, show how they are related and can be combined, and expand on their possible generalisations.
Wherever Monte Carlo gradient estimators have been derived and deployed in the past, important advances have followed.
A deeper and more widely-held understanding of this problem will lead to further advances, and it is these advances that we wish to support.
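Illustrative note (not part of the abstract): two of the three strategies named above, the score function and pathwise estimators, can be contrasted on a toy problem. The sketch below is a minimal example under stated assumptions, not code from the survey: the objective f(x) = x**2, the Gaussian N(mu, sigma**2) sampling distribution, and the function name `estimate_gradients` are chosen here for illustration. The measure-valued estimator is not covered.

```python
import random
import statistics


def estimate_gradients(mu, sigma, n_samples, seed=0):
    """Per-sample estimates of d/dmu E[f(x)] for x ~ N(mu, sigma**2).

    Toy objective (an assumption for this sketch): f(x) = x**2,
    so the exact gradient is 2 * mu.
    """
    rng = random.Random(seed)
    score_fn, pathwise = [], []
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        x = mu + sigma * eps  # reparameterised sample
        # Score function estimator: f(x) * d/dmu log p(x; mu, sigma).
        score_fn.append(x * x * (x - mu) / sigma ** 2)
        # Pathwise estimator: df/dx * dx/dmu, with dx/dmu = 1.
        pathwise.append(2.0 * x)
    return score_fn, pathwise


mu, sigma = 1.0, 1.0
score_fn, pathwise = estimate_gradients(mu, sigma, n_samples=20000)

print("exact gradient:         ", 2 * mu)
print("score function mean:    ", statistics.mean(score_fn))
print("pathwise mean:          ", statistics.mean(pathwise))
print("score function variance:", statistics.variance(score_fn))
print("pathwise variance:      ", statistics.variance(pathwise))
```

On this particular toy problem both estimators target the same gradient, and the pathwise estimator has the lower per-sample variance. That ordering is specific to this example and need not hold in general.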