Exploration in RL
The following YouTube playlist has all the talks from the workshop: https://www.youtube.com/playlist?list=PLbSAfmOMweH3YkhlH0d5KaRvFTyhcr30b
Slides for all contributed talks are available here: https://docs.google.com/presentation/d/1zkqtsM-GywKN9kzX4r0j-C1SUF5I0N0mgsxpfvJyl7s
Below is a list of open questions related to exploration in reinforcement learning. We encourage researchers working on any of these problems to submit to our workshop.
Is there an important research question about exploration in RL that is missing from this list? Please email us at firstname.lastname@example.org and we’ll add it!
- How can we determine whether an agent is doing good, intelligent exploration?
- How can we determine when exploration is the bottleneck to efficiently solving a problem?
- How can different exploration methods be quantitatively evaluated? What are the benefits and limitations of various metrics?
- How well do exploration methods generalize across environments? How can this generalization be measured?
- If exploration is posed as a learning problem (e.g., meta-learning), what should the learning objective be?
- Can exploration be cast as a problem of causal inference?
- What insight can be gained by casting exploration as unsupervised or semi-supervised learning?
- What exploration techniques are most effective in highly constrained environments (e.g., robots with physical constraints)?
- Do hierarchical approaches to exploration (e.g., with options) improve sample efficiency?
- Are certain exploration methods better suited to domain-specific applications (e.g., education, healthcare, robotics)?
- What insight can be gained by bridging the gap between reinforcement learning and bandits?
- What does exploration mean for evolutionary algorithms?
- What are the benefits of Bayesian exploration (e.g., safety, information gain)?
- Can ensembles of policies and/or value functions enable faster or safer exploration?
- What are the tradeoffs of including diversity objectives in exploration?
- Does safe exploration necessarily come at the cost of worse sample efficiency?
- How can exploration be done in a continual learning environment with no human supervision (i.e., no resets, no rewards)?
- Can auxiliary exploration objectives be cast in a unified framework?
- How can insights from intuitive physics and cognitive neuroscience improve exploration techniques?
- What insight can be gained by casting exploration as experimental design?
- What conceptual or theoretical frameworks might allow researchers to bridge the gap between the theory and practice of exploration in RL?
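To make the evaluation question above concrete, here is a minimal sketch of one common quantitative metric, state coverage, on a hypothetical toy chain MDP. It compares uniform-random exploration against a simple count-based bonus (greedily preferring the less-visited neighbor, with a 1/sqrt(n) bonus); all function names and the toy environment are illustrative, not from any particular benchmark.

```python
import random
from collections import Counter

def coverage(visited_states, num_states):
    """Fraction of the state space visited at least once."""
    return len(set(visited_states)) / num_states

def random_walk(num_states, steps, seed=0):
    """Uniform-random exploration on a chain of states (toy MDP)."""
    rng = random.Random(seed)
    s, visited = 0, []
    for _ in range(steps):
        visited.append(s)
        # Move left or right at random; clamp at the chain's ends.
        s = max(0, min(num_states - 1, s + rng.choice([-1, 1])))
    return visited

def count_based(num_states, steps, seed=0):
    """Greedy count-based exploration: prefer the less-visited neighbor,
    using a 1/sqrt(1 + visit count) novelty bonus, ties broken randomly."""
    rng = random.Random(seed)
    counts = Counter()
    s, visited = 0, []
    for _ in range(steps):
        visited.append(s)
        counts[s] += 1
        left = max(0, s - 1)
        right = min(num_states - 1, s + 1)
        bonus_left = 1.0 / (1 + counts[left]) ** 0.5
        bonus_right = 1.0 / (1 + counts[right]) ** 0.5
        if bonus_left > bonus_right:
            s = left
        elif bonus_right > bonus_left:
            s = right
        else:
            s = rng.choice([left, right])
    return visited

N, T = 50, 500
print(coverage(random_walk(N, T), N))   # random walk typically covers only part of the chain
print(coverage(count_based(N, T), N))   # the count bonus sweeps the chain, covering all states
```

Coverage is only one possible metric, and its limitations are exactly what the question points at: it ignores *where* reward lies, rewards breadth over relevance, and does not transfer cleanly to continuous or high-dimensional state spaces.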