Instance-dependent reinforcement learning
In recent years, there has been tremendous progress in the field of reinforcement learning (RL), especially on the empirical side. But it is fair to say that there is a considerable gap between theory and practice: many RL methods behave far better than existing worst-case theory would suggest, and often they work in settings where the current worst-case guarantees are completely prohibitive. In this talk, we will discuss why worst-case guarantees can severely overestimate the difficulty of reinforcement learning problems in presence of favorable structure. This motivates us to consider an instance-dependent difficulty measure that is responsive to the problem structure. Next, we discuss how we can construct estimators that adapt to this instance-dependent difficulty. We show that for problems with favorable structures our proposed estimators and associated confidence regions are significantly better than those obtained from the worst-case theory. Finally, we show that the techniques that we developed for constructing instance-dependent estimators are not specific to RL problems, and they can be applied to a broad class of other problems.
This talk is based on joint works with Ashwin Pananjady, Eric Xia, Wenlong Mou, Feng Ruan, Martin J. Wainwright, Michael I. Jordan and Peter L. Bartlett.