To stay sane, I need to write down some of the actionable ideas that occur to me; otherwise I tend to hoard them. So these are the questions I am not going to answer (argh, it hurts!). They seem like perfectly good research directions, but “you need to focus” (says pretty much everyone I meet).

Requests for research

(the number of stars reflects how open the problem is: 1 star means little room for interpretation, 3 stars mean that there are some complex choices to be made)

Controlled implicit quantiles. Extend implicit quantile RL (which works surprisingly well) to use control variates.
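
One possible reading of "control variates" here (an assumption, not necessarily what the idea intends): implicit quantile networks estimate Q(s, a) as the average of sampled quantile values Z_τ(s, a) with τ ~ U(0, 1). The mean of τ is known exactly (0.5) and Z_τ is non-decreasing in τ, so τ itself is a natural control variate for that average. In this minimal sketch a closed-form exponential quantile function stands in for a trained network, and `exp_quantile`, `q_estimate_plain` and `q_estimate_cv` are hypothetical names.

```python
import numpy as np

def exp_quantile(taus, rate=0.5):
    """Stand-in for a learned Z_tau(s, a): quantile function of Exp(rate), mean 1/rate."""
    return -np.log1p(-taus) / rate

def q_estimate_plain(quantile_fn, n, rng):
    """IQN-style Q estimate: plain Monte Carlo mean of Z_tau over n sampled taus."""
    taus = rng.uniform(0.0, 1.0, size=n)
    return quantile_fn(taus).mean()

def q_estimate_cv(quantile_fn, n, rng):
    """Same estimate, with tau as a control variate (E[tau] = 0.5 is known exactly)."""
    taus = rng.uniform(0.0, 1.0, size=n)
    z = quantile_fn(taus)
    cov = np.cov(z, taus)        # 2x2 sample covariance of (z, tau)
    c = cov[0, 1] / cov[1, 1]    # variance-minimising coefficient Cov(z, tau) / Var(tau)
    # Subtract the zero-mean correction c * (mean(tau) - 0.5).
    return z.mean() - c * (taus.mean() - 0.5)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    plain = [q_estimate_plain(exp_quantile, 32, rng) for _ in range(2000)]
    cv = [q_estimate_cv(exp_quantile, 32, rng) for _ in range(2000)]
    print("variance without / with control variate:", np.var(plain), np.var(cv))
```

Estimating `c` from the same samples adds a small O(1/n) bias; fitting it on a separate batch, or learning it alongside the network, would avoid that.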

Atari-onomy. Make a taskonomy of the Atari games, showing how ‘similar’ each game is to the others.
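
A minimal sketch, assuming ‘similarity’ is measured by transfer as in Taskonomy: pretrain on game i, fine-tune on game j, and record the score relative to training on j from scratch. The transfer matrix, the game names and the helper names are hypothetical, and the symmetrised gain below is much simpler than the normalisation the Taskonomy work itself uses.

```python
import numpy as np

def affinity_from_transfer(transfer):
    """Symmetric game-to-game affinity from a transfer matrix.

    transfer[i, j] is assumed to be the score on game j after pretraining on
    game i, normalised so that 1.0 equals training on game j from scratch.
    """
    t = np.asarray(transfer, dtype=float)
    gain = t - 1.0                      # > 0 means pretraining on i helped j
    affinity = 0.5 * (gain + gain.T)    # average the two transfer directions
    np.fill_diagonal(affinity, np.nan)  # self-similarity carries no information
    return affinity

def most_similar(affinity, names):
    """Map each game to the other game with the highest affinity."""
    return {name: names[int(np.nanargmax(row))] for name, row in zip(names, affinity)}

if __name__ == "__main__":
    names = ["game_a", "game_b", "game_c"]  # placeholder names
    toy_transfer = [[1.0, 1.3, 0.9],        # made-up numbers, for illustration only
                    [1.2, 1.0, 0.95],
                    [0.9, 0.8, 1.0]]
    print(most_similar(affinity_from_transfer(toy_transfer), names))
```

The full affinity matrix, rather than just the nearest neighbour, is what a taskonomy would be built from, for example by clustering it.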

☆ ☆ ☆ Learner discrimination. Just by observing a player learn, can we identify the learning algorithm it is using? For example, can we distinguish the learners in OpenAI’s baselines (PPO, A2C, ACKTR, …)?

☆ ☆ Meta learning from temporal decompositions. Meta-RL trains a learner on the return aggregated over many episodes (a larger time scale). If we construct a temporal decomposition of the rewards (moving averages at different time scales) and approximate the components with a set of value functions, does this produce a hierarchy of meta learners? (This will need a way to aggregate actions chosen at different time scales, for example $\pi(s_t) = g\left(\sum_k f_k(z_t)\right)$.)
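
A minimal sketch of one possible decomposition, assuming exponential moving averages (EMAs) with decreasing rates `alphas` serve as the time scales. Each band is the difference between two successive smoothings, so the bands telescope back to the raw reward and each one could be the target of its own value function. The aggregation step assumes linear $f_k(z) = W_k z$ and a softmax for $g$; all function names are hypothetical.

```python
import numpy as np

def ema(x, alpha):
    """Exponential moving average m_t = m_{t-1} + alpha * (x_t - m_{t-1}), with m_0 = x_0."""
    out = np.empty(len(x))
    m = x[0]
    for t, value in enumerate(x):
        m = m + alpha * (value - m)
        out[t] = m
    return out

def temporal_decomposition(rewards, alphas):
    """Split rewards into len(alphas) + 1 bands, fastest first.

    alphas should be decreasing (fast -> slow time scales). Band k is
    smoothing k minus smoothing k + 1 (smoothing 0 is the raw reward);
    the last band is the slowest EMA itself, so the bands sum to the rewards.
    """
    r = np.asarray(rewards, dtype=float)
    smooths = [r] + [ema(r, a) for a in alphas]
    bands = [smooths[k] - smooths[k + 1] for k in range(len(alphas))]
    bands.append(smooths[-1])
    return np.stack(bands)  # shape (len(alphas) + 1, T)

def aggregate_policy(z, weights):
    """pi(s_t) = g(sum_k f_k(z_t)) with f_k(z) = W_k @ z and g = softmax."""
    logits = sum(W @ z for W in weights)
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rewards = rng.normal(size=200)
    bands = temporal_decomposition(rewards, alphas=[0.5, 0.1, 0.01])
    print(bands.shape, np.allclose(bands.sum(axis=0), rewards))
    z = rng.normal(size=8)  # shared state features z_t, as in the formula above
    weights = [rng.normal(size=(4, 8)) for _ in range(len(bands))]  # one W_k per band, 4 actions
    print(aggregate_policy(z, weights))
```

Whether each band's value function should also use a discount matched to its time scale is one of the open choices in this idea.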