Revising Reinforcement Learning: Adaptable Environment Interactions

A major challenge in reinforcement learning is transferring a learned task across different environments. A popular approach is to train policies that are invariant to the environment, behaving the same way regardless of its properties. Our take is different: we suggest that policies should learn to identify and exploit the distinguishing features of each environment to perform better. To this end, we introduce the 'Environment-Probing Interaction' (EPI) policy, which first probes a new environment to understand its unique traits.

The knowledge gathered this way is then used to guide a task-specific policy. Learning is driven by a reward that increases when probing improves the ability to predict what comes next in the environment. Our experiments show that EPI policies perform markedly better in new environments than other common techniques.
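
To make the reward idea concrete, here is a minimal sketch of how a prediction-based probing reward could be computed. It is an illustration under stated assumptions, not the implementation used in the experiments: the one-dimensional toy dynamics with a hidden gain, the single-step probe, and all function names (step, probe, prediction_error, epi_reward) are hypothetical. The reward is taken to be the reduction in next-state prediction error obtained by conditioning a predictor on what the probe revealed.

```python
import numpy as np

def step(state, action, gain):
    """Toy dynamics (assumed for illustration): hidden `gain` scales the action."""
    return state + gain * action

def probe(gain, probe_action):
    """Run one probing step from state 0 and summarize it as a gain estimate."""
    if probe_action == 0.0:
        return 1.0  # the probe revealed nothing; fall back to the prior gain
    return step(0.0, probe_action, gain) / probe_action

def prediction_error(predict, transitions):
    """Mean squared error of a next-state predictor over (s, a, s_next) tuples."""
    return float(np.mean([(predict(s, a) - s_next) ** 2
                          for s, a, s_next in transitions]))

def epi_reward(embedding, transitions):
    """Reward = baseline error minus error when conditioning on the embedding."""
    base_err = prediction_error(lambda s, a: s + 1.0 * a, transitions)
    cond_err = prediction_error(lambda s, a: s + embedding * a, transitions)
    return base_err - cond_err

# Held-out transitions from an environment whose hidden gain is 2.0.
rng = np.random.default_rng(0)
hidden_gain = 2.0
transitions = []
for s, a in rng.uniform(-1.0, 1.0, size=(20, 2)):
    s, a = float(s), float(a)
    transitions.append((s, a, step(s, a, hidden_gain)))

reward_informative = epi_reward(probe(hidden_gain, 1.0), transitions)
reward_uninformative = epi_reward(probe(hidden_gain, 0.0), transitions)
```

In this toy setting, a probe that actually moves the system recovers the hidden gain and earns a positive reward, while a probe that does nothing leaves the predictor no better than the baseline and earns zero. In a full system the embedding and both predictors would presumably be learned rather than hand-written as they are here.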
