The goal of the project was to isolate the failure modes of controllers trained via reinforcement learning, in an effort to increase the transparency of machine learning models. Our focus was on improving the robustness of an already trained model from NVIDIA, namely the in-hand manipulation controller DeXtreme.
We applied adversarial RL models to 'learn' the failure cases of the DeXtreme model and, once those failure cases were known, improved the DeXtreme controller to be robust against them.
The adversaries added noise to the inputs and outputs of the controller network, as sketched below.
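A minimal sketch of this setup, assuming a PyTorch controller: an adversary network produces bounded noise that is added to the controller's observation (input) and action (output). All names and parameters here (`Adversary`, `perturbed_step`, `eps_obs`, `eps_act`) are illustrative assumptions, not DeXtreme's actual interfaces.

```python
import torch
import torch.nn as nn

class Adversary(nn.Module):
    """Maps the current observation to bounded noise for obs and action."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, obs_dim + act_dim), nn.Tanh(),  # output in [-1, 1]
        )

    def forward(self, obs: torch.Tensor):
        noise = self.net(obs)
        # Split the output into an observation perturbation and an action perturbation.
        return noise[..., :obs.shape[-1]], noise[..., obs.shape[-1]:]

def perturbed_step(controller, adversary, obs, eps_obs=0.05, eps_act=0.05):
    """Run one control step with adversarial noise on both input and output."""
    obs_noise, act_noise = adversary(obs)
    action = controller(obs + eps_obs * obs_noise)  # noise on the controller's input
    return action + eps_act * act_noise             # noise on the controller's output
```

In an adversarial RL setting of this kind, the adversary would be trained to minimize the controller's task reward, steering rollouts toward the failure cases.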
A residual network was attached to the end of the controller in order to learn robustness against the adversarial noise.
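A hedged sketch of that residual architecture, assuming the pretrained controller stays frozen while only the appended correction network is trained; `ResidualPolicy` and its layer sizes are hypothetical, not the project's exact implementation.

```python
import torch
import torch.nn as nn

class ResidualPolicy(nn.Module):
    """Frozen base controller plus a trainable additive correction."""
    def __init__(self, base_controller: nn.Module, obs_dim: int, act_dim: int):
        super().__init__()
        self.base = base_controller
        for p in self.base.parameters():  # keep the pretrained controller fixed
            p.requires_grad_(False)
        self.residual = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        base_action = self.base(obs)
        # The residual sees both the observation and the base action,
        # and learns a correction that counteracts the adversarial noise.
        correction = self.residual(torch.cat([obs, base_action], dim=-1))
        return base_action + correction
```

The design choice here is that the residual only needs to learn a small corrective term rather than the whole manipulation skill, which keeps the original controller's behavior intact away from the failure cases.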