The agent is quite intelligent and its behavior is close to optimal; it is doing well. The assistant learns the agent's goal from its behavior and starts to act. The agent notices that the world has become friendlier, but different. It is time for active exploration (which is expensive) by the agent. The exploration completely changes the assistant's inferred goal distribution; the assistant distorts the world by pursuing the wrong goals. After a while the assistant stops acting, the agent finishes its exploration (suspecting it had a glitch), and they are back in the old world.
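The dynamic can be illustrated with a minimal sketch. All of the modeling choices here are assumptions for illustration: the agent's goal is one of three discrete options, the assistant does Bayesian inference over goals under a model of a near-optimal (non-exploring) agent, and "active exploration" means deliberately trying actions the agent would normally avoid. The point is only that exploration, unmodeled by the assistant, poisons its goal posterior.

```python
import random

random.seed(0)

GOALS = [0, 1, 2]      # hypothetical discrete goals
TRUE_GOAL = 0
EPS = 0.05             # agent's small chance of an off-goal action when acting normally

def agent_action(goal, exploring):
    """Near-optimal agent. While exploring, it deliberately tries
    the actions it would normally avoid."""
    if exploring:
        return random.choice([a for a in GOALS if a != goal])
    if random.random() < EPS:
        return random.choice(GOALS)
    return goal

def likelihood(action, goal):
    # Assistant's model of a near-optimal, NON-exploring agent:
    # it has no hypothesis that covers deliberate exploration.
    if action == goal:
        return 1 - EPS + EPS / len(GOALS)
    return EPS / len(GOALS)

def update(posterior, action):
    post = {g: posterior[g] * likelihood(action, g) for g in GOALS}
    z = sum(post.values())
    return {g: p / z for g, p in post.items()}

posterior = {g: 1 / len(GOALS) for g in GOALS}

for _ in range(10):    # phase 1: agent acts near-optimally; assistant learns the goal
    posterior = update(posterior, agent_action(TRUE_GOAL, exploring=False))
learned = posterior[TRUE_GOAL]      # concentrates on the true goal

for _ in range(60):    # phase 2: active exploration; the inference is poisoned
    posterior = update(posterior, agent_action(TRUE_GOAL, exploring=True))
poisoned = posterior[TRUE_GOAL]     # collapses toward a wrong goal

for _ in range(60):    # phase 3: exploration ends; the old posterior returns
    posterior = update(posterior, agent_action(TRUE_GOAL, exploring=False))
recovered = posterior[TRUE_GOAL]
```

Under these assumptions the posterior on the true goal is high after phase 1, near zero during exploration (when an assistant acting on it would "achieve wrong goals"), and high again once the agent resumes near-optimal behavior.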
It seems that assistance works only if the assistant is much smarter than the agent, and the agent accepts the world as it appears, without trying to learn it.