Unity ml-agents provides a nice simulation environment to learn and prototype RL agents. But the callback frequency and ordering of certain callback functions can be confusing, especially if you use the Python API with an external learning framework like tensorflow/agents, which “steps” the environment differently than Unity’s own internal “Academy” does. This blog post tries to condense some of the things I have learnt from browsing the forums and experimenting.

Action and Decision Frequency

By default, the environment is stepped at every Unity FixedUpdate() (source). FixedUpdate() is where Unity’s physics loop runs, and it is guaranteed to run at a fixed frequency independent of rendering fluctuations. Actions predicted by RL are applied to the agent at every envionment step. But decisions are not requested from the RL alorithm at every environment step. The details depend on the DecisionRequester component of your agent. If “Take Actions Between Decisions” is checked in the DecisionRequester, the same RL-predicted action is repeated for Decision Step (which is also a property of the DecisionRequester) frames. If not checked, a zero action is applied. All this is mentioned in the DecisionRequester API, but is hard to find in the ml-agents GitHub documentation.

Stepping Externally

An extra layer of complexity is added when you use the Python API and external learning framework. As mentioned here, one Python step() steps the Unity environment as much as needed to make the agent request an input from the Python API. This happens when the agent requests a decision (aka RL inference).

The decision requesting happens automatically. But sometimes you need to control this behavior. For example, you may need a couple of FixedUpdate() physics loops to reset your agent/environment, and don’t want a decision to be requested while this reset is going on. For this, you have to remove the DecisionRequester component from you agent, and essentially copy-paste its code into your agent code. In this code, you can use a condition to determine if Agent.RequestDecision() should be called. Here is a code snippet (it should be interpreted in the context of DecisionRequester’s code):

void MakeRequests(int academyStepCount) {
    if (condition) {  // use your condition here
        if (academyStepCount%decisionStep==0) RequestDecision();
        if (takeActionsBetweenDecisions) RequestAction();
    } else {
        // do something else if needed
    }
}

(page source)