environment overview | castform docs

an environment defines how your model interacts with a task during rl training.

you configure three things:

tools the model can call
rewards used to score behavior
dataset preprocessing and splitting

core interface

every environment extends BaseEnv and implements these methods:

from benchmax.envs.base_env import BaseEnv, ToolDefinition, StandardizedExample

class MyEnv(BaseEnv):
    async def list_tools(self) -> list[ToolDefinition]:
        ...

    async def run_tool(self, rollout_id: str, tool_name: str, **tool_args):
        ...

    async def compute_reward(self, rollout_id: str, completion: list[dict[str, Any]], ground_truth, **kwargs) -> dict[str, float]:
        ...

    @classmethod
    def dataset_preprocess(cls, example, **kwargs) -> StandardizedExample:
        ...

go deeper

tools give the model things it can do. rewards specify what good looks like. dataset build your training data. testing validate your environment before you launch a training job.