an environment defines how your model interacts with a task during rl training.
you configure three things:
- tools the model can call
- rewards used to score behavior
- dataset preprocessing and splitting
core interface
every environment extends BaseEnv and implements these methods:
from benchmax.envs.base_env import BaseEnv, ToolDefinition, StandardizedExample
class MyEnv(BaseEnv):
async def list_tools(self) -> list[ToolDefinition]:
...
async def run_tool(self, rollout_id: str, tool_name: str, **tool_args):
...
async def compute_reward(self, rollout_id: str, completion: list[dict[str, Any]], ground_truth, **kwargs) -> dict[str, float]:
...
@classmethod
def dataset_preprocess(cls, example, **kwargs) -> StandardizedExample:
...