video game → RL

world (the space the player moves through) → environment (the system the model interacts with)
player (decides what to do each frame) → agent (the LLM being trained)
moves (jump, shoot, move left) → actions (tokens; tool calls like edit_file())
score (points the game assigns) → reward (a number the environment returns)
one playthrough (start to game over) → episode / rollout (start to final reward)
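
The mapping above can be sketched as a small interaction loop. This is a toy illustration only, not any particular library's API: `ToyEnvironment`, `RandomAgent`, and `run_episode` are hypothetical names, the "actions" are single-character tokens rather than real LLM output, and the reward rule (fraction of actions equal to a target token, paid once at the end) is an assumption made for the example.

```python
import random


class ToyEnvironment:
    """Hypothetical environment: the system the agent interacts with."""

    def __init__(self, target="b", max_steps=5):
        self.target = target
        self.max_steps = max_steps
        self.history = []

    def reset(self):
        # Start of a new episode: clear state and return the first observation.
        self.history = []
        return tuple(self.history)

    def step(self, action):
        # Apply one action, then return (observation, reward, done).
        self.history.append(action)
        done = len(self.history) >= self.max_steps
        # Assumed reward rule: a single number returned only at the end.
        reward = self.history.count(self.target) / self.max_steps if done else 0.0
        return tuple(self.history), reward, done


class RandomAgent:
    """Hypothetical stand-in for the model being trained: picks an action each step."""

    def __init__(self, actions, seed=0):
        self.actions = list(actions)
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.choice(self.actions)


def run_episode(env, agent):
    """One episode / rollout: from reset to the final reward."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = agent.act(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
    return list(observation), total_reward


if __name__ == "__main__":
    actions_taken, total = run_episode(ToyEnvironment(), RandomAgent(["a", "b", "c"]))
    print(actions_taken, total)
```

Each call to `run_episode` corresponds to one playthrough: the environment plays the role of the world, the agent plays the player, each `act` result is a move, and the number returned at the end is the score.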