an environment defines the tools your model can use and the reward signals for training. for search/rag, this means a search tool over your corpus and a reward function that checks if the model retrieved the right information.
how it works
the search tool (search)
- gives the model search over your corpus (lexical, vector, or hybrid depending on backend)
- takes a query string, optional mode, and result limit
- returns ranked text results
- via rl, the model learns to write better queries and to search iteratively for missing information
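as a rough illustration of a single search turn (the exact tool-call format the model emits depends on your setup; the query, payload shape, and results below are made up):

```python
# hypothetical tool call emitted by the model mid-rollout (query and results are made up)
tool_call = {
    "name": "search",
    "arguments": {"query": "refund policy for annual plans", "mode": "auto", "top_k": 5},
}

# SearchEnv executes it against your SearchClient and hands the model
# ranked plain-text chunks back, along the lines of:
results = [
    "Annual plans can be refunded within 30 days of renewal ...",
    "To request a refund, open a ticket from the billing page ...",
]
```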
the reward function (compute_reward)
- runs after the model generates an answer
- measures text overlap between retrieved content and ground truth chunks
- optionally uses a judge model for more nuanced correctness scoring
- gives a positive reward when overlap reaches at least 25%, and 0 otherwise
- this teaches the model to search effectively, not just answer
during training, the model gets rewarded for using the search tool to find the right information before answering. over time it learns corpus-specific search patterns.
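a minimal sketch of the overlap check described above (the real compute_reward lives in SearchEnv; the token-level overlap metric and the 1.0/0.0 payout here are illustrative assumptions):

```python
def overlap_reward(
    retrieved_chunks: list[str],
    ground_truth_chunks: list[str],
    threshold: float = 0.25,
) -> float:
    """Illustrative overlap reward: 1.0 if enough ground-truth text was retrieved, else 0.0."""
    truth_tokens = set(" ".join(ground_truth_chunks).lower().split())
    retrieved_tokens = set(" ".join(retrieved_chunks).lower().split())
    if not truth_tokens:
        return 0.0
    overlap = len(truth_tokens & retrieved_tokens) / len(truth_tokens)
    return 1.0 if overlap >= threshold else 0.0
```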
SearchEnv + SearchClient
SearchEnv is a single, unified environment class that works with any search backend via the SearchClient protocol. you pick your backend by passing a different SearchClient implementation. no need for separate environment classes per backend.
SearchClient protocol
a SearchClient is a lightweight, pickle-safe interface that any search backend implements:
```python
# importable from trainer.corpus.search_client
from typing import Any, Protocol

class SearchClient(Protocol):
    def search(self, query: str, mode: str = "auto", top_k: int = 10) -> list[str]: ...
    def embed(self, text: str) -> list[float] | None: ...
    @property
    def available_modes(self) -> list[str]: ...
    def get_params(self) -> dict[str, Any]: ...
```
- search(): returns plain text results (no pydantic objects in the pickle graph)
- embed(): returns an embedding vector if the backend supports it, else None
- available_modes: reports what search modes the backend supports (e.g. ["lexical"], ["vector", "lexical", "hybrid"])
- get_params(): returns serializable connection parameters for pickle round-trips
the model can choose a mode via the mode parameter on the search tool. when set to "auto" (the default), SearchEnv picks the best available mode: hybrid > lexical > vector.
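the auto behavior boils down to a preference order over the backend's available_modes; a minimal sketch of that resolution (not SearchEnv's actual code):

```python
def resolve_mode(requested: str, available: list[str]) -> str:
    # "auto" prefers hybrid, then lexical, then vector, from whatever the backend reports
    if requested != "auto":
        return requested
    for mode in ("hybrid", "lexical", "vector"):
        if mode in available:
            return mode
    raise ValueError(f"no supported search mode in {available}")
```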
built-in implementations
| class | import | modes | notes |
|---|---|---|---|
| CorporaSearch | cgft.corpus.corpora.search | lexical | castform hosted BM25, included with your account |
| TpufSearch | cgft.corpus.turbopuffer.search | lexical, vector, hybrid | requires embed_fn for vector/hybrid |
| PineconeSearch | cgft.corpus.pinecone.search | vector | uses custom embed_fn or Pinecone hosted inference |
| ChromaSearch | cgft.corpus.chroma.search | vector, lexical, hybrid | auto-detects BM25 support from server |
for full setup guides, see the integration pages: turbopuffer, pinecone, chroma.
using the built-in environment
pass any SearchClient to SearchEnv and call train():
```python
from trainer.corpus.corpora.search import CorporaSearch
from trainer.envs.search_env import SearchEnv
from trainer.trainer.pipeline import train

search = CorporaSearch(
    api_key="sk_...",
    corpus_name="my-docs",
    base_url="https://app.castform.com",
)

experiment_id = train(
    env_class=SearchEnv,
    env_args={"search": search},
    train_dataset=train_data,
    eval_dataset=eval_data,
    prefix="my-search-model",
    api_key="sk_...",
)
```
swap CorporaSearch for any other SearchClient to use a different backend. the train() call stays the same.
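continuing the example above, swapping in the turbopuffer backend might look like this (the import path is taken from the table above; the TpufSearch constructor arguments are assumptions, see the turbopuffer integration page for the real parameters):

```python
from cgft.corpus.turbopuffer.search import TpufSearch

# constructor arguments are illustrative assumptions - check the
# turbopuffer integration page for the actual parameters
search = TpufSearch(
    api_key="tpuf_...",
    namespace="my-docs",
    embed_fn=my_embed_fn,  # needed for vector/hybrid modes
)

experiment_id = train(
    env_class=SearchEnv,
    env_args={"search": search},  # only this changes
    ...
)
```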
judge-based rewards
by default, SearchEnv uses text overlap for rewards. for more nuanced scoring, configure a judge model:
```python
experiment_id = train(
    env_class=SearchEnv,
    env_args={
        "search": search,
        "judge_base_url": "https://api.openai.com/v1",
        "judge_api_key": "sk-...",
        "judge_model": "gpt-4o",
        "w_correctness": 1.0,
    },
    ...
)
```
the judge evaluates the model’s answer against the ground truth and returns a correctness score. if the judge is unavailable, SearchEnv falls back to overlap-based rewards.
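conceptually, the judge call is a chat completion against the OpenAI-compatible endpoint you configured, asking for a correctness score; a rough sketch of that idea (the prompt and parsing are assumptions, not SearchEnv's actual implementation):

```python
from openai import OpenAI

def judge_correctness(answer: str, ground_truth: str, base_url: str, api_key: str, model: str) -> float:
    """Ask an OpenAI-compatible judge model for a 0-1 correctness score (illustrative prompt)."""
    client = OpenAI(base_url=base_url, api_key=api_key)
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                "Score how well the answer matches the ground truth on a 0-1 scale. "
                "Reply with only the number.\n\n"
                f"Ground truth: {ground_truth}\n\nAnswer: {answer}"
            ),
        }],
    )
    text = (resp.choices[0].message.content or "").strip()
    try:
        return float(text)
    except ValueError:
        return 0.0  # unparseable judge reply; the caller can fall back to overlap
```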
custom environments
if you want to define your own tools and reward logic, see writing your own environment for a full guide.
you can also implement the SearchClient protocol to plug in a custom search backend:
```python
from typing import Any

from trainer.corpus.search_client import SearchClient
from trainer.envs.search_env import SearchEnv
from trainer.trainer.pipeline import train

class MySearch:
    def __init__(self, endpoint: str, api_key: str):
        self._endpoint = endpoint
        self._api_key = api_key

    def search(self, query: str, mode: str = "auto", top_k: int = 10) -> list[str]:
        # your search implementation - return plain text results
        ...

    def embed(self, text: str) -> list[float] | None:
        return None  # optional

    @property
    def available_modes(self) -> list[str]:
        return ["lexical"]

    def get_params(self) -> dict[str, Any]:
        return {"endpoint": self._endpoint, "api_key": self._api_key}

# use it with SearchEnv - no subclassing needed
train(
    env_class=SearchEnv,
    env_args={"search": MySearch("https://...", "key")},
    ...
)
```
the SearchClient protocol is designed for pickle safety: store only serializable connection parameters and reconstruct SDK clients lazily after unpickling.
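a common way to satisfy this (an illustrative pattern, not something the protocol requires): keep only the connection parameters as attributes, exclude any live SDK client from the pickled state, and rebuild it on first use; SomeSDKClient below is a hypothetical stand-in for your backend's client:

```python
class MySearch:
    def __init__(self, endpoint: str, api_key: str):
        self._endpoint = endpoint
        self._api_key = api_key
        self._client = None  # live SDK client is built lazily and never pickled

    @property
    def client(self):
        if self._client is None:
            # hypothetical SDK constructor - reconstructed on first use after unpickling
            self._client = SomeSDKClient(self._endpoint, api_key=self._api_key)
        return self._client

    def __getstate__(self) -> dict:
        # drop the live client so only plain connection params cross the pickle boundary
        state = self.__dict__.copy()
        state["_client"] = None
        return state
```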
next steps
- see qa generation to generate synthetic question/answer pairs over your corpus for training
- see launching training to launch a training job using your environment and dataset
- see corpus for setting up corpus backends