an environment defines the tools your model can use and the reward signals for training. for search/rag, this means a search tool over your corpus and a reward function that checks if the model retrieved the right information.
how it works
the search tool (search)
- gives the model search over your corpus (lexical, vector, or hybrid depending on backend)
- takes a query string, optional mode, and result limit
- returns ranked text results
- via rl, the model learns to write better queries and to search iteratively for missing information
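as a rough illustration of a single search turn (the exact tool-call format the model emits depends on your setup; the query, payload shape, and results below are made up):

```python
# hypothetical tool call emitted by the model mid-rollout (query and results are made up)
tool_call = {
    "name": "search",
    "arguments": {"query": "refund policy for annual plans", "mode": "auto", "top_k": 5},
}

# SearchEnv executes it against your SearchClient and hands the model
# ranked plain-text chunks back, along the lines of:
results = [
    "Annual plans can be refunded within 30 days of renewal ...",
    "To request a refund, open a ticket from the billing page ...",
]
```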
the reward function (compute_reward)
- runs after the model generates an answer
- measures text overlap between retrieved content and ground truth chunks
- optionally uses a judge model for more nuanced correctness scoring
- gives a positive reward when overlap reaches at least 25%, and 0 otherwise
- this teaches the model to search effectively, not just answer
during training, the model gets rewarded for using the search tool to find the right information before answering. over time it learns corpus-specific search patterns.
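a minimal sketch of the overlap check described above (the real compute_reward lives in SearchEnv; the token-level overlap metric and the 1.0/0.0 payout here are illustrative assumptions):

```python
def overlap_reward(
    retrieved_chunks: list[str],
    ground_truth_chunks: list[str],
    threshold: float = 0.25,
) -> float:
    """Illustrative overlap reward: 1.0 if enough ground-truth text was retrieved, else 0.0."""
    truth_tokens = set(" ".join(ground_truth_chunks).lower().split())
    retrieved_tokens = set(" ".join(retrieved_chunks).lower().split())
    if not truth_tokens:
        return 0.0
    overlap = len(truth_tokens & retrieved_tokens) / len(truth_tokens)
    return 1.0 if overlap >= threshold else 0.0
```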
SearchEnv + SearchClient
SearchEnv is a single, unified environment class that works with any search backend via the SearchClient protocol. you pick your backend by passing a different SearchClient implementation. no need for separate environment classes per backend.
SearchClient protocol
a SearchClient is a lightweight, pickle-safe interface that any search backend implements:
```python
# importable from trainer.corpus.search_client
from typing import Any, Protocol

class SearchClient(Protocol):
    def search(self, query: str, mode: str = "auto", top_k: int = 10) -> list[str]: ...
    def embed(self, text: str) -> list[float] | None: ...
    @property
    def available_modes(self) -> list[str]: ...
    def get_params(self) -> dict[str, Any]: ...
```
- search(): returns plain text results (no pydantic objects in the pickle graph)
- embed(): returns an embedding vector if the backend supports it, else None
- available_modes: reports what search modes the backend supports (e.g. ["lexical"], ["vector", "lexical", "hybrid"])
- get_params(): returns serializable connection parameters for pickle round-trips
the model can choose a mode via the mode parameter on the search tool. when set to "auto" (the default), SearchEnv picks the best available mode: hybrid > lexical > vector.
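the auto behavior boils down to a preference order over the backend's available_modes; a minimal sketch of that resolution (not SearchEnv's actual code):

```python
def resolve_mode(requested: str, available: list[str]) -> str:
    # "auto" prefers hybrid, then lexical, then vector, from whatever the backend reports
    if requested != "auto":
        return requested
    for mode in ("hybrid", "lexical", "vector"):
        if mode in available:
            return mode
    raise ValueError(f"no supported search mode in {available}")
```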
built-in implementations
| class | import | modes | notes |
|---|---|---|---|
| CorporaSearch | cgft.corpus.corpora.search | lexical | castform hosted BM25, included with your account |
| TpufSearch | cgft.corpus.turbopuffer.search | lexical, vector, hybrid | requires embed_fn for vector/hybrid |
| PineconeSearch | cgft.corpus.pinecone.search | vector | uses custom embed_fn or Pinecone hosted inference |
| ChromaSearch | cgft.corpus.chroma.search | vector, lexical, hybrid | auto-detects BM25 support from server |
for full setup guides, see the integration pages: turbopuffer, pinecone, chroma.
using the built-in environment
pass any SearchClient to SearchEnv and call train():
```python
from trainer.corpus.corpora.search import CorporaSearch
from trainer.envs.search_env import SearchEnv
from trainer.trainer.pipeline import train

search = CorporaSearch(
    api_key="sk_...",
    corpus_name="my-docs",
    base_url="https://app.castform.com",
)

experiment_id = train(
    env_class=SearchEnv,
    env_args={"search": search},
    train_dataset=train_data,
    eval_dataset=eval_data,
    prefix="my-search-model",
    api_key="sk_...",
)
```
swap CorporaSearch for any other SearchClient to use a different backend. the train() call stays the same.
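continuing the example above, swapping in the turbopuffer backend might look like this (the import path is taken from the table above; the TpufSearch constructor arguments are assumptions, see the turbopuffer integration page for the real parameters):

```python
from cgft.corpus.turbopuffer.search import TpufSearch

# constructor arguments are illustrative assumptions - check the
# turbopuffer integration page for the actual parameters
search = TpufSearch(
    api_key="tpuf_...",
    namespace="my-docs",
    embed_fn=my_embed_fn,  # needed for vector/hybrid modes
)

experiment_id = train(
    env_class=SearchEnv,
    env_args={"search": search},  # only this changes
    ...
)
```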
judge-based rewards
by default, SearchEnv uses text overlap for rewards. for more nuanced scoring, configure a judge model:
```python
experiment_id = train(
    env_class=SearchEnv,
    env_args={
        "search": search,
        "judge_base_url": "https://api.openai.com/v1",
        "judge_api_key": "sk-...",
        "judge_model": "gpt-4o",
        "w_correctness": 1.0,
    },
    ...
)
```
the judge evaluates the model’s answer against the ground truth and returns a correctness score. if the judge is unavailable, SearchEnv falls back to overlap-based rewards.
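conceptually, the judge call is a chat completion against the OpenAI-compatible endpoint you configured, asking for a correctness score; a rough sketch of that idea (the prompt and parsing are assumptions, not SearchEnv's actual implementation):

```python
from openai import OpenAI

def judge_correctness(answer: str, ground_truth: str, base_url: str, api_key: str, model: str) -> float:
    """Ask an OpenAI-compatible judge model for a 0-1 correctness score (illustrative prompt)."""
    client = OpenAI(base_url=base_url, api_key=api_key)
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                "Score how well the answer matches the ground truth on a 0-1 scale. "
                "Reply with only the number.\n\n"
                f"Ground truth: {ground_truth}\n\nAnswer: {answer}"
            ),
        }],
    )
    text = (resp.choices[0].message.content or "").strip()
    try:
        return float(text)
    except ValueError:
        return 0.0  # unparseable judge reply; the caller can fall back to overlap
```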
custom environments
if you want to define your own tools and reward logic, see writing your own environment for a full guide.
you can also implement the SearchClient protocol to plug in a custom search backend:
```python
from typing import Any

from trainer.corpus.search_client import SearchClient
from trainer.envs.search_env import SearchEnv
from trainer.trainer.pipeline import train

class MySearch:
    def __init__(self, endpoint: str, api_key: str):
        self._endpoint = endpoint
        self._api_key = api_key

    def search(self, query: str, mode: str = "auto", top_k: int = 10) -> list[str]:
        # your search implementation - return plain text results
        ...

    def embed(self, text: str) -> list[float] | None:
        return None  # optional

    @property
    def available_modes(self) -> list[str]:
        return ["lexical"]

    def get_params(self) -> dict[str, Any]:
        return {"endpoint": self._endpoint, "api_key": self._api_key}

# use it with SearchEnv - no subclassing needed
train(
    env_class=SearchEnv,
    env_args={"search": MySearch("https://...", "key")},
    ...
)
```
the SearchClient protocol is designed for pickle safety: store only serializable connection parameters and reconstruct SDK clients lazily after unpickling.
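a common way to satisfy this (an illustrative pattern, not something the protocol requires): keep only the connection parameters as attributes, exclude any live SDK client from the pickled state, and rebuild it on first use; SomeSDKClient below is a hypothetical stand-in for your backend's client:

```python
class MySearch:
    def __init__(self, endpoint: str, api_key: str):
        self._endpoint = endpoint
        self._api_key = api_key
        self._client = None  # live SDK client is built lazily and never pickled

    @property
    def client(self):
        if self._client is None:
            # hypothetical SDK constructor - reconstructed on first use after unpickling
            self._client = SomeSDKClient(self._endpoint, api_key=self._api_key)
        return self._client

    def __getstate__(self) -> dict:
        # drop the live client so only plain connection params cross the pickle boundary
        state = self.__dict__.copy()
        state["_client"] = None
        return state
```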
next steps
- see qa generation to generate synthetic question/answer pairs over your corpus for training
- see launching training to launch a training job using your environment and dataset
- see corpus for setting up corpus backends