retrieval-augmented generation (rag) lets an llm use a search tool to find relevant information before answering a question. a rag agent can be finetuned to a specific corpus, learning to formulate better search queries and reason more effectively over your data. castform provides the full pipeline for this, from chunking documents through training.
rag step-by-step
the web ui walks you through the full pipeline. go to app.castform.com, click new training run, and select a RAG template.
1. corpus setup
upload your documents or emails directly, or connect an external corpus provider:
- turbopuffer: lexical, vector, and hybrid search
- pinecone: managed vector search with hosted inference
- chroma: self-hosted vector, lexical, and hybrid search
- notion: connect a Notion database as your corpus
the web ui indexes your documents and prepares them for QA generation.
2. task definition
configure the system prompt and completion tags. the web ui pre-populates a RAG-specific system prompt, which works well for most cases.
tips for a good system prompt:
- be specific about the task: “answer customer support questions using the retrieved context” is better than “answer questions”
- specify the output format you want: should it cite sources? use bullet points? keep answers short?
- mention what the model should do when results are irrelevant: “if the search results don’t contain the answer, say so”
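putting these tips together, a starting point might look something like this (illustrative only, not the exact prompt the web ui pre-populates):

```python
# illustrative system prompt following the tips above; edit the
# pre-populated prompt in the web ui to match your own task.
SYSTEM_PROMPT = """You are a customer support assistant.
Answer questions using only the retrieved context.
Cite the source of each claim, and keep answers to a few sentences.
If the search results don't contain the answer, say so."""
```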
3. dataset
the web ui generates synthetic question-answer pairs from your corpus.
dataset size guidance:
- minimum: 16 examples (the platform enforces this)
- recommended for validation: start with ~50 examples to verify your setup works before committing to a full run
- recommended for training: 200+ examples for good results, 1000+ for best results
- expect generation to take time: 1000+ examples can take a few hours to generate. validate your pipeline with a small batch first.
you can preview the generated pairs, adjust the train/eval split, and regenerate if needed.
4. tools
configure the search tool: name, available search modes (depends on your corpus provider), and filterable fields.
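for intuition, this tool is what the model calls during training rollouts. a call might carry arguments like the following (field names and values here are illustrative; the real schema comes from your configuration and corpus provider):

```python
# illustrative arguments for one search tool call; the actual schema
# depends on your corpus provider and the fields you configure.
tool_call = {
    "name": "search",
    "arguments": {
        "query": "refund policy for annual plans",
        "mode": "hybrid",  # must be a mode your provider supports
        "filters": {"source": "support-docs"},  # a filterable field you configured
    },
}
```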
5. rewards
this is the most important step. rewards define what “good” looks like for your model. for RAG, the web ui includes four default reward components:
| component | what it measures | default weight |
|---|---|---|
| correctness (LLM judge) | is the answer correct and supported by the retrieved context? | 1 |
| conciseness (LLM judge) | is the response direct, without filler or repetition? gated on correctness >= 0.5 | 0.5 |
| citation | precision and recall of source citations (e.g. on thread_id) | 1 |
| tool call efficiency | did the model use a reasonable number of search calls? gated on correctness >= 0.5 | 1 |
we recommend starting with these defaults and tweaking them to match what you actually care about. for example:
- if citations don’t matter for your use case, remove the citation component or lower its weight
- if you want longer, more detailed answers, remove or lower the conciseness weight
- if your task requires precise tool usage, increase the tool call efficiency weight
customizing judge prompts: expand any LLM judge component to see its scoring criteria and score levels. you can edit the judge prompt to match your specific quality bar, adjust the score levels (e.g. add a 0.25 for “mostly correct but minor issues”), or change the gating conditions.
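conceptually, the components combine into a single weighted score per completion. the sketch below mirrors the default weights and gating from the table above; it is purely illustrative, not the platform's actual implementation:

```python
# illustrative combination of the default RAG reward components;
# each score is assumed to be in [0, 1].
def total_reward(correctness, conciseness, citation, tool_efficiency):
    # conciseness and tool call efficiency are gated: they contribute
    # nothing unless the answer is at least half correct
    gate = 1.0 if correctness >= 0.5 else 0.0
    return (
        1.0 * correctness
        + 0.5 * conciseness * gate
        + 1.0 * citation
        + 1.0 * tool_efficiency * gate
    )
```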
6. launch
review your config and launch. see launching for what happens next.
after launching
once your run starts, see the quickstart for what to expect: GPU warmup, early metric fluctuations, and what to watch for in completions.
python sdk
to set up a rag training run with the python sdk:
1. chunk your data
split your documents into retrieval-sized pieces. built-in chunkers handle markdown and emails, or bring your own.
```python
from trainer.chunkers.markdown import MarkdownChunker

chunker = MarkdownChunker(min_char=1024, max_char=2048)
chunks = chunker.chunk_folder("path/to/docs")
```
see chunking for configuration options and the email chunker.
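if the built-in chunkers don't fit your format, a custom chunker can be as simple as a function that splits text into retrieval-sized pieces. the sketch below is purely illustrative; see chunking for the interface the sdk actually expects:

```python
# hypothetical custom chunker: greedily packs paragraphs into chunks
# of at most max_char characters. not the trainer chunker interface,
# just the core idea.
def chunk_text(text: str, max_char: int = 2048) -> list[str]:
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_char:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks
```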
2. upload to a corpus backend
index your chunks for search. the simplest option is the castform corpus API:
```python
from trainer.corpus.corpora.source import CorporaChunkSource

source = CorporaChunkSource(
    api_key=API_KEY,
    corpus_name="my-docs",
    base_url=BASE_URL,
)
source.populate_from_chunks(chunks)
```
for other backends, see corpus (turbopuffer, pinecone, chroma).
3. generate qa pairs
the pipeline generates synthetic question-answer pairs grounded in your corpus:
```python
from trainer.qa_generation.cgft_models import CgftPipelineConfig, PlatformConfig, CorpusConfig, TargetsConfig
from trainer.qa_generation.cgft_pipeline import CgftPipeline

cfg = CgftPipelineConfig(
    platform=PlatformConfig(api_key=API_KEY),
    corpus=CorpusConfig(corpus_name="my-docs", corpus_id=source.corpus_id),
    targets=TargetsConfig(total_samples=200),
)
cfg.resolve_api_keys()
pipeline = CgftPipeline(cfg)
result = pipeline.run()
train_data = result["train_dataset"]
eval_data = result["eval_dataset"]
```
see qa generation for the full config reference.
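before training on the generated data, it's worth spot-checking a few pairs. assuming the returned datasets are list-like (see qa generation for the exact record shape):

```python
# sanity-check the generated datasets; field names may vary, so print
# a whole record rather than assuming specific keys.
print(f"train: {len(train_data)} examples, eval: {len(eval_data)} examples")
print(train_data[0])
```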
4. define the environment and launch
```python
import trainer
from trainer.corpus.corpora.search import CorporaSearch
from trainer.envs.search_env import SearchEnv
from trainer.trainer.pipeline import train

search = CorporaSearch(
    api_key=API_KEY,
    corpus_name="my-docs",
    base_url=BASE_URL,
)

experiment_id = train(
    env_class=SearchEnv,
    env_args={"search": search},
    train_dataset=train_data,
    eval_dataset=eval_data,
    prefix="my-rag-model",
    api_key=API_KEY,
    local_modules=[trainer],
)
```
see search environment for reward configuration and custom search backends. see launching for train() parameters and dry run mode.
next steps
for more detail on each stage of the RAG pipeline:
- rag overview: how the pipeline stages fit together
- chunking: customize how documents are split
- corpus: backend options and setup
- qa generation: how CgftPipeline works, full config reference
- search environment: how the RL training environment works