RAG Chunk Reranker (LLM-as-reranker with MMR diversity)
Reranks candidate chunks against a query, balancing relevance and MMR-style diversity; preserves input ids.
inputs
| name | required | default |
|---|---|---|
| query | yes | — |
| chunks | yes | — |
| top_k | no | 5 |
| lambda | no | 0.7 |
routing
triggers
- rerank these chunks
- score these passages for relevance
- pick the top K chunks for this query
- reduce these candidates with MMR
not for
- first-stage retrieval (use a retriever)
- candidate sets larger than ~50 chunks
- cross-document summarisation
- de-duplication of identical chunks (do that upstream)
prompt
<task>
<role>You are an LLM-as-reranker. Score and order candidate chunks for retrieval-augmented generation, balancing query relevance with diversity (MMR).</role>
<input>
<query>{{query}}</query>
<chunks>{{chunks}}</chunks>
<top_k>{{top_k}}</top_k>
<lambda>{{lambda}}</lambda>
</input>
<rules>
<rule>Treat `chunks` as JSONL: one record per line, each with at minimum `id` and `text`. Preserve `id` strings verbatim — never mint new ids, never alter casing.</rule>
<rule>Score each chunk on a [0, 1] scale for relevance to `query`. Be miserly: irrelevant chunks belong below 0.2.</rule>
<rule>Apply MMR-style selection with the supplied `lambda`: pick the next chunk maximising `lambda * relevance(c, q) - (1 - lambda) * max_sim(c, already_selected)`. When two chunks restate the same fact, prefer the more concise one and demote the duplicate into `dropped` with reason "redundant_with:<id>".</rule>
<rule>Return at most `top_k` items in `ranked`. If fewer than `top_k` chunks score above 0.2, return only those.</rule>
<rule>Each `reason` must cite a concrete signal from the chunk text (a phrase, entity, or claim) — no generic "matches the query".</rule>
<rule>Output strict JSON conforming to the schema below. No prose before or after the JSON object.</rule>
</rules>
<output_format>
<description>A single JSON object — no Markdown fences, no commentary.</description>
<schema><![CDATA[
{
"ranked": [{ "id": string, "score": number, "reason": string, "diversity_note": string? }, ...],
"dropped": [{ "id": string, "reason": string }, ...]?
}
]]></schema>
</output_format>
</task>
examples
case · adversarial
{
"query": "When should I prefer IVF over HNSW for vector search?",
"chunks": "{\"id\": \"n1\", \"text\": \"Risotto needs constant stirring and warm stock.\"}\n{\"id\": \"n2\", \"text\": \"The capital of France is Paris.\"}\n{\"id\": \"n3\", \"text\": \"JavaScript was created by Brendan Eich in 1995.\"}\n",
"top_k": 3,
"lambda": 0.7
}
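A plausible output for this case (illustrative, not normative): every chunk scores below the 0.2 relevance floor, so `ranked` is empty and all ids land in `dropped`.
{
  "ranked": [],
  "dropped": [
    { "id": "n1", "reason": "irrelevant: risotto cooking, no vector-search content" },
    { "id": "n2", "reason": "irrelevant: geography fact about Paris" },
    { "id": "n3", "reason": "irrelevant: JavaScript history, nothing on IVF or HNSW" }
  ]
}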
case · happy-path
{
"query": "When should I prefer IVF over HNSW for vector search?",
"chunks": "{\"id\": \"c1\", \"text\": \"HNSW builds a hierarchical proximity graph; recall ~99% with low latency but RAM-heavy.\"}\n{\"id\": \"c2\", \"text\": \"IVF partitions vectors into Voronoi cells; only nprobe lists are scanned per query.\"}\n{\"id\": \"c3\", \"text\": \"IVF is preferred when index memory is constrained or when the corpus exceeds RAM and disk-resident indexes are needed.\"}\n{\"id\": \"c4\", \"text\": \"FAISS supports both HNSW and IVF; the same library can host either index type.\"}\n{\"id\": \"c5\", \"text\": \"Cooking risotto requires constant stirring and warm stock.\"}\n",
"top_k": 3,
"lambda": 0.7
}
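One plausible output (scores illustrative): c3 wins on relevance; with lambda 0.7, MMR then promotes the HNSW-side chunk c1 over c2, which overlaps c3 on IVF; c4 and c5 fall out.
{
  "ranked": [
    { "id": "c3", "score": 0.9, "reason": "directly states when IVF is preferred: constrained index memory, corpus exceeding RAM, disk-resident indexes" },
    { "id": "c1", "score": 0.7, "reason": "HNSW trade-offs (recall ~99%, RAM-heavy) supply the contrast the query asks for", "diversity_note": "covers the HNSW side; low overlap with c3" },
    { "id": "c2", "score": 0.65, "reason": "IVF mechanics (Voronoi cells, nprobe lists) explain the trade-off", "diversity_note": "adds IVF internals beyond c3's when-to-use claim" }
  ],
  "dropped": [
    { "id": "c4", "reason": "FAISS hosting both index types does not answer when to prefer IVF" },
    { "id": "c5", "reason": "irrelevant: risotto cooking" }
  ]
}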
notes
`chunks` is JSONL: one `{"id": "...", "text": "...", "score": <float>?}`
per line. `lambda` blends relevance vs. diversity (Carbonell & Goldstein
1998) — 1.0 ignores diversity, 0.0 maximises diversity. Default 0.7.
Guard rejects malformed JSONL, duplicate ids, and over-budget contexts.
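For implementers, a minimal sketch of the greedy selection loop the rules describe, assuming the caller supplies `relevance` (id to score) and `similarity(a, b)`; both names are placeholders, not part of this spec.

def mmr_select(chunks, relevance, similarity, top_k=5, lam=0.7):
    """Greedy MMR selection (Carbonell & Goldstein 1998).

    `lam` stands in for the spec's `lambda`, which is a Python keyword.
    """
    # Apply the 0.2 relevance floor before any diversity reasoning.
    candidates = [c for c in chunks if relevance[c["id"]] > 0.2]
    selected = []
    while candidates and len(selected) < top_k:
        def mmr(c):
            # Penalise similarity to anything already selected.
            max_sim = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance[c["id"]] - (1 - lam) * max_sim
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected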
description
Reranks a small set of candidate chunks for a RAG query. Use when a first-stage retriever (BM25, dense, hybrid+RRF) returns more candidates than the synthesizer can attend to. Produces a top-K ranking with a per-chunk relevance score, a one-line reason, and an MMR-aware diversity rationale. Preserves provided ids verbatim. Do NOT use as a first-stage retriever, for very large candidate sets (over 50 chunks — split or pre-filter), or for cross-document summarisation.