RAG Chunk Reranker (LLM-as-reranker with MMR diversity)
Reranks candidate chunks against a query, balancing relevance and MMR-style diversity; preserves input ids.
inputs
| name | required | default |
|---|---|---|
| query | yes | — |
| chunks | yes | — |
| top_k | no | 5 |
| lambda | no | 0.7 |
routing
triggers
- rerank these chunks
- score these passages for relevance
- pick the top K chunks for this query
- reduce these candidates with MMR
not for
- first-stage retrieval (use a retriever)
- candidate sets larger than ~50 chunks
- cross-document summarisation
- de-duplication of identical chunks (do that upstream)
prompt
<task>
<role>You are an LLM-as-reranker. Score and order candidate chunks for retrieval-augmented generation, balancing query relevance with diversity (MMR).</role>
<input>
<query>{{query}}</query>
<chunks>{{chunks}}</chunks>
<top_k>{{top_k}}</top_k>
<lambda>{{lambda}}</lambda>
</input>
<rules>
<rule>Treat `chunks` as JSONL: one record per line, each with at minimum `id` and `text`. Preserve `id` strings verbatim — never mint new ids, never alter casing.</rule>
<rule>Score each chunk on a [0, 1] scale for relevance to `query`. Be miserly: irrelevant chunks belong below 0.2.</rule>
<rule>Apply MMR-style selection with the supplied `lambda`: pick the next chunk maximising `lambda * relevance(c, q) - (1 - lambda) * max_sim(c, already_selected)`. When two chunks restate the same fact, prefer the more concise one and demote the duplicate into `dropped` with reason "redundant_with:<id>".</rule>
<rule>Return at most `top_k` items in `ranked`. If fewer than `top_k` chunks score above 0.2, return only those.</rule>
<rule>Each `reason` must cite a concrete signal from the chunk text (a phrase, entity, or claim) — no generic "matches the query".</rule>
<rule>Output strict JSON conforming to the schema below. No prose before or after the JSON object.</rule>
</rules>
<output_format>
<description>A single JSON object — no Markdown fences, no commentary.</description>
<schema><![CDATA[
{
"ranked": [{ "id": string, "score": number, "reason": string, "diversity_note": string? }, ...],
"dropped": [{ "id": string, "reason": string }, ...]?
}
]]></schema>
</output_format>
</task>
examples
case · adversarial
{
"query": "When should I prefer IVF over HNSW for vector search?",
"chunks": "{\"id\": \"n1\", \"text\": \"Risotto needs constant stirring and warm stock.\"}\n{\"id\": \"n2\", \"text\": \"The capital of France is Paris.\"}\n{\"id\": \"n3\", \"text\": \"JavaScript was created by Brendan Eich in 1995.\"}\n",
"top_k": 3,
"lambda": 0.7
}
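A plausible output for this case (illustrative, not normative): every chunk scores below the 0.2 relevance floor, so `ranked` is empty and all ids land in `dropped`.
{
  "ranked": [],
  "dropped": [
    { "id": "n1", "reason": "irrelevant: risotto cooking, no vector-search content" },
    { "id": "n2", "reason": "irrelevant: geography fact about Paris" },
    { "id": "n3", "reason": "irrelevant: JavaScript history, nothing on IVF or HNSW" }
  ]
}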
case · happy-path
{
"query": "When should I prefer IVF over HNSW for vector search?",
"chunks": "{\"id\": \"c1\", \"text\": \"HNSW builds a hierarchical proximity graph; recall ~99% with low latency but RAM-heavy.\"}\n{\"id\": \"c2\", \"text\": \"IVF partitions vectors into Voronoi cells; only nprobe lists are scanned per query.\"}\n{\"id\": \"c3\", \"text\": \"IVF is preferred when index memory is constrained or when the corpus exceeds RAM and disk-resident indexes are needed.\"}\n{\"id\": \"c4\", \"text\": \"FAISS supports both HNSW and IVF; the same library can host either index type.\"}\n{\"id\": \"c5\", \"text\": \"Cooking risotto requires constant stirring and warm stock.\"}\n",
"top_k": 3,
"lambda": 0.7
}
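One plausible output (scores illustrative): c3 wins on relevance; with lambda 0.7, MMR then promotes the HNSW-side chunk c1 over c2, which overlaps c3 on IVF; c4 and c5 fall out.
{
  "ranked": [
    { "id": "c3", "score": 0.9, "reason": "directly states when IVF is preferred: constrained index memory, corpus exceeding RAM, disk-resident indexes" },
    { "id": "c1", "score": 0.7, "reason": "HNSW trade-offs (recall ~99%, RAM-heavy) supply the contrast the query asks for", "diversity_note": "covers the HNSW side; low overlap with c3" },
    { "id": "c2", "score": 0.65, "reason": "IVF mechanics (Voronoi cells, nprobe lists) explain the trade-off", "diversity_note": "adds IVF internals beyond c3's when-to-use claim" }
  ],
  "dropped": [
    { "id": "c4", "reason": "FAISS hosting both index types does not answer when to prefer IVF" },
    { "id": "c5", "reason": "irrelevant: risotto cooking" }
  ]
}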
notes
`chunks` is JSONL: one `{"id": "...", "text": "...", "score": <float>?}`
per line. `lambda` blends relevance vs. diversity (Carbonell & Goldstein
1998) — 1.0 ignores diversity, 0.0 maximises diversity. Default 0.7.
Guard rejects malformed JSONL, duplicate ids, and over-budget contexts.
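For implementers, a minimal sketch of the greedy selection loop the rules describe, assuming the caller supplies `relevance` (id to score) and `similarity(a, b)`; both names are placeholders, not part of this spec.

def mmr_select(chunks, relevance, similarity, top_k=5, lam=0.7):
    """Greedy MMR selection (Carbonell & Goldstein 1998).

    `lam` stands in for the spec's `lambda`, which is a Python keyword.
    """
    # Apply the 0.2 relevance floor before any diversity reasoning.
    candidates = [c for c in chunks if relevance[c["id"]] > 0.2]
    selected = []
    while candidates and len(selected) < top_k:
        def mmr(c):
            # Penalise similarity to anything already selected.
            max_sim = max((similarity(c, s) for s in selected), default=0.0)
            return lam * relevance[c["id"]] - (1 - lam) * max_sim
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected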
description
Reranks a small set of candidate chunks for a RAG query. Use when a first-stage retriever (BM25, dense, hybrid+RRF) returns more candidates than the synthesizer can attend to. Produces a top-K ranking with a per-chunk relevance score, a one-line reason, and an MMR-aware diversity rationale. Preserves provided ids verbatim. Do NOT use as a first-stage retriever, for very large candidate sets (over 50 chunks — split or pre-filter), or for cross-document summarisation.