graph TB
subgraph sg1 ["Pre-Retrieval Optimization"]
A[User query] --> B[Query Rewriter]
A --> B2[HyDE transform - Hypothetical Doc]
B --> C[Optimized query]
B2 --> C2[Hypothetical document]
end
subgraph sg2 ["Retrieval"]
C --> D[Embedding generation]
C2 --> D
D --> E[Vector DB retrieval - Milvus]
E --> F[Candidate document set]
end
subgraph sg3 ["Post-Retrieval Optimization"]
F --> G[Reranker]
G --> H[Deduplication]
H --> I[Context Compression]
I --> J[Sentence-level Filter]
J --> K[Position Strategy]
end
subgraph sg4 ["CRAG Corrective Retrieval"]
K --> L{Retrieval Evaluator}
L -->|High quality| M[Generate answer]
L -->|Low quality| N[Query rewrite]
N --> O[External Web Search]
O --> L
end
subgraph sg5 ["Generation"]
M --> P[Prompt construction]
P --> Q[LLM generation]
Q --> R[Final answer]
end
style A fill:#e1f5ff
style R fill:#d4edda
style L fill:#fff3cd
style B2 fill:#ffe6cc
Architecture notes:
- Pre-retrieval optimization layer: query rewriting uses an LLM to reformulate the user query and improve retrieval-keyword quality; the HyDE transform generates a hypothetical answer document for semantic retrieval, narrowing the question-answer semantic gap. The two strategies can be chosen flexibly according to query complexity and scenario.
- Retrieval layer: semantic retrieval against the Milvus vector database, accepting either the query vector or the hypothetical-document vector.
- Post-retrieval optimization layer: multi-stage filtering and optimization to ensure context quality.
- CRAG corrective layer: evaluates retrieval quality and triggers external search to supplement results when necessary.
- Generation layer: produces the final answer from the optimized context.
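The five layers above can be viewed as a chain of stage functions. The sketch below is only an illustration of that composition; the `Stage` type, `runPipeline` helper, and stage bodies are hypothetical, not from the codebase:

```go
package main

import (
	"fmt"
	"strings"
)

// Stage transforms the working context at one layer of the pipeline.
type Stage func(string) string

// runPipeline chains the layers in order: pre-retrieval, retrieval,
// post-retrieval, generation.
func runPipeline(input string, stages ...Stage) string {
	for _, s := range stages {
		input = s(input)
	}
	return input
}

func main() {
	preRetrieval := func(q string) string { return strings.TrimSpace(q) + " [rewritten]" }
	retrieval := func(q string) string { return q + " [retrieved]" }
	postRetrieval := func(q string) string { return q + " [reranked]" }
	generation := func(q string) string { return q + " [answered]" }
	fmt.Println(runPipeline("why is my pod failing?", preRetrieval, retrieval, postRetrieval, generation))
}
```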
2.2 Theoretical Basis
This architecture is grounded in the following research:
CRAG (Corrective Retrieval-Augmented Generation)
Paper: Yan S Q, Gu J C, Zhu Y, et al. Corrective retrieval augmented generation[J]. 2024.
Core idea: a retrieval evaluator judges document quality and triggers corrective actions for low-quality retrieval results.
Implementation: crag/light_crag.go:65-97
Query Rewriting
Papers:
Ma X, Gong Y, He P, et al. Query rewriting in retrieval-augmented large language models[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023: 5303-5315.
Gao L, Ma X, Lin J, et al. Precise zero-shot dense retrieval without relevance labels[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023: 1762-1777.
LLM-based query optimization that converts natural language into keywords better suited to search engines.
Implementation: llm/query_transformation.go:12-84
Re-ranking
Papers:
Nogueira R, Cho K. Passage Re-ranking with BERT[J]. arXiv preprint arXiv:1901.04085, 2019.
Karpukhin V, Oguz B, Min S, et al. Dense Passage Retrieval for Open-Domain Question Answering[C]//EMNLP (1). 2020: 6769-6781.
```go
const RewriteQueryPromptTemplate = `You are an AI assistant tasked with reformulating user queries to improve retrieval in a RAG system.
Given the original query, rewrite it to be more specific, detailed, and likely to retrieve relevant information.

Original query: {original_query}

Rewritten query:`

const GenerateStepBackQueryPromptTemplate = `You are an AI assistant tasked with generating broader, more general queries to improve context retrieval in a RAG system.
Given the original query, generate a step-back query that is more general and can help retrieve relevant background information.

Original query: {original_query}

Step-back query:`
```
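Both templates carry an `{original_query}` placeholder. A minimal sketch of filling such a template before the LLM call (`renderPrompt` is a hypothetical helper, not from the codebase):

```go
package main

import (
	"fmt"
	"strings"
)

// renderPrompt substitutes the user's query into the {original_query}
// placeholder used by the prompt templates.
func renderPrompt(template, query string) string {
	return strings.ReplaceAll(template, "{original_query}", query)
}

func main() {
	const tmpl = "Original query: {original_query}\nRewritten query:"
	fmt.Println(renderPrompt(tmpl, "how to fix ImagePullBackOff"))
}
```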
How it works:
- Abstraction lifting: move from the specific question to higher-level concepts or principles
- Background recall: retrieve macro-level documents covering fundamentals and principle explanations
- Knowledge bridging: connect general knowledge through to the specific answer
- Multi-level retrieval: combine results from the original query and the step-back query
When to use:
- The user asks a very specific technical-detail question
- Answering requires understanding foundational concepts first
- The original query is too narrow, hurting recall
- The knowledge base stores answers mostly as principle-oriented documents
Example transformation:

```text
Original query:  "Why does a Kubernetes Pod show an ImagePullBackOff error at startup?"
Step-back query: "Fundamentals of the Kubernetes Pod lifecycle and image-pull mechanism"
```
```go
const DecomposeQueryPromptTemplate = `You are an AI assistant tasked with breaking down complex queries into simpler sub-queries for a RAG system.
Given the original query, decompose it into 2-4 simpler sub-queries that, when answered together, would provide a comprehensive response to the original query.

Original query: {original_query}

example: What are the impacts of climate change on the environment?
Sub-queries:
1. What are the impacts of climate change on biodiversity?
2. How does climate change affect the oceans?
3. What are the effects of climate change on agriculture?
4. What are the impacts of climate change on human health?`
```
```go
const HyDEPromptTemplate = `You are an AI assistant tasked with generating a hypothetical document that would perfectly answer the given query.
This hypothetical document will be used for retrieval in a RAG system to find similar real documents.
Given the user's query, write a detailed, well-structured document passage that directly answers the query as if it were extracted from an authoritative source.
The document should be informative, specific, and contain relevant details and terminology that would appear in actual documents covering this topic.

User query: {original_query}

Hypothetical document:`
```
A stricter variant of the decomposition prompt adds format requirements so the sub-queries can be extracted programmatically:

```go
const DecomposeQueryPromptTemplate = `You are an AI assistant tasked with breaking down complex queries into simpler sub-queries for a RAG system.
Given the original query, decompose it into 2-4 simpler sub-queries that, when answered together, would provide a comprehensive response to the original query.

Format Requirements:
1. Each sub-query must be separated by two newline characters (\n\n).
2. Only return the sub-queries in the response, with no additional content (such as numbering, introductory text, explanations, or example-related content) to facilitate subsequent extraction from the response.

example: What are the impacts of climate change on the environment?
Sub-queries (for reference only, do not return numbered content):
What are the impacts of climate change on biodiversity?
How does climate change affect the oceans?
What are the effects of climate change on agriculture?
What are the impacts of climate change on human health?

User query: {original_query}
`
```
Prompt design points:
- Strict format requirement: the LLM must return only the sub-queries, with no numbering, commentary, or other extra content
- Delimiter convention: sub-queries are separated by a double newline (\n\n) for easy programmatic parsing
- Few-shot example: the climate-change example guides the LLM on decomposition granularity and style
- Count control: limiting output to 2-4 sub-queries balances coverage against retrieval cost
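Given the delimiter convention above, the LLM response might be parsed as in this sketch (`parseSubQueries` is an assumed helper name, not the project's actual parser):

```go
package main

import (
	"fmt"
	"strings"
)

// parseSubQueries splits the decomposition response on blank lines (\n\n),
// trims whitespace, and drops empty fragments, matching the prompt's
// format requirements.
func parseSubQueries(resp string) []string {
	parts := strings.Split(resp, "\n\n")
	queries := make([]string, 0, len(parts))
	for _, p := range parts {
		if q := strings.TrimSpace(p); q != "" {
			queries = append(queries, q)
		}
	}
	return queries
}

func main() {
	resp := "What are the impacts of climate change on biodiversity?\n\nHow does climate change affect the oceans?\n\n"
	for i, q := range parseSubQueries(resp) {
		fmt.Printf("sub-query %d: %s\n", i+1, q)
	}
}
```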
Complete workflow
flowchart TB
A[User's original query] --> B[QueryDecomposeTransformer]
B --> C[LLM generates 2-4 sub-queries]
C --> D1[Sub-query 1: aspect A]
C --> D2[Sub-query 2: aspect B]
C --> D3[Sub-query 3: aspect C]
C --> D4[Sub-query 4: aspect D]
D1 --> E1[Embedding 1]
D2 --> E2[Embedding 2]
D3 --> E3[Embedding 3]
D4 --> E4[Embedding 4]
E1 --> F1[Vector search 1 TopK=20]
E2 --> F2[Vector search 2 TopK=20]
E3 --> F3[Vector search 3 TopK=20]
E4 --> F4[Vector search 4 TopK=20]
F1 --> G[Merge result pools - roughly 60-80 documents]
F2 --> G
F3 --> G
F4 --> G
G --> H[Deduplication]
H --> I[Reranker]
I --> J[TopN filter - keep the best 5-10]
J --> K[Final retrieval results]
style B fill:#fff3cd
style G fill:#ffe6cc
style I fill:#d4edda
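The merge-and-dedup step of this workflow could look like the following sketch; the `Doc` type and the keep-highest-score policy are illustrative assumptions, not the project's implementation:

```go
package main

import "fmt"

// Doc is a minimal stand-in for a retrieved document.
type Doc struct {
	ID    string
	Score float64
}

// mergeAndDedup pools the per-sub-query result lists, keeping one entry per
// document ID and retaining the highest score seen for it.
func mergeAndDedup(pools ...[]Doc) []Doc {
	best := map[string]float64{}
	order := []string{} // preserve first-seen order
	for _, pool := range pools {
		for _, d := range pool {
			if s, ok := best[d.ID]; !ok {
				best[d.ID] = d.Score
				order = append(order, d.ID)
			} else if d.Score > s {
				best[d.ID] = d.Score
			}
		}
	}
	out := make([]Doc, 0, len(order))
	for _, id := range order {
		out = append(out, Doc{ID: id, Score: best[id]})
	}
	return out
}

func main() {
	a := []Doc{{"d1", 0.9}, {"d2", 0.7}}
	b := []Doc{{"d2", 0.8}, {"d3", 0.6}}
	fmt.Println(mergeAndDedup(a, b)) // d2 kept once, with score 0.8
}
```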
graph LR
A[20-50 candidate documents] --> B[Extract document content]
B --> C[Batch scoring via Reranker API]
C --> D[New relevance scores in 0-1]
D --> E[Re-sort by score, descending]
E --> F[Apply TopN and MinScore filters]
F --> G[Output 5-10 reranked documents]
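The sort-then-filter tail of this pipeline can be sketched as follows (type and function names are illustrative, not from `rag_client.go`):

```go
package main

import (
	"fmt"
	"sort"
)

type Candidate struct {
	Text  string
	Score float64 // reranker relevance score in [0, 1]
}

// applyTopNMinScore sorts candidates by score (descending) and keeps at
// most topN entries whose score is at least minScore.
func applyTopNMinScore(cands []Candidate, topN int, minScore float64) []Candidate {
	sorted := append([]Candidate(nil), cands...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Score > sorted[j].Score })
	out := []Candidate{}
	for _, c := range sorted {
		if len(out) >= topN {
			break
		}
		if c.Score >= minScore {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	cands := []Candidate{{"a", 0.4}, {"b", 0.9}, {"c", 0.75}}
	fmt.Println(applyTopNMinScore(cands, 2, 0.5)) // keeps b then c; a is below MinScore
}
```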
Paper reference: Liu N F, Lin K, Hewitt J, et al. Lost in the middle: How language models use long contexts[J]. Transactions of the Association for Computational Linguistics, 2024, 12: 157-173.
Supported strategies (rag_client.go:612-630)
1. top_first (default)

```go
// Keep the original order, sorted by relevance score (descending).
// Suitable when there are few documents (<5) or they have already been carefully reranked.
```
2. sandwich (recommended)

```go
func (r *RAGClient) applySandwichStrategy(docs []Document) []Document {
	if len(docs) <= 2 {
		return docs
	}
	result := []Document{}
	mid := len(docs) / 2
	// High-scoring documents go to the head
	result = append(result, docs[:mid]...)
	// Low-scoring documents go to the middle
	reversed := reverse(docs[mid:])
	result = append(result, reversed...)
	return result
}
```
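The strategy relies on a `reverse` helper. A self-contained sketch with an assumed `reverse` implementation shows the resulting order: best documents first, weakest in the middle, moderately strong ones at the end, mitigating the "lost in the middle" effect:

```go
package main

import "fmt"

type Document struct {
	ID    string
	Score float64
}

// reverse returns a reversed copy of the slice.
func reverse(docs []Document) []Document {
	out := make([]Document, len(docs))
	for i, d := range docs {
		out[len(docs)-1-i] = d
	}
	return out
}

// applySandwich reproduces the sandwich strategy on docs sorted by score
// descending: the top half stays at the head, the bottom half is reversed
// so the weakest land in the middle of the context window.
func applySandwich(docs []Document) []Document {
	if len(docs) <= 2 {
		return docs
	}
	mid := len(docs) / 2
	result := append([]Document{}, docs[:mid]...)
	return append(result, reverse(docs[mid:])...)
}

func main() {
	docs := []Document{{"d1", 0.9}, {"d2", 0.8}, {"d3", 0.6}, {"d4", 0.4}}
	for _, d := range applySandwich(docs) {
		fmt.Print(d.ID, " ")
	}
	fmt.Println() // order: d1 d2 d4 d3
}
```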
Paper: Yan S Q, Gu J C, Zhu Y, et al. Corrective retrieval augmented generation[J]. 2024.
Core problem: key flaws in traditional RAG systems
- Static trust assumption: every retrieval result is assumed high quality; there is no quality-assessment mechanism
- Closed-loop knowledge limit: relies solely on knowledge inside the vector database and fails when coverage is insufficient
- Misleading-document risk: low-relevance or incorrect documents severely degrade LLM generation quality
- No adaptivity: the strategy cannot adjust dynamically to retrieval quality
CRAG core ideas:
- Quality awareness: a retrieval evaluator assigns a fine-grained relevance score to each document
- Adaptive correction: when retrieval quality is insufficient, remediation (query rewriting + external search) is triggered automatically
- Knowledge expansion: web search supplements fresh, broad knowledge beyond the vector database's boundary
- Closed-loop optimization: merged results are re-evaluated and must clear the quality threshold before answer generation
Technical value:
- Markedly reduces hallucinations caused by misleading documents
- Improves answers to long-tail, fresh, and cross-domain questions
- Achieves SOTA performance on benchmarks such as HotpotQA and PopQA
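The corrective loop described above can be sketched as follows, with the evaluator, rewriter, and web search injected as stand-in functions. All names, signatures, and thresholds here are illustrative; the real implementation lives in `crag/light_crag.go`:

```go
package main

import "fmt"

// Evaluator scores a document's relevance to the query in [0, 1].
type Evaluator func(query, doc string) float64

// cragRetrieve keeps documents scoring at or above threshold; if fewer than
// minDocNeed survive, it rewrites the query, pulls in external search
// results, and re-evaluates, up to maxRetries times.
func cragRetrieve(query string, docs []string, eval Evaluator,
	threshold float64, minDocNeed, maxRetries int,
	rewrite func(string) string, webSearch func(string) []string) []string {

	pool := docs
	for retry := 0; ; retry++ {
		good := []string{}
		for _, d := range pool {
			if eval(query, d) >= threshold {
				good = append(good, d)
			}
		}
		if len(good) >= minDocNeed || retry >= maxRetries {
			return good
		}
		query = rewrite(query)
		pool = append(pool, webSearch(query)...)
	}
}

func main() {
	eval := func(q, d string) float64 { // toy relevance signal
		if len(d) > 5 {
			return 0.9
		}
		return 0.2
	}
	rewrite := func(q string) string { return q + " (rewritten)" }
	search := func(q string) []string { return []string{"external document"} }
	fmt.Println(cragRetrieve("q", []string{"bad"}, eval, 0.7, 1, 3, rewrite, search))
}
```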
2.6.2 CRAG Workflow Design
flowchart TD
A[User query] --> B[Vector database retrieval]
B --> C[Retrieval evaluator scoring]
C --> D{Quality decision}
D -->|High-quality docs ≥ minDocNeed| E[Generate answer directly]
D -->|Not enough high-quality docs| F[Query rewriting module]
F --> G[External web search]
G --> H[Merge results]
H --> I[Re-evaluate]
I --> J{Quality threshold met?}
J -->|Yes| E
J -->|No, and retries < 3| F
J -->|Retries ≥ 3| E
style D fill:#fff3cd
style J fill:#fff3cd
style F fill:#ffe6cc
```go
const LLMEvaluatorPrompt = `You are a relevance judge. Evaluate how relevant the DOCUMENT is to answering the QUESTION.

Instructions:
1. Read the QUESTION and the DOCUMENT carefully.
2. Assign scores for each criterion below using only the set {0.0, 0.1, 0.2, ..., 1.0}. Round each score to one decimal place.
3. Score the following eight criteria:
   - TopicalAlignment: How well the main topic of the document matches the question's topic.
   - IntentAlignment: How well the document aligns with the user's intent (e.g., factual answer, how-to, comparison, troubleshooting).
   - KeyEntityMatch: Presence and correctness of key entities, terms, names, numbers, or concepts central to the question.
   - DirectAnswerability: Whether the document contains the direct answer or clearly enables deriving it.
   - CoverageDepth: Completeness and depth covering the essential aspects needed to answer the question.
   - SpecificityToQuery: Specific focus on the question versus being generic or tangential.
   - TemporalFit: Time relevance (recency, version, date consistency) relative to any time constraints implied by the question.
   - LanguageTerminologyAlignment: Language match and domain terminology consistency aiding accurate interpretation.
4. Compute OverallScore as the mean of the eight criterion scores, then round to the nearest 0.1.
5. Output strictly a JSON object with the exact structure below. Do not include any explanations or extra text.

Input:
QUESTION: {{QUESTION}}
DOCUMENT: {{DOCUMENT}}

Output format (JSON only):
{"score": x.x}
`
```
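The strict output contract keeps parsing simple. A hedged sketch of consuming it (`parseEvaluatorScore` is an assumed helper name, not the project's actual parser):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parseEvaluatorScore extracts the score from the strict JSON the evaluator
// prompt demands, e.g. {"score": 0.8}. TrimSpace guards against stray
// leading/trailing whitespace in the model output.
func parseEvaluatorScore(resp string) (float64, error) {
	var out struct {
		Score float64 `json:"score"`
	}
	if err := json.Unmarshal([]byte(strings.TrimSpace(resp)), &out); err != nil {
		return 0, err
	}
	return out.Score, nil
}

func main() {
	score, err := parseEvaluatorScore(` {"score": 0.8} `)
	fmt.Println(score, err)
}
```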
Prompt engineering techniques:
- Explicit instruction: "Do not include any explanations or extra text" forces the LLM to output strict JSON
- Quantization: "using only the set {0.0, 0.1, 0.2, ..., 1.0}" avoids arbitrary floating-point values
- Remove filler: strip conversational phrases ("I want to know", "Can you help me")
- Reorder terms: follow a "topic + aspect + modifier" pattern
Few-shot example:

```text
User Input: "I'm trying to figure out why my python code keeps giving me a list index out of range error."
Reasoning:
- Intent: Troubleshooting a specific coding error.
- Keywords: Python, list index out of range, error, fix/solution.
- Removal: "I'm trying to figure out why my", "keeps giving me a".
- Refinement: Combine language + error message + intent.
Refined Query: Python list index out of range error solution
```

Output requirement:

```text
Output ONLY the final Refined Query. Do not add quotes, explanations, or conversational filler.
```
Engineering value:
- Markedly improves web-search recall and precision
- Converts natural language into the query form search engines handle best
- Few-shot learning keeps transformation quality stable
3.3.4 External Search Implementation
File: crag/search/web_search.go
Unified interface (crag/search/search.go:12-14):

```go
type ExternalSearcher interface {
	Search(queries []string) ([]*QuerySearchResult, error)
}
```

Data structures (crag/search/search.go:16-25):

```go
type SearchResultItem struct {
	Title   string `json:"title,omitempty"`
	Link    string `json:"link,omitempty"`
	Snippet string `json:"snippet,omitempty"` // Core field: used as supplementary knowledge
}

type QuerySearchResult struct {
	Queries string              `json:"queries"`
	Results []*SearchResultItem `json:"results"`
}
```

Google Search implementation (crag/search/web_search.go:46-62):
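Any provider can plug into the `ExternalSearcher` interface. A minimal in-memory implementation, hypothetical but useful for tests or as a template for wiring in a real backend:

```go
package main

import "fmt"

// Types mirror crag/search/search.go.
type SearchResultItem struct {
	Title   string `json:"title,omitempty"`
	Link    string `json:"link,omitempty"`
	Snippet string `json:"snippet,omitempty"`
}

type QuerySearchResult struct {
	Queries string              `json:"queries"`
	Results []*SearchResultItem `json:"results"`
}

type ExternalSearcher interface {
	Search(queries []string) ([]*QuerySearchResult, error)
}

// StubSearcher is a hypothetical in-memory ExternalSearcher backed by a
// fixed index; real implementations would call a search API instead.
type StubSearcher struct {
	Index map[string][]*SearchResultItem
}

func (s *StubSearcher) Search(queries []string) ([]*QuerySearchResult, error) {
	out := make([]*QuerySearchResult, 0, len(queries))
	for _, q := range queries {
		out = append(out, &QuerySearchResult{Queries: q, Results: s.Index[q]})
	}
	return out, nil
}

func main() {
	var searcher ExternalSearcher = &StubSearcher{Index: map[string][]*SearchResultItem{
		"go generics": {{Title: "Go 1.18 release notes", Snippet: "Type parameters..."}},
	}}
	res, _ := searcher.Search([]string{"go generics"})
	fmt.Println(res[0].Queries, len(res[0].Results))
}
```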
Yan S Q, Gu J C, Zhu Y, et al. Corrective retrieval augmented generation[J]. 2024.
CRAG framework core paper
Reference for retrieval-evaluator design
Ma X, Gong Y, He P, et al. Query rewriting in retrieval-augmented large language models[C]//Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. 2023: 5303-5315.
Query rewriting reference
Gao L, Ma X, Lin J, et al. Precise zero-shot dense retrieval without relevance labels[C]//Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023: 1762-1777.
HyDE (Hypothetical Document Embeddings) original paper
Pre-retrieval optimization reference
Nogueira R, Cho K. Passage Re-ranking with BERT[J]. arXiv preprint arXiv:1901.04085, 2019.
Theoretical basis for cross-encoder reranking
Lewis P, Perez E, Piktus A, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks[J]. Advances in neural information processing systems, 2020, 33: 9459-9474.
Foundational RAG paper
Karpukhin V, Oguz B, Min S, et al. Dense Passage Retrieval for Open-Domain Question Answering[C]//EMNLP (1). 2020: 6769-6781.
graph TB
subgraph sg6 ["User Interaction Layer"]
A[User query] --> B[Query understanding]
B --> C[Task routing]
end
subgraph sg7 ["Agentic Decision Layer"]
C --> D[Planning Agent]
D --> E[Task Decomposer]
E --> F[Orchestrator]
F --> G{Decision hub}
G --> H[Retrieval Agent]
G --> I[Tool Agent]
G --> J[Reflection Agent]
end
subgraph sg8 ["Tool Layer - reuses existing components"]
H --> K1[Vector retrieval Milvus]
I --> K2[Query Rewriter]
I --> K3[Reranker]
I --> K4[CRAG Evaluator]
I --> K5[Web Search]
I --> K6[Context Compressor]
end
subgraph sg9 ["Memory Layer"]
L[Short-term memory - current task context]
M[Long-term memory - conversation history and knowledge]
end
subgraph sg10 ["Generation Layer"]
J --> N{Quality check}
N -->|Pass| O[Prompt construction]
N -->|Fail| F
O --> P[LLM generation]
P --> Q[Final answer]
end
K1 --> I
K2 --> I
K3 --> I
K4 --> I
K5 --> I
K6 --> I
F -.memory read/write.-> L
F -.memory read/write.-> M
style A fill:#e1f5ff
style Q fill:#d4edda
style G fill:#fff3cd
style N fill:#fff3cd
```go
type Task struct {
	ID        string
	Type      TaskType // retrieve, rerank, search, evaluate
	Input     interface{}
	DependsOn []string // IDs of prerequisite tasks
	Priority  int
}
```
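`DependsOn` implies the Orchestrator needs a dependency-respecting execution order. One standard way to derive it is a topological sort (Kahn's algorithm); the sketch below is illustrative, not the project's scheduler:

```go
package main

import "fmt"

type TaskType string

type Task struct {
	ID        string
	Type      TaskType
	DependsOn []string // IDs of prerequisite tasks
	Priority  int
}

// topoOrder returns an execution order in which every task appears after
// all of its dependencies (Kahn's algorithm). Tasks involved in a cycle are
// left out, which a caller should treat as an invalid plan.
func topoOrder(tasks []Task) []string {
	indeg := map[string]int{}
	dependents := map[string][]string{}
	for _, t := range tasks {
		indeg[t.ID] += 0 // ensure an entry exists
		for _, dep := range t.DependsOn {
			indeg[t.ID]++
			dependents[dep] = append(dependents[dep], t.ID)
		}
	}
	queue := []string{}
	for _, t := range tasks {
		if indeg[t.ID] == 0 {
			queue = append(queue, t.ID)
		}
	}
	order := []string{}
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		order = append(order, id)
		for _, next := range dependents[id] {
			if indeg[next]--; indeg[next] == 0 {
				queue = append(queue, next)
			}
		}
	}
	return order
}

func main() {
	plan := []Task{
		{ID: "rerank", Type: "rerank", DependsOn: []string{"retrieve"}},
		{ID: "retrieve", Type: "retrieve", DependsOn: []string{"rewrite"}},
		{ID: "rewrite", Type: "rewrite"},
	}
	fmt.Println(topoOrder(plan)) // rewrite before retrieve before rerank
}
```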
Decision logic:
flowchart TD
A[Receive query] --> B{Query type analysis}
B -->|Factual| C[Simple retrieval plan]
B -->|Multi-hop reasoning| D[Multi-step plan]
B -->|Comparative analysis| E[Parallel retrieval plan]
C --> F[Generate Plan]
D --> G[Decompose into sub-questions]
G --> H[Build dependency graph]
H --> F
E --> I[Define parallel tasks]
I --> F
F --> J[Estimate resource needs]
J --> K[Return execution plan]
flowchart TD
A[Receive retrieval task] --> B{Query feature analysis}
B -->|Knowledge-base content| C[Vector retrieval]
B -->|Real-time information| D[Web search]
B -->|High-precision requirement| E[Hybrid retrieval]
C --> F{Reranking needed?}
D --> G[Invoke external search tool]
E --> H[Vector + keyword combination]
F -->|Yes| I[Enable Reranker]
F -->|No| J[Return directly]
I --> K[Execute retrieval]
J --> K
G --> K
H --> K
K --> L[Return retrieval results]
flowchart TD
A[Receive retrieval results] --> B[Multi-dimensional evaluation]
B --> C[Relevance score]
B --> D[Completeness score]
B --> E[Accuracy score]
C --> F{Aggregate score}
D --> F
E --> F
F -->|>0.8| G[High quality - pass]
F -->|0.5-0.8| H[Medium quality - improvable]
F -->|<0.5| I[Low quality - retry needed]
G --> J[Return pass]
H --> K[Generate improvement suggestions]
I --> L[Generate retry strategy]
K --> M[Return suggestions]
L --> M
flowchart TD
A[User query] --> B[Planning Agent analysis]
B --> C{Query complexity}
C -->|Simple| D[Single-step retrieval plan]
C -->|Medium| E[Two-step plan: retrieve then rerank]
C -->|Complex| F[Multi-step plan: decompose, retrieve, evaluate, search]
D --> G[Generate Plan]
E --> G
F --> H[Task decomposition]
H --> I[Subtask 1: extract key entities]
H --> J[Subtask 2: multi-angle retrieval]
H --> K[Subtask 3: result aggregation]
I --> L[Set dependencies]
J --> L
K --> L
L --> G
G --> M{Parallel execution needed?}
M -->|Yes| N[Mark parallel tasks]
M -->|No| O[Sequential execution]
N --> P[Return execution plan]
O --> P
style C fill:#fff3cd
style M fill:#fff3cd
sequenceDiagram
participant U as User
participant O as Orchestrator
participant PA as Planning Agent
participant TA as Tool Agent
participant RA as Reflection Agent
participant QR as QueryRewriter tool
participant VR as VectorRetriever tool
participant RK as Reranker tool
participant CE as CRAGEvaluator tool
U->>O: Submit query
O->>PA: Create execution plan
PA->>PA: Analyze query complexity
PA-->>O: Return Plan
loop For each Task
O->>TA: Execute Task 1 (query rewriting)
TA->>QR: Call query_rewriter
QR-->>TA: Optimized query
O->>TA: Execute Task 2 (vector retrieval)
TA->>VR: Call vector_retriever
VR-->>TA: Retrieval results
O->>TA: Execute Task 3 (reranking)
TA->>RK: Call reranker
RK-->>TA: Reranked results
O->>RA: Evaluate result quality
RA->>CE: Call crag_evaluator
CE-->>RA: Quality score
alt Quality insufficient
RA-->>O: Suggest retry with adjusted parameters
O->>PA: Adjust plan
else Quality sufficient
RA-->>O: Evaluation passed
end
end
O-->>U: Return final answer
8.3.3 Reflection and Optimization Flow
flowchart TD
A[Retrieval results] --> B[Reflection Agent evaluation]
B --> C[Relevance check]
B --> D[Completeness check]
B --> E[Factuality check]
C --> F{Aggregate score}
D --> F
E --> F
F -->|Score >= 0.8| G[Excellent quality]
F -->|0.5 <= Score < 0.8| H[Medium quality]
F -->|Score < 0.5| I[Insufficient quality]
G --> J[Generate answer directly]
H --> K{Problem analysis}
K -->|Documents not relevant enough| L[Adjust retrieval parameters]
K -->|Information incomplete| M[Widen retrieval scope]
K -->|Poor ranking| N[Enable or tune Reranker]
I --> O{Problem analysis}
O -->|Ambiguous query| P[Query rewriting]
O -->|No relevant knowledge-base content| Q[Trigger web search]
O -->|Wrong retrieval strategy| R[Switch retrieval method]
L --> S[Re-execute]
M --> S
N --> S
P --> S
Q --> S
R --> S
S --> T{Retry count check}
T -->|< MaxRetries| B
T -->|>= MaxRetries| U[Use best results so far]
U --> J
style F fill:#fff3cd
style K fill:#fff3cd
style O fill:#fff3cd
style T fill:#fff3cd
Optimization strategy table:

| Problem type | Detection signal | Remediation |
| --- | --- | --- |
| Low relevance | AvgScore < 0.5 | Rewrite query, adjust TopK |
| Incomplete information | Coverage < 60% | Widen retrieval scope, multi-query strategy |
| Poor ranking | Top1 score low but Top5 contains high scores | Enable Reranker |
| Knowledge-base gap | All document scores < 0.3 | Trigger web search |
| Staleness | Documents too old | Prefer external search |
8.3.4 Complete End-to-End Workflow
sequenceDiagram
participant U as User
participant QU as Query understanding
participant PA as Planning Agent
participant O as Orchestrator
participant ToolLayer as Tool layer
participant RA as Reflection Agent
participant Gen as Generation module
U->>QU: Input query
QU->>QU: Intent recognition, entity extraction
QU->>PA: Pass query features
PA->>PA: Analyze complexity
PA->>PA: Decompose tasks
PA-->>O: Execution Plan
Note over O: Execution phase
loop Iterate over Tasks in the Plan
O->>ToolLayer: Execute Task
alt Task type = retrieval
ToolLayer->>ToolLayer: Query rewriting (optional)
ToolLayer->>ToolLayer: Vector retrieval
ToolLayer-->>O: Retrieval results
else Task type = optimization
ToolLayer->>ToolLayer: Rerank / dedup / compress
ToolLayer-->>O: Optimized results
else Task type = evaluation
ToolLayer->>ToolLayer: CRAG evaluator
ToolLayer-->>O: Quality score
end
O->>RA: Evaluate intermediate results
RA->>RA: Multi-dimensional scoring
alt Quality below bar
RA-->>PA: Request plan adjustment
PA->>PA: Generate remediation strategy
PA-->>O: New Task
else Quality sufficient
RA-->>O: Continue execution
end
end
Note over O,Gen: Generation phase
O->>Gen: Pass final context
Gen->>Gen: Build Prompt
Gen->>Gen: LLM generation
Gen-->>RA: Generated answer
RA->>RA: Answer quality check
alt Low answer confidence
RA-->>O: Suggest re-retrieval
O->>PA: Adjust strategy
else Answer quality OK
RA-->>Gen: Confirm pass
Gen-->>U: Return final answer
end
```go
type AgenticConfig struct {
	Enabled    bool
	MaxRetries int
	Timeout    time.Duration
	PlannerLLM string
}
```
8.6 References
8.6.1 Core Papers
Yao S, Zhao J, Yu D, et al. React: Synergizing reasoning and acting in language models[C]//The Eleventh International Conference on Learning Representations. 2022.
Core idea: interleave reasoning and acting
Applied to: Planning Agent design
Shinn N, Cassano F, Labash B, et al. Reflexion: Language agents with verbal reinforcement learning[J]. arXiv preprint arXiv:2303.11366, 2023.
Core idea: self-reflection and improvement through verbal feedback
Applied to: Reflection Agent's evaluation and optimization mechanism
Yan S Q, Gu J C, Zhu Y, et al. Corrective retrieval augmented generation[J]. 2024.
Core idea: evaluate retrieval quality and trigger corrective actions
Applied to: already integrated into the existing system; the Agentic layer enhances it further
Asai A, Wu Z, Wang Y, et al. Self-rag: Learning to retrieve, generate, and critique through self-reflection[J]. 2024.
Core idea: adaptive retrieval and generation with self-critique
Applied to: multi-dimensional evaluation in the reflection mechanism
Schick T, Dwivedi-Yu J, Dessì R, et al. Toolformer: Language models can teach themselves to use tools[J]. Advances in Neural Information Processing Systems, 2023, 36: 68539-68551.
Core idea: LLMs autonomously learn tool use
Applied to: Tool Agent's tool-selection strategy
Wei J, Wang X, Schuurmans D, et al. Chain-of-thought prompting elicits reasoning in large language models[J]. Advances in neural information processing systems, 2022, 35: 24824-24837.
Applied to: Planning Agent's reasoning prompt design
Liu N F, Lin K, Hewitt J, et al. Lost in the middle: How language models use long contexts[J]. Transactions of the Association for Computational Linguistics, 2024, 12: 157-173.
Applied to: sandwich position strategy