
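RAGFlow in Practice - Building an Agent for In-Depth Analysis of Company Research Reports

16 min read

In the research departments of financial institutions, analysts deal every day with a large volume of industry and company reports, third-party research data, and real-time market updates, drawn from sources that are both varied and scattered. Their job is to turn this information quickly into clear investment recommendations: which stocks to buy, how to adjust portfolio allocation, or where an industry is heading next. We therefore built an "Intelligent Investment Research Assistant" to help financial analysts organize information quickly. It automatically captures company data, consolidates financial indicators, and compiles the viewpoints of research reports, so that analysts can judge within minutes whether a stock is worth buying instead of sifting through piles of material, and can spend their time on the actual investment decision. To achieve this, we designed an end-to-end technical workflow.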

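The technical solution revolves around one core business process

When an analyst asks a question, the system identifies the company name or abbreviation in the question and uses a search engine to look up the corresponding stock code. If identification fails, the system immediately returns a message indicating that the stock code could not be identified. Once the stock code is obtained, the system retrieves the company's core financial indicators from a data interface, then cleans and formats the data into a clear financial table. On top of this, the agent integrates research report information: on the one hand it collects the latest authoritative research reports and market views, and on the other it retrieves relevant report content from the internal knowledge base. Finally, the organized financial data and research report information are combined into a single comprehensive reply, so that analysts can quickly review the key indicators and core viewpoints.

The orchestrated workflow is as follows

This case uses RAGFlow to implement a complete workflow, from extracting the stock code, to generating the company's financial statement, to integrating and outputting the research report information.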
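As a rough mental model, the flow described above can be sketched as a chain of stages with an early exit when no stock code is found. This is only an illustration, not how RAGFlow executes a canvas: every callable passed to `run_workflow` is a hypothetical stand-in for one node, and we assume the report writer receives only the extracted research text.

```python
from typing import Callable

NOT_FOUND = "Not Found"


def run_workflow(
    question: str,
    extract_code: Callable[[str], str],
    fetch_financials: Callable[[str], str],
    build_table: Callable[[str], str],
    collect_research: Callable[[str], str],
    write_report: Callable[[str], str],
) -> str:
    """Chain the stages; each callable is a hypothetical stand-in for a node."""
    stock_code = extract_code(question).strip()
    if not stock_code or NOT_FOUND.lower() in stock_code.lower():
        # No stock recognised: stop early with the fallback reply.
        return "Your query is not supported."
    table = build_table(fetch_financials(stock_code))
    report = write_report(collect_research(stock_code))
    # The final reply carries both the financial table and the report.
    return f"{table}\n\n{report}"
```

Passing the stages in as callables keeps the sketch independent of any particular data source or model.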

The following sections describe the implementation in detail.

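1. Prepare the Dataset

1.1 Create the Dataset

The dataset required for this example can be downloaded from Hugging Face Datasets.

Create a dataset named "Internal Stock Research Reports" and import the corresponding dataset documents.

1.2 Parse the Documents

For the documents in the "Internal Stock Research Reports" dataset, we selected the parsing and chunking method named Paper.

Research report documents typically include modules such as a summary, core viewpoints, thematic analysis, financial forecast tables, and risk warnings. Their overall structure follows the logical progression of a paper rather than a strict hierarchy of headings. Chunking by the lowest-level heading can easily break the continuity between paragraphs and tables.

Therefore, the "Paper" chunking method in RAGFlow is the better fit here, using sections or logical paragraphs as the basic unit. This approach preserves the structural integrity of the research report and also helps the model locate key information quickly during retrieval.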
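To make the idea of section-level chunks concrete, here is a minimal sketch that splits a report at top-level section titles so that a table stays in the same chunk as its section. It is not RAGFlow's Paper parser; the title list and the function name `split_by_section` are assumptions chosen for illustration.

```python
import re

# Hypothetical section titles; real reports will use their own headings.
SECTION_TITLES = (
    "Summary",
    "Core Views",
    "Thematic Analysis",
    "Financial Forecast",
    "Risk Warnings",
)


def split_by_section(text: str, titles: tuple = SECTION_TITLES) -> list:
    """Split report text into one chunk per top-level section."""
    alternatives = "|".join(re.escape(title) for title in titles)
    heading = re.compile(rf"^(?:{alternatives})[ \t]*$", re.MULTILINE)
    starts = [match.start() for match in heading.finditer(text)]
    if not starts:
        return [text.strip()] if text.strip() else []
    chunks = []
    preamble = text[: starts[0]].strip()
    if preamble:
        chunks.append(preamble)  # anything before the first heading
    bounds = starts + [len(text)]
    for begin, end in zip(bounds, bounds[1:]):
        chunks.append(text[begin:end].strip())
    return chunks
```

Because the split points are section titles rather than every sub-heading, a forecast table and the paragraph that introduces it end up in one chunk.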

A preview of the chunked financial report is shown below

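2. Build the Agent

2.1 Create the Application

After the application is created, the system automatically generates a "Begin" node on the canvas.

In the "Begin" node, you can set the assistant's opening greeting, for example: "Hello! I am your stock research assistant."

2.2 Build the "Extract Stock Code" Function

2.2.1 Agent Extracts the Stock Code

Use an Agent node with a TavilySearch tool attached to identify the stock name or abbreviation in the user's natural language input and return a single, standard stock code. When no match is found, the node uniformly outputs "Not Found".

In financial scenarios, users' natural language is often ambiguous. For example:

  • "Help me look up the research report for Apple Inc."
  • "How is NVIDIA's financial performance?"
  • "How is the Shanghai Composite Index doing today?"

All of these requests contain stock-related information, but the system can query financial statements, research reports, or market data only after it has accurately identified the stock code.

This is why we need an Agent with an "Extract Stock Code" function.

The system prompt for this Agent is as follows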

<role> 

Your responsibility is: to identify and extract the stock name or abbreviation from the user's natural language query and return the corresponding unique stock code.

</role>



<rules>

1. Only one result is allowed:
   - If a stock is identified → return the corresponding stock code only;
   - If no stock is identified → return “Not Found” only.

2. **Do not** output any extra words, punctuation, explanations, prefixes, suffixes, or newline prompts.

3. The output must strictly follow the <response_format>.

</rules>


<response_format>
Output only the stock code (e.g., AAPL or 600519)
Or output “Not Found”
</response_format>


<response_examples>
User input: “Please check the research report for Apple Inc.” → Output: AAPL
User input: “How is the financial performance of Moutai?” → Output: 600519
User input: “How is the Shanghai Composite Index performing today?” → Output: Not Found
</response_examples>


<tools>

- Tavily Search: You may use this tool to query when you're uncertain about the stock code.
- If you're confident, there's no need to use the tool.

</tools>



<Strict Output Requirements>

- Only output the result, no explanations, prompts, or instructions allowed.
- The output can only be the stock code or “Not Found,” otherwise, it will be considered an incorrect answer.

</Strict Output Requirements>
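Even with a strict prompt, a model may occasionally add quotes or a trailing period. A small guard like the following could normalize the reply before it is used downstream. It is only a sketch: the accepted code shapes (1 to 5 uppercase letters, or 6 digits) are an assumption based on the examples in the prompt, and `normalize_agent_output` is a hypothetical helper, not part of RAGFlow.

```python
import re

NOT_FOUND = "Not Found"
# Assumed shapes: US-style tickers such as AAPL, or 6-digit codes such as 600519.
STOCK_CODE = re.compile(r"[A-Z]{1,5}|\d{6}")


def normalize_agent_output(raw: str) -> str:
    """Return a clean stock code, or NOT_FOUND if the reply is anything else."""
    # Drop surrounding whitespace, quotes, and sentence-ending punctuation.
    cleaned = raw.strip().strip("\"'“”.。 ")
    if STOCK_CODE.fullmatch(cleaned):
        return cleaned
    return NOT_FOUND
```

Anything that is not exactly one code, including a full sentence that merely mentions a code, is treated as "Not Found" so that later nodes never receive free text.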

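2.2.2 Conditional Node for Stock Code Recognition

Use a conditional node to evaluate the output of the preceding Agent node and route the flow according to the result:

  • If the output is a stock code: the stock was identified successfully, and the flow enters the "Case1" branch.
  • If the output contains "Not Found": no valid stock name was identified in the user input, so the flow enters the "Else" branch and runs a Message node for unsupported queries, which outputs "Your query is not supported."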
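The two branches above amount to a simple "contains" test. The sketch below mirrors that decision in plain Python; the case-insensitive comparison and the handling of empty output are our assumptions, and `choose_branch` is a hypothetical name.

```python
UNSUPPORTED_REPLY = "Your query is not supported."


def choose_branch(agent_output: str) -> str:
    """Return "Else" when the output is empty or contains "Not Found", otherwise "Case1"."""
    text = agent_output.strip()
    if not text or "not found" in text.lower():
        return "Else"
    return "Case1"


def reply_for(agent_output: str):
    """Return the fallback reply on the Else branch, or None when the flow continues."""
    return UNSUPPORTED_REPLY if choose_branch(agent_output) == "Else" else None
```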

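2.3 Build the "Company Financial Statement" Function

The data for this function comes from the financial data provided by Yahoo Finance. By calling this API, we obtain the core financial data for the specified stock, such as operating revenue and net profit, which drives the generation of the "Company Financial Statement".

2.3.1 Yahoo Finance Tool: Request Financial Data

In the "Yahoo Finance Tool" node, select "Balance sheet" and pass the stockCode output by the upstream Agent as the parameter. This retrieves the core financial indicators of the corresponding company.

The returned result contains key data such as total assets, total equity, and tangible book value, which feed the "Company Financial Statement" function.

2.3.2 Code Node Generates the Financial Table

In the Code node, a Python script maps the fields and formats the numeric values of the financial data returned by the Yahoo Finance tool, then generates a Markdown table that lists each indicator with its unit, giving a clear and intuitive presentation of the "Company Financial Statement".

Code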

import re


def format_number(value: str) -> str:
    """Convert scientific notation or floating-point numbers to comma-separated numbers."""
    try:
        num = float(value)
    except (TypeError, ValueError):
        return value  # Return the original value if it's not a number (e.g., — or empty)
    if num.is_integer():
        return f"{int(num):,}"  # If it's an integer, format without decimal places
    return f"{num:,.2f}"  # Otherwise, keep two decimal places and add commas


def extract_md_table_single_column(input_text: str) -> str:
    """Build a one-column Markdown table from the Yahoo Finance tool output."""
    # Use English indicators directly
    indicators = [
        "Total Assets", "Total Equity", "Tangible Book Value", "Total Debt",
        "Net Debt", "Cash And Cash Equivalents", "Working Capital",
        "Long Term Debt", "Common Stock Equity", "Ordinary Shares Number"
    ]

    # Core indicators and their corresponding units
    unit_map = {
        "Total Assets": "USD",
        "Total Equity": "USD",
        "Tangible Book Value": "USD",
        "Total Debt": "USD",
        "Net Debt": "USD",
        "Cash And Cash Equivalents": "USD",
        "Working Capital": "USD",
        "Long Term Debt": "USD",
        "Common Stock Equity": "USD",
        "Ordinary Shares Number": "Shares"
    }

    lines = input_text.splitlines()

    # Automatically detect the date column, keeping only the first one
    date_pattern = r"\d{4}-\d{2}-\d{2}"
    header_line = ""
    for line in lines:
        if re.search(date_pattern, line):
            header_line = line
            break

    if not header_line:
        raise ValueError("Date column header row not found")

    dates = re.findall(date_pattern, header_line)
    first_date = dates[0]  # Keep only the first date
    header = f"| Indicator | {first_date} |"
    divider = "|------------------------|------------|"

    # Match 'nan' or a number; the optional leading minus sign keeps
    # negative values (e.g., negative working capital) from losing their sign
    value_pattern = r"(nan|-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"

    rows = []
    for ind in indicators:
        unit = unit_map.get(ind, "")
        display_ind = f"{ind} ({unit})" if unit else ind

        found = False
        for line in lines:
            if ind in line:
                values = re.findall(value_pattern, line)
                # Replace 'nan' (or a missing value) with '—' and format the number
                first_value = values[0].strip() if values and values[0].strip().lower() != "nan" else "—"
                first_value = format_number(first_value) if first_value != "—" else "—"
                rows.append(f"| {display_ind} | {first_value} |")
                found = True
                break
        if not found:
            rows.append(f"| {display_ind} | — |")

    md_table = "\n".join([header, divider] + rows)
    return md_table


def main(input_text: str):
    return extract_md_table_single_column(input_text)
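To see what the parsing step does on a single row, here is a small self-contained example. The sample table is an assumption about what the Yahoo Finance tool returns (a Markdown table with one column per report date); the numbers are made up for illustration, and the number pattern is a sign-aware variant so that negative values keep their minus sign.

```python
import re
from typing import Optional

DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
NUMBER = re.compile(r"nan|-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")

# Assumed shape of the tool output; values are illustrative only.
SAMPLE = """\
|                 | 2024-09-30  | 2023-09-30  |
|:----------------|:------------|:------------|
| Total Assets    | 3.6498e+11  | 3.52583e+11 |
| Working Capital | -2.3405e+10 | -1.742e+09  |
| Net Debt        | nan         | 8.1123e+10  |
"""


def first_cell(label: str, table: str) -> Optional[str]:
    """Return the first value of the row containing `label`, comma-formatted, or None."""
    row = next((line for line in table.splitlines() if label in line), None)
    match = NUMBER.search(row) if row else None
    if match is None or match.group() == "nan":
        return None  # missing row or missing value
    number = float(match.group())
    return f"{int(number):,}" if number.is_integer() else f"{number:,.2f}"


latest_date = DATE.search(SAMPLE).group()
print(latest_date, first_cell("Total Assets", SAMPLE))
```

Only the first date column is read, which matches the script's choice of keeping the most recent reporting period.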

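We have also received requests to extract JSON fields without writing code, and we will gradually provide solutions in future versions.

2.4 Build the "Research Report Information Extraction" Function

The information extraction Agent calls the AlphaVantage API with the stockCode to extract the latest authoritative research reports and insights. It also calls the internal research report retrieval Agent to obtain the full text of the complete research reports. Finally, it outputs the two parts separately in a fixed structure, which keeps the information extraction efficient.

System prompt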

<role> 

You are the information extraction agent. You understand the user’s query and delegate tasks to alphavantage and the internal research report retrieval agent.

</role>

<requirements>

1. Based on the stock code output by the "Extract Stock Code" agent, call alphavantage's EARNINGS_CALL_TRANSCRIPT to retrieve the latest information that can be used in a research report, and store all publicly available key details.


2. Call the "Internal Research Report Retrieval Agent" and save the full text of the research report output.

3. Output the content retrieved from alphavantage and the Internal Research Report Retrieval Agent in full.

</requirements>


<report_structure_requirements>
The output must be divided into two sections:
#1. Title: “alphavantage”
Directly output the content collected from alphavantage without any additional processing.
#2. Title: "Internal Research Report Retrieval Agent"
Directly output the content provided by the Internal Research Report Retrieval Agent.
</report_structure_requirements>
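The delegation and the fixed two-section output requested by this prompt can be pictured as follows. This is a plain-Python analogy, not the Agent's implementation: `fetch_transcript` and `retrieve_internal_reports` are hypothetical callables standing in for the alphavantage tool and the internal retrieval Agent.

```python
from typing import Callable


def extract_information(
    stock_code: str,
    fetch_transcript: Callable[[str], str],
    retrieve_internal_reports: Callable[[str], str],
) -> str:
    """Call both sources and emit their content, unmodified, in two titled sections."""
    sections = [
        ("alphavantage", fetch_transcript(stock_code)),
        ("Internal Research Report Retrieval Agent", retrieve_internal_reports(stock_code)),
    ]
    return "\n\n".join(
        f"#{index}. {title}\n{body}"
        for index, (title, body) in enumerate(sections, start=1)
    )
```

Keeping the two sources in separate sections lets the downstream report writer attribute each viewpoint to where it came from.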

2.4.1 Configure the MCP Tool

Add the MCP tool

Add the MCP tool under the Agent and select the required method, for example "EARNINGS_CALL_TRANSCRIPT".
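For orientation, the selected method appears to correspond to a query against Alpha Vantage's REST endpoint. The sketch below only builds the request URL and sends nothing; the parameter names (`function`, `symbol`, `quarter`, `apikey`) and the `2024Q1` quarter format reflect our understanding of the public API and should be checked against the official documentation. The MCP tool may expose its arguments differently.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"


def transcript_url(symbol: str, quarter: str, api_key: str) -> str:
    """Build (but do not send) an earnings call transcript query URL."""
    params = {
        "function": "EARNINGS_CALL_TRANSCRIPT",
        "symbol": symbol,    # stock code from the upstream Agent
        "quarter": quarter,  # assumed format: fiscal year plus quarter, e.g. 2024Q1
        "apikey": api_key,   # placeholder; supply your own key
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(transcript_url("AAPL", "2024Q1", "demo"))
```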

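2.4.2 Internal Research Report Retrieval Agent

The key to building the internal research report retrieval Agent is to accurately identify the company or stock code in the user's query. The Agent then calls the Retrieval tool to search the dataset for research reports and outputs their full text, so that data, viewpoints, conclusions, tables, and risk warnings are not omitted. This gives a high-fidelity extraction of the research report content.

System prompt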

<Task Objective> 

Read user input → Identify the involved company/stock (supports abbreviations, full names, codes, and aliases) → Retrieve the most relevant research reports from the dataset → Output the full text of the research report, retaining the original format, data, chart descriptions, and risk warnings.

</Task Objective>



<Execution Rules>

1. Exact Match: Prioritize exact matches of company full names and stock codes.

2. Content Fidelity: Fully retain the research report text stored in the dataset without deletion, modification, or omission of paragraphs.

3. Original Data: Retain table data, dates, units, etc., in their original form.

4. Complete Viewpoints: Include investment logic, financial analysis, industry comparisons, earnings forecasts, valuation methods, risk warnings, etc.

5. Merging Multiple Reports: If there are multiple relevant research reports, output them in reverse chronological order.

6. No Results Feedback: If no matching reports are found, output “No related research reports available in the dataset.”



</Execution Rules>
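Rules 5 and 6 above describe deterministic behavior that is easy to state in code. The following sketch shows one way to express them; the `Report` dataclass and `render_reports` are hypothetical names, and in the actual workflow the model carries out these rules from the prompt rather than from code.

```python
from dataclasses import dataclass
from datetime import date

NO_RESULTS = "No related research reports available in the dataset."


@dataclass
class Report:
    title: str
    published: date
    full_text: str


def render_reports(reports: list) -> str:
    """Output reports newest first (rule 5), or the fixed message if none (rule 6)."""
    if not reports:
        return NO_RESULTS
    ordered = sorted(reports, key=lambda report: report.published, reverse=True)
    # The full text is passed through untouched, in line with rules 2 and 3.
    return "\n\n".join(
        f"{report.title} ({report.published.isoformat()})\n{report.full_text}"
        for report in ordered
    )
```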

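2.5 Add the Research Report Generation Agent

The research report generation Agent automatically extracts and structures financial and economic information, producing professional base data and content for investment banking analysts that preserves differing viewpoints and can be used directly in investment research reports.

System prompt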

<role> 

You are a senior investment banking (IB) analyst with years of experience in capital market research. You excel at writing investment research reports covering publicly listed companies, industries, and macroeconomics. You possess strong financial analysis skills and industry insights, combining quantitative and qualitative analysis to provide high-value references for investment decisions.

**You are able to retain and present differentiated viewpoints from various reports and sources in your research, and when discrepancies arise, you do not merge them into a single conclusion. Instead, you compare and analyze the differences.**


</role>




<input>

You will receive financial information extracted by the information extraction agent.

</input>


<core_task>
Based on the content returned by the information extraction agent (no fabrication of data), write a professional, complete, and structured investment research report. The report must be logically rigorous, clearly organized, and use professional language, suitable for reference by fund managers, institutional investors, and other professional readers.
When there are differences in analysis or forecasts between different reports or institutions, you must list and identify the sources in the report. You should not select only one viewpoint. You need to point out the differences, their possible causes, and their impact on investment judgments.
</core_task>


<report_structure_requirements>
##1. Summary
Provide a concise overview of the company’s core business, recent performance, industry positioning, and major investment highlights.
Summarize key conclusions in 3-5 sentences.
Highlight any discrepancies in core conclusions and briefly describe the differing viewpoints and areas of disagreement.
##2. Company Overview
Describe the company's main business, core products/services, market share, competitive advantages, and business model.
Highlight any differences in the description of the company’s market position or competitive advantages from different sources. Present and compare these differences.
##3. Recent Financial Performance
Summarize key metrics from the latest financial report (e.g., revenue, net profit, gross margin, EPS).
Highlight the drivers behind the trends and compare the differential analyses from different reports. Present this comparison in a table.
##4. Industry Trends & Opportunities
Overview of industry development trends, market size, and major drivers.
If different sources provide differing forecasts for industry growth rates, technological trends, or competitive landscape, list these and provide background information. Present this comparison in a table.
##5. Investment Recommendation
Provide a clear investment recommendation based on the analysis above (e.g., "Buy/Hold/Neutral/Sell"), presented in a table.
Include investment ratings or recommendations from all sources, with the source and date clearly noted.
If you provide a combined recommendation based on different viewpoints, clearly explain the reasoning behind this integration.
##6. Appendix & References
List the data sources, analysis methods, important formulas, or chart descriptions used.
All references must come from the information extraction agent and the company financial data table provided, or publicly noted sources.
For differentiated viewpoints, provide full citation information (author, institution, date) and present this in a table.
</report_structure_requirements>


<output_requirements>
Language Style: Financial, professional, precise, and analytical.
Viewpoint Retention: When there are multiple viewpoints and conclusions, all must be retained and compared. You cannot choose only one.
Citations: When specific data or viewpoints are referenced, include the source in parentheses (e.g., Source: Morgan Stanley Research, 2024-05-07).
Facts: All data and conclusions must come from the information extraction agent or their noted legitimate sources. No fabrication is allowed.
Readability: Use short paragraphs and bullet points to make it easy for professional readers to grasp key information and see the differences in viewpoints.
</output_requirements>


<output_goal>
Generate a complete investment research report that meets investment banking industry standards, which can be directly used for institutional investment internal reference, while faithfully retaining differentiated viewpoints from various reports and providing the corresponding analysis.
</output_goal>



<heading_format_requirements>
All section headings in the investment research report must be formatted as N. Section Title (e.g., 1. Summary, 2. Company Overview), where:
The heading number is followed by a period and the section title.
The entire heading (number, period, and title) is rendered in bold text (e.g., using <b> in HTML or equivalent bold formatting, without relying on Markdown ** syntax).
Do not use ##, **, or any other prefix before the heading number.
Apply this format consistently to all section headings (Summary, Company Overview, Recent Financial Performance, Industry Trends & Opportunities, Investment Recommendation, Appendix & References).
</heading_format_requirements>
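Since the heading format is specified precisely, the generated report can be checked mechanically before it is shown to the analyst. The checker below is a sketch under one assumption: bold headings are rendered as `<b>N. Title</b>`, which is only one of the renderings the prompt allows. `check_headings` is a hypothetical helper.

```python
import re

# Section titles taken from the prompt; the <b>...</b> wrapper is an assumed rendering.
EXPECTED_TITLES = [
    "Summary",
    "Company Overview",
    "Recent Financial Performance",
    "Industry Trends & Opportunities",
    "Investment Recommendation",
    "Appendix & References",
]
HEADING = re.compile(r"<b>(\d+)\. (.+)</b>")
FORBIDDEN_PREFIX = re.compile(r"(?:#+|\*\*)\s*\d+\.")


def check_headings(report: str) -> list:
    """Return a list of problems; an empty list means the headings conform."""
    problems, found = [], []
    for line in report.splitlines():
        stripped = line.strip()
        if FORBIDDEN_PREFIX.match(stripped):
            problems.append(f"forbidden prefix: {stripped}")
        heading = HEADING.fullmatch(stripped)
        if heading:
            found.append((int(heading.group(1)), heading.group(2)))
    if found != list(enumerate(EXPECTED_TITLES, 1)):
        problems.append("headings missing, renamed, or out of order")
    return problems
```

A non-empty result could be used to trigger a retry or a simple post-processing pass on the headings.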

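2.6 Add the Reply Message Node

The reply Message node outputs the final products of the workflow: the "financial statement" and the "research report content".

2.7 Save and Test

Click "Save", then "Run", and view the execution result. As the run log shows, the entire process takes about 5 minutes.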

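Summary and Outlook

This case study used RAGFlow to build a complete stock research report workflow consisting of three core steps:

  1. Use an Agent node to extract the stock code from the user's input.
  2. Use the Yahoo Finance tool and a Code node to fetch and format the company's financial data into a clear financial statement.
  3. Call the information extraction Agent and the internal research report retrieval Agent to gather the latest research insights and the full text of the internal reports, then use the research report generation Agent to produce the final report.

The whole process automates everything from stock code identification to the integration of financial and research report information.

We see several directions for further development: more data sources can be incorporated to make the analysis more comprehensive, and a no-code data processing method can be offered to lower the barrier to entry. The system could also analyze multiple companies in the same industry, track industry trends, and even cover a wider range of investment instruments such as futures and funds, helping analysts build better portfolios. As these capabilities are gradually implemented, the Intelligent Investment Research Assistant will not only help analysts reach judgments faster but also establish an efficient, reusable research methodology that lets teams keep producing high-quality analysis.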
