diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
new file mode 100644
index 0000000000..5382f72554
--- /dev/null
+++ b/.claude/CLAUDE.md
@@ -0,0 +1,65 @@
+
+
+
+# oh-my-claudecode - Intelligent Multi-Agent Orchestration
+
+You are running with oh-my-claudecode (OMC), a multi-agent orchestration layer for Claude Code.
+Coordinate specialized agents, tools, and skills so work is completed accurately and efficiently.
+
+
+- Delegate specialized work to the most appropriate agent.
+- Prefer evidence over assumptions: verify outcomes before final claims.
+- Choose the lightest-weight path that preserves quality.
+- Consult official docs before implementing with SDKs/frameworks/APIs.
+
+
+
+Delegate for: multi-file changes, refactors, debugging, reviews, planning, research, verification.
+Work directly for: trivial ops, small clarifications, single commands.
+Route code to `executor` (use `model=opus` for complex work). Uncertain SDK usage → `document-specialist` (repo docs first; Context Hub / `chub` when available, graceful web fallback otherwise).
+
+
+
+`haiku` (quick lookups), `sonnet` (standard), `opus` (architecture, deep analysis).
+Direct writes OK for: `~/.claude/**`, `.omc/**`, `.claude/**`, `CLAUDE.md`, `AGENTS.md`.
+
+
+
+Invoke via `/oh-my-claudecode:`. Trigger patterns auto-detect keywords.
+Tier-0 workflows include `autopilot`, `ultrawork`, `ralph`, `team`, and `ralplan`.
+Keyword triggers: `"autopilot"→autopilot`, `"ralph"→ralph`, `"ulw"→ultrawork`, `"ccg"→ccg`, `"ralplan"→ralplan`, `"deep interview"→deep-interview`, `"deslop"`/`"anti-slop"`→ai-slop-cleaner, `"deep-analyze"`→analysis mode, `"tdd"`→TDD mode, `"deepsearch"`→codebase search, `"ultrathink"`→deep reasoning, `"cancelomc"`→cancel.
+Team orchestration is explicit via `/team`.
+The detailed agent catalog, tools, team pipeline, commit protocol, and full skills registry (including reference entries for `explore`, `planner`, `architect`, `executor`, `designer`, and `writer`) live in the native `omc-reference` skill when skills are available; this file remains sufficient without skill support.
+
+
+
+Verify before claiming completion. Size appropriately: small→haiku, standard→sonnet, large/security→opus.
+If verification fails, keep iterating.
+
+
+
+Broad requests: explore first, then plan. 2+ independent tasks in parallel. `run_in_background` for builds/tests.
+Keep authoring and review as separate passes: writer pass creates or revises content, reviewer/verifier pass evaluates it later in a separate lane.
+Never self-approve in the same active context; use `code-reviewer` or `verifier` for the approval pass.
+Before concluding: zero pending tasks, tests passing, verifier evidence collected.
+
+
+
+Hooks inject `` tags. Key patterns: `hook success: Success` (proceed), `[MAGIC KEYWORD: ...]` (invoke skill), `The boulder never stops` (ralph/ultrawork active).
+Persistence: `` (7 days), `` (permanent).
+Kill switches: `DISABLE_OMC`, `OMC_SKIP_HOOKS` (comma-separated).
+
+
+
+`/oh-my-claudecode:cancel` ends execution modes. Cancel when done+verified or blocked. Don't cancel if work incomplete.
+
+
+
+State: `.omc/state/`, `.omc/state/sessions/{sessionId}/`, `.omc/notepad.md`, `.omc/project-memory.json`, `.omc/plans/`, `.omc/research/`, `.omc/logs/`
+
+
+## Setup
+
+Say "setup omc" or run `/oh-my-claudecode:omc-setup`.
+
+
diff --git a/.claude/rules/project-knowledge/config-pitfalls.mdc b/.claude/rules/project-knowledge/config-pitfalls.mdc
new file mode 100644
index 0000000000..2aa3c16b0c
--- /dev/null
+++ b/.claude/rules/project-knowledge/config-pitfalls.mdc
@@ -0,0 +1,41 @@
+---
+description: Common pitfalls in LoongCollector collection configs. Reference this when writing or reviewing pipeline config YAML.
+globs:
+ - "**/*.feature"
+ - "**/case.feature"
+ - "core/config/**"
+ - "test/e2e/**"
+alwaysApply: false
+---
+# LoongCollector Collection Config Pitfalls
+
+## ExcutionTimeout turns the whole config into a onetime config
+
+When `global.ExcutionTimeout` is present in a config, the **entire config** is marked as a onetime config.
+Only plugins registered via `RegisterOnetimeInputCreator` can be used in a onetime config.
+
+Most input plugins (`input_forward`, `input_file`, `input_container_stdio`, `input_prometheus`, etc.) only register `RegisterContinuousInputCreator` and fail in a onetime config with:
+
+```
+failed to parse config:unsupported input plugin module:input_forward
+```
+
+### Detection logic
+
+```
+global.ExcutionTimeout is present
+  → PipelineConfig::GetExpireTimeIfOneTime → mOnetimeExpireTime is set
+  → CollectionConfig::IsOnetime() == true
+  → IsValidNativeInputPlugin(name, true) looks up the ONETIME registry
+  → not found → "unsupported input plugin"
+```
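+
+A minimal lint sketch of the same decision chain, assuming PyYAML is available and that the config follows the usual `global:` / `inputs:` + `Type:` layout; the capability table below is illustrative, the authoritative source is `PluginRegistry::LoadStaticPlugins()`:
+
+```python
+#!/usr/bin/env python3
+"""Sketch: flag configs that combine global.ExcutionTimeout with continuous-only inputs."""
+import sys
+import yaml  # assumes PyYAML is installed
+
+# Illustrative capability table; check RegisterOnetimeInputCreator calls for the real list.
+ONETIME_CAPABLE_INPUTS = {"input_static_file"}
+
+def check_pipeline_config(path: str) -> list[str]:
+    with open(path, encoding="utf-8") as f:
+        cfg = yaml.safe_load(f) or {}
+    findings = []
+    if "ExcutionTimeout" not in (cfg.get("global") or {}):
+        return findings  # not a onetime config, nothing to check
+    for plugin in cfg.get("inputs") or []:
+        name = plugin.get("Type", "")
+        if name and name not in ONETIME_CAPABLE_INPUTS:
+            findings.append(
+                f"{path}: global.ExcutionTimeout makes this a onetime config, "
+                f"but input plugin '{name}' only supports continuous mode"
+            )
+    return findings
+
+if __name__ == "__main__":
+    problems = [msg for p in sys.argv[1:] for msg in check_pipeline_config(p)]
+    print("\n".join(problems) or "no onetime/continuous mismatch found")
+```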
+
+### Input plugins that support onetime
+
+Check the plugins that call `RegisterOnetimeInputCreator` in `PluginRegistry::LoadStaticPlugins()`, e.g. `InputStaticFile`.
+
+### Rules
+
+- **Do not use `ExcutionTimeout` in configs whose input plugins run continuously.**
+- E2E tests do not need `ExcutionTimeout` to control timeouts; the Go test `-timeout` flag already provides that protection.
+- If one-shot collection is really needed, use the `onetime_pipeline_config` directory together with an input plugin that supports onetime.
diff --git a/.claude/settings.json b/.claude/settings.json
new file mode 100644
index 0000000000..c6802dc44e
--- /dev/null
+++ b/.claude/settings.json
@@ -0,0 +1,5 @@
+{
+ "enabledPlugins": {
+ "oh-my-claudecode@omc": true
+ }
+}
diff --git a/.claude/skills/code-review/SKILL.md b/.claude/skills/code-review/SKILL.md
new file mode 100644
index 0000000000..76aca88ac1
--- /dev/null
+++ b/.claude/skills/code-review/SKILL.md
@@ -0,0 +1,451 @@
+---
+name: code-review
+description: Use this skill during code review to perform a deep, safety-oriented, architecture-consistency-first review of LoongCollector changes.
+metadata:
+ requires:
+ bins:
+ - python3
+ - git
+ - gh
+---
+# Code Review Agent Skill
+
+You are the senior code review assistant for the LoongCollector project. Your core goal is to find real defects, behavioral regressions, and risk points, not to hand out generic advice.
+
+To avoid false positives, you must follow these rules:
+
+- Analysis must include sufficient context; never draw conclusions from a local diff alone.
+- Conclusions must be based on code and changes you actually read; conclusions from memory or guesswork are not allowed.
+- Understand the author's intent and the end-to-end flow first, then judge issues.
+- Follow the execution steps below so that, after code changes, you can run an incremental review and check whether previous review findings were fixed.
+
+## TOC
+
+- [Preflight (verify required tools)](#preflight-verify-required-tools)
+- [Local Branch Sync (keep the code fresh)](#local-branch-sync-keep-the-code-fresh)
+- [Review Plan (plan before starting to avoid omissions)](#review-plan-plan-before-starting-to-avoid-omissions)
+- [Script Failure Degradation Strategy](#script-failure-degradation-strategy)
+- [Phase 1: Review Workspace & Incremental State](#phase-1-review-workspace--incremental-state)
+- [Phase 2: Context Building](#phase-2-context-building)
+- [Phase 3: Intent Analysis](#phase-3-intent-analysis)
+- [Evaluation Criteria to Keep in Mind (no output required)](#evaluation-criteria-to-keep-in-mind-no-output-required)
+- [Phase 4: Sub-agent Review](#phase-4-sub-agent-review)
+- [Phase 5: Final Report](#phase-5-final-report)
+
+## Preflight (verify required tools)
+
+Before entering Phase 1, run the following commands and make sure all of them pass:
+
+- `python3 --version`
+- `git rev-parse --is-inside-work-tree`
+- `gh auth status`
+
+If any command fails, stop the review, fix the environment according to `references/failure-playbook.md`, and retry.
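+
+A minimal preflight sketch, assuming the three commands above are the full gate; it only reports pass/fail and stops on the first failure:
+
+```python
+#!/usr/bin/env python3
+"""Sketch: run the preflight checks and stop on the first failure."""
+import subprocess
+import sys
+
+PREFLIGHT_CMDS = [
+    ["python3", "--version"],
+    ["git", "rev-parse", "--is-inside-work-tree"],
+    ["gh", "auth", "status"],
+]
+
+def main() -> int:
+    for cmd in PREFLIGHT_CMDS:
+        proc = subprocess.run(cmd, text=True, capture_output=True)
+        if proc.returncode != 0:
+            # Per the failure playbook, preflight failures require manual intervention.
+            print(f"preflight failed: {' '.join(cmd)}\n{proc.stderr.strip()}", file=sys.stderr)
+            return 1
+        print(f"preflight ok: {' '.join(cmd)}")
+    return 0
+
+if __name__ == "__main__":
+    sys.exit(main())
+```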
+
+## Local Branch Sync (keep the code fresh)
+
+When reusing a local PR branch for review, sync the code once before the review starts so you are not working on a stale checkout:
+
+1. Read the remote PR's current `headRefOid` (or the branch's current `HEAD` SHA).
+2. Sync the corresponding local branch (e.g. `git fetch` + `git pull --ff-only`, or an equivalent flow; see the sketch below).
+3. Record the `head` SHA used for this round at the top of `final-report.md` for traceability.
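+
+A sketch of the sync step, assuming `gh` is authenticated and the PR branch is already checked out locally with an upstream set; the PR number is illustrative:
+
+```python
+#!/usr/bin/env python3
+"""Sketch: sync a local PR branch to the remote head before reviewing."""
+import subprocess
+
+def run(args: list[str]) -> str:
+    return subprocess.run(args, text=True, capture_output=True, check=True).stdout.strip()
+
+def sync_pr_branch(pr_number: int) -> str:
+    # Remote head of the PR as reported by GitHub.
+    head_sha = run(["gh", "pr", "view", str(pr_number), "--json", "headRefOid", "--jq", ".headRefOid"])
+    run(["git", "fetch", "--all", "--prune"])
+    # Fast-forward only: refuse to silently rewrite local work.
+    run(["git", "pull", "--ff-only"])
+    local_sha = run(["git", "rev-parse", "HEAD"])
+    if local_sha != head_sha:
+        raise RuntimeError(f"local HEAD {local_sha} != PR head {head_sha}; resolve manually")
+    return head_sha  # record this SHA at the top of final-report.md
+
+if __name__ == "__main__":
+    print(sync_pr_branch(123))  # illustrative PR number
+```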
+
+## Review Plan (plan before starting to avoid omissions)
+
+Before entering the detailed steps of Phase 1, create and maintain `review-plan.md` in the review directory, and use it for "step-by-step execution + checkbox verification":
+
+1. File path:
+   - PR: `code-review/pr-<PR_NUMBER>/review-plan.md`
+   - Branch: `code-review/branch-<BRANCH_NAME>/review-plan.md`
+2. It must contain at least:
+   - The review target for this round (PR/branch, base/head SHA)
+   - This round's todo list (checkboxes), split into **major items + sub-items**
+   - The current phase marker (`in_progress`)
+   - Blockers and degradation records (if any)
+3. Execution requirements:
+   - Check off each step as soon as it is completed;
+   - If the review is interrupted or the strategy changes (e.g. `incremental -> full`), update the plan first, then continue.
+   - Do not list only phase names without sub-items (e.g. "Phase 1" must be broken down into sub-items such as fetching comments, updating status, and computing the mapping decision).
+4. Template usage:
+   - `references/review-plan.template.md` only provides the skeleton;
+   - The agent must fill in the major items and sub-items for the current round itself.
+
+## Script Failure Degradation Strategy
+
+If a script fails, you may continue the review in degraded mode, but you must take the following actions so the skill can keep improving:
+
+- Record the failure in `code-review/<target-dir>/script-failures.md` (script name, command, error summary, time, fallback strategy; see the sketch below).
+- When continuing the review, always switch to a `full` review and manually verify the key state files.
+- Add a "Script Failure Feedback" section to `final-report.md` describing the failure's blast radius and the manual compensation actions.
+- Feed the failure back to the skill maintenance channel (use `mcp-feedback-enhanced` when available; otherwise at least persist it in `script-failures.md` for later collection).
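+
+A sketch of the failure-record step, assuming the Markdown layout of `script-failures.md` is free-form and only the fields listed above are required:
+
+```python
+#!/usr/bin/env python3
+"""Sketch: append a script-failure record for later skill improvement."""
+from datetime import datetime, timezone
+from pathlib import Path
+
+def record_script_failure(review_dir: Path, script: str, command: str, error: str, fallback: str) -> Path:
+    path = review_dir / "script-failures.md"
+    entry = "\n".join([
+        f"## {script} @ {datetime.now(timezone.utc).isoformat(timespec='seconds')}",
+        f"- Command: `{command}`",
+        f"- Error: {error.strip()}",
+        f"- Fallback: {fallback}",
+        "",
+    ])
+    with path.open("a", encoding="utf-8") as f:
+        f.write(entry + "\n")
+    return path
+
+if __name__ == "__main__":
+    # Illustrative usage; the review directory follows code-review/<target-dir>/.
+    record_script_failure(Path("code-review/pr-123"), "fetch_review_comments.py",
+                          "python3 .claude/skills/code-review/scripts/fetch_review_comments.py ...",
+                          "gh: Not Found (HTTP 404)", "switched to full review")
+```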
+
+## Phase 1: Review Workspace & Incremental State
+
+Before the review starts, initialize or reuse the review workspace under the repository root:
+
+- PR review directory: `code-review/pr-<PR_NUMBER>/`
+- Branch review directory: `code-review/branch-<BRANCH_NAME>/`
+- Create the directory if it does not exist, and keep the history of previous review rounds
+
+The directory must contain at least the following files:
+
+- `meta.json`: review target and baseline metadata (repo, base/head, review time, strategy parameters)
+- `review-plan.md`: this round's execution plan and checkbox progress (plan first, then execute)
+- `reviewed_commits.json`: the set of reviewed commits and their mapping records
+- `intent-architecture-notes.md`: the code-understanding document (Phase 3)
+- `final-report.md`: the final report (Phase 5)
+- `comments/review-comments.json`: raw snapshot of PR review comments (the only allowed source)
+- `comments/comment-status.json`: comment status results (flow status + technical status)
+
+Input gate:
+
+- First run:
+  - The files above may be missing;
+  - You must run the init script to generate the minimal file skeleton before continuing.
+- Subsequent runs:
+  - The key input files must exist and have a valid schema;
+  - If they are invalid, follow the "full rebuild / re-fetch" recovery flow in `references/failure-playbook.md`; hand-stitching JSON and continuing is not allowed.
+
+Template and script directories (must be used):
+
+- JSON templates: `.claude/skills/code-review/references/`
+- Workflow scripts: `.claude/skills/code-review/scripts/`
+
+Execution steps (must be done in order; a driver sketch that chains these steps follows this list):
+
+1. Initialize the review directory and base files:
+   - PR: `python3 .claude/skills/code-review/scripts/init_review_workspace.py --repo-root <REPO_ROOT> --target-type pr --target-id <PR_NUMBER> --base-ref <BASE_REF> --head-ref <HEAD_REF> --base-sha <BASE_SHA> --head-sha <HEAD_SHA>`
+   - Branch: `python3 .claude/skills/code-review/scripts/init_review_workspace.py --repo-root <REPO_ROOT> --target-type branch --target-id <BRANCH_NAME> --base-ref <BASE_REF> --head-ref <HEAD_REF> --base-sha <BASE_SHA> --head-sha <HEAD_SHA>`
+2. Create/update `review-plan.md` (you may start from the `references/review-plan.template.md` skeleton, but you must fill in this round's major items and sub-items) and mark the current phase as `Phase 1 in_progress`.
+3. Fetch review comments into `comments/review-comments.json`:
+   - PR review: you must run `python3 .claude/skills/code-review/scripts/fetch_review_comments.py --repo-root <REPO_ROOT> --target-type pr --target-id <PR_NUMBER>`; use PR review comments only
+   - Branch review: may be empty, or import a snapshot of branch review comments
+   - `review-comments.json` must be a standard object structure (the root object contains a `comments` array whose elements contain `comment_id/path/line/side/body`); otherwise treat it as an upstream script bug and fix the upstream script first.
+   - Every comment entry must contain a boolean `thread_resolved` field; the flow status is determined by this field alone (`true -> resolved`, `false -> open`).
+   - `snapshot/` must preserve the source-relative path hierarchy; flattened file names are forbidden. Example: `snapshot/round-2/files/core/ebpf/protocol/redis/RedisParser.cpp`. A flattened layout means the snapshot script failed or was interrupted and must be rerun.
+4. Create/update the comment status file:
+   - PR: `python3 .claude/skills/code-review/scripts/update_comment_status.py --repo-root <REPO_ROOT> --target-type pr --target-id <PR_NUMBER>`
+   - Branch: `python3 .claude/skills/code-review/scripts/update_comment_status.py --repo-root <REPO_ROOT> --target-type branch --target-id <BRANCH_NAME>`
+   - Note: this step only syncs the structure and the flow status (`status_flow`) and preserves the historical `status_tech`; it does not re-verify the code automatically.
+5. Generate the two-dimension status report in Markdown (table):
+   - PR: `python3 .claude/skills/code-review/scripts/generate_comment_status_report.py --repo-root <REPO_ROOT> --target-type pr --target-id <PR_NUMBER>`
+   - Branch: `python3 .claude/skills/code-review/scripts/generate_comment_status_report.py --repo-root <REPO_ROOT> --target-type branch --target-id <BRANCH_NAME>`
+   - The output file is fixed to `comments/comment-status.md` (columns: comment time, file, line, author, comment, flow status, technical status)
+6. Compute the incremental mapping and fallback recommendation (`--base` and `--head` must be commit SHAs):
+   - PR: `python3 .claude/skills/code-review/scripts/incremental_review_mapper.py --repo-root <REPO_ROOT> --target-type pr --target-id <PR_NUMBER> --base <BASE_SHA> --head <HEAD_SHA> --review-round <N>`
+   - Branch: `python3 .claude/skills/code-review/scripts/incremental_review_mapper.py --repo-root <REPO_ROOT> --target-type branch --target-id <BRANCH_NAME> --base <BASE_SHA> --head <HEAD_SHA> --review-round <N>`
+   - When `snapshot/latest.json` exists, the mapper also computes `snapshot_match_rate`, used as auxiliary input for incremental decisions after rebases with conflict rewrites or squash merges.
+7. Act on the `recommendation` in the script output:
+   - `incremental`: review only `need_review_commits`
+   - `partial`: review `need_review_commits` first, then re-review low-confidence hunks
+   - `full`: run a full review, but deduplicate against historical findings
+
+8. The technical status (`status_tech`) must be re-verified item by item; guessing is not allowed:
+   - Required inputs (in order):
+     1) `comments/review-comments.json`
+     2) `comments/comment-status.json`
+     3) `reviewed_commits.json`
+     4) the current file referenced by the comment's `path`
+     5) the historical snapshot of the same path under `snapshot/` (if present)
+   - Per-item rules (keyed by `comment_id`):
+     - Only `status_tech`, `mapped_finding_id`, and `notes` may be updated
+     - `status_tech` may only take: `fixed|not-fixed|false-positive|partially-fixed`
+     - `notes` must state the evidence for the verdict, at least: the files compared, the key code change, and the reason for the conclusion
+     - Each round must first re-verify the previous round's non-terminal items (`not-fixed`, `partially-fixed`).
+   - Manual corrections (supported):
+     - If the comment author (the current `gh` login account) replies in the thread with text containing `fixed`, sync the status to `fixed`.
+     - If the reply text contains `false-positive` (or `false positive`), sync the status to `false-positive`.
+     - Manual corrections are absorbed automatically by the script when it updates `comment-status.json`, and recorded in `notes`.
+   - Terminal-state skip rule (enabled by default):
+     - Items whose current `status_tech` is `fixed` or `false-positive` are skipped during this round's technical re-verification by default.
+     - Re-open verification only when one of the following holds:
+       1) the item's `path` was modified again within this round's commit range;
+       2) the item's `status_flow` changed from `resolved` to something else;
+       3) a human explicitly forces re-verification (by a list of `comment_id`s).
+   - Output requirements:
+     - the updated `comments/comment-status.json`
+     - a regenerated `comments/comment-status.md`
+9. After wrapping up this round, you must generate a snapshot for the next round's incremental decision:
+   - PR: `python3 .claude/skills/code-review/scripts/build_snapshot.py --repo-root <REPO_ROOT> --target-type pr --target-id <PR_NUMBER> --base <BASE_SHA> --head <HEAD_SHA> --review-round <N>`
+   - Branch: `python3 .claude/skills/code-review/scripts/build_snapshot.py --repo-root <REPO_ROOT> --target-type branch --target-id <BRANCH_NAME> --base <BASE_SHA> --head <HEAD_SHA> --review-round <N>`
+   - Artifacts: `snapshot/round-<N>/files/*`, `snapshot/round-<N>/manifest.json`, `snapshot/latest.json`
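+
+A minimal driver sketch for one round, assuming the target is a PR and that base/head SHAs were already resolved; the paths and flags mirror the commands above, and `update_comment_status.py` is assumed to accept the same target flags:
+
+```python
+#!/usr/bin/env python3
+"""Sketch: chain the Phase 1 scripts for one PR review round."""
+import json
+import subprocess
+
+SCRIPTS = ".claude/skills/code-review/scripts"
+
+def run(script: str, *extra: str) -> str:
+    cmd = ["python3", f"{SCRIPTS}/{script}", "--repo-root", ".", "--target-type", "pr", "--target-id", "123", *extra]
+    return subprocess.run(cmd, text=True, capture_output=True, check=True).stdout
+
+def review_round(base_sha: str, head_sha: str, round_no: int) -> str:
+    run("init_review_workspace.py", "--base-sha", base_sha, "--head-sha", head_sha)       # step 1
+    run("fetch_review_comments.py")                                                       # step 3
+    run("update_comment_status.py")                                                       # step 4
+    run("generate_comment_status_report.py")                                              # step 5
+    mapping = json.loads(run("incremental_review_mapper.py",                              # step 6
+                             "--base", base_sha, "--head", head_sha,
+                             "--review-round", str(round_no)))
+    # Steps 7-8 (acting on the recommendation, re-verifying status_tech) are agent work.
+    run("build_snapshot.py", "--base", base_sha, "--head", head_sha,
+        "--review-round", str(round_no))                                                  # step 9
+    return mapping["recommendation"]
+
+if __name__ == "__main__":
+    print(review_round("<BASE_SHA>", "<HEAD_SHA>", 1))  # illustrative SHAs and PR number
+```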
+
+State file field constraints (must be followed):
+
+- `reviewed_commits.json` records:
+  - `commit_sha`
+  - `patch_id` (for precise mapping after a rebase)
+  - `review_round`
+  - `reviewed_at`
+  - `hunk_fingerprints` (array)
+- `comments/comment-status.json` records:
+  - `comment_id`
+  - `path` / `line` / `side`
+  - `body`
+  - `snippet` (readable code snippet)
+  - `snippet_fingerprint` (hash of the normalized snippet)
+  - `status_flow` (`open|resolved|wont-fix|deferred`)
+  - `status_tech` (`fixed|not-fixed|false-positive|partially-fixed`)
+  - `mapped_finding_id`
+
+Notes:
+
+- `snippet_fingerprint` is defined as a stable hash of "the normalized code snippet + the file path + the comment locator triple (line/side/comment_id)"; it must not be based on the line number alone (a sketch follows below).
+- `status_flow` and `status_tech` may be corrected manually, but historical records must not be deleted.
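+
+A sketch of one possible `snippet_fingerprint` implementation following this definition; the normalization mirrors the helper in `incremental_review_mapper.py`, while the exact field order inside the hash is an assumption:
+
+```python
+#!/usr/bin/env python3
+"""Sketch: compute a stable snippet_fingerprint for a review comment."""
+import hashlib
+import re
+
+def normalize_snippet(snippet: str) -> str:
+    # Collapse whitespace noise per line, as the mapper does for hunks.
+    return "\n".join(re.sub(r"\s+", " ", line.strip()) for line in snippet.splitlines())
+
+def snippet_fingerprint(snippet: str, path: str, line: int, side: str, comment_id: int) -> str:
+    key = "\n".join([normalize_snippet(snippet), path, f"{line}|{side}|{comment_id}"])
+    return hashlib.sha256(key.encode("utf-8")).hexdigest()
+
+if __name__ == "__main__":
+    # Illustrative values only.
+    print(snippet_fingerprint("if (ptr == nullptr)  return;",
+                              "core/ebpf/protocol/redis/RedisParser.cpp",
+                              42, "RIGHT", 123456789))
+```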
+
+Incremental review policy (must be executed):
+
+1. Read `reviewed_commits.json` first and review only changes not yet covered.
+2. If a rebase/force-push is detected, do not jump straight to a full re-review; map first, then decide:
+   - L1 (high confidence): map old commits to new commits by `patch-id`; matched commits inherit the "reviewed" state.
+   - L2 (medium confidence): fingerprint-match by `path + normalized hunk snippet + hunk context`, and re-review only the unmatched hunks.
+   - L3 (low confidence): when the match rate is low or conflicts rewrote the code heavily, fall back to a full review.
+3. Default confidence thresholds (see the decision sketch after the next list):
+   - `commit_map_rate >= 90%`: incremental review passes
+   - `hunk_match_rate >= 80%`: partial re-review
+   - otherwise fall back to a full review
+4. Even on a full fallback, reuse historical comments and deduplicate findings to avoid repeating opinions.
+
+Responsibilities of the snapshot in incremental decisions (must be followed):
+
+1. The `snapshot` is auxiliary evidence for incremental decisions; it does not replace the git main path (`patch-id`/`hunk`).
+2. After a rebase with conflict rewrites, if commit/hunk mapping is insufficient, `snapshot_match_rate` may be used to downgrade from `full` to `partial`.
+3. When a squash merge loses commit boundaries, `snapshot_match_rate` is used to judge whether incremental review can continue.
+4. If `snapshot_match_rate` is below the threshold, a `full` review is still required.
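+
+A sketch of the combined decision rule, mirroring `incremental_review_mapper.py`; the commit/hunk thresholds are the documented defaults, and 0.9 is the script's default `--snapshot-match-threshold`:
+
+```python
+"""Sketch: recommendation logic combining commit/hunk/snapshot match rates."""
+from typing import Optional
+
+def recommend(commit_map_rate: float, hunk_match_rate: float,
+              snapshot_match_rate: Optional[float],
+              commit_threshold: float = 0.9, hunk_threshold: float = 0.8,
+              snapshot_threshold: float = 0.9) -> str:
+    if commit_map_rate >= commit_threshold:
+        return "incremental"   # enough previously reviewed commits map onto the new history
+    if hunk_match_rate >= hunk_threshold:
+        return "partial"       # re-review only the unmatched hunks
+    # Full fallback, unless the snapshot shows the head files are largely unchanged
+    # (rebase-with-conflicts / squash-merge cases).
+    if snapshot_match_rate is not None and snapshot_match_rate >= snapshot_threshold:
+        return "partial"
+    return "full"
+
+if __name__ == "__main__":
+    print(recommend(0.4, 0.5, 0.95))  # squash-merge example -> "partial"
+```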
+
+## Phase 2: Context Building
+
+Before reviewing, you must complete the following steps:
+
+1. Read `../project-knowledge/SKILL.md` to build an understanding of the system architecture and module responsibilities.
+2. Read `../project-knowledge/SKILL.md` again, prioritizing:
+   - Shared capability entry points (common/helper code that must be reused)
+   - Lifecycle and resource-release invariants
+   - Configuration/environment-variable conventions (case compatibility, defaults, mapping of deprecated parameters)
+   - High-frequency issues from past reviews (use as a priority checklist)
+3. Read and consult the following guides (pick by the scope of the change):
+   - `../selfmonitor/SKILL.md` (required for self-monitoring and alerting changes)
+   - `../security-check/SKILL.md` (required for security and compliance changes)
+   - `../compile/SKILL.md` (required when the build/compile chain is involved)
+4. Based on the PR/branch change list, read the full context of the affected files (at least the changed functions, their callers, and their definitions).
+5. If the change touches the pipeline/runner/config system, read the following code before drawing conclusions:
+   - `core/application/Application.cpp` (main loop, config scanning, shutdown order)
+   - `core/collection_pipeline/CollectionPipelineManager.cpp`
+   - `core/collection_pipeline/CollectionPipeline.cpp`
+   - `core/runner/ProcessorRunner.cpp`
+   - `core/runner/FlusherRunner.cpp`
+   - `core/config/watcher/PipelineConfigWatcher.cpp`
+   - `core/config/OnetimeConfigInfoManager.cpp`
+   - `core/file_server/FileServer.cpp`
+   - `core/file_server/checkpoint/CheckPointManager.cpp`
+   - `core/file_server/checkpoint/CheckpointManagerV2.cpp` (when exactly-once is involved)
+6. Pull review context via MCP/`gh` tools:
+   - PR description, commit history, PR review comments, CI status
+   - Review comments from roughly the 10 most recent related PRs (distill team preferences)
+7. If the code platform's historical comments are accessible, sample review comments from recently merged PRs (>=30 suggested) and cross-check patterns:
+   - Map historical high-frequency issues onto the files changed here and mark them as "high-risk check items"
+   - If they conflict with `codebase-map`, update the conclusion based on "latest code facts + comment evidence"
+8. If you find conflicts with historical constraints or design decisions, record the "assumptions and evidence" first and state them explicitly in the report later.
+
+## Phase 3: Intent Analysis
+
+After the context analysis, you must produce an "understanding document" before moving to the issue list. This document exists so developers can learn and understand the code; it must not be skipped.
+
+### Phase 3 output requirements (a document is mandatory)
+
+You must produce a standalone document (suggested title: `Code Review - Intent & Architecture Notes`) containing at least:
+
+- Author intent: what problem this PR/branch solves, and why now.
+- End-to-end flow: from entry to exit, which key paths this change actually alters.
+- Impact scope: which modules, interfaces, configs, state files, monitoring metrics, and alerting paths are involved.
+- Verification of the expected result: whether the change achieves its goal, with evidence and reasoning.
+
+### Phase 3 persistence requirements (must be written into the code-review directory)
+
+The Phase 3 document must be written into the repository's `code-review/` directory; output only in chat is forbidden.
+
+Suggested paths:
+
+- PR review: `code-review/pr-<PR_NUMBER>/intent-architecture-notes.md`
+- Branch review: `code-review/branch-<BRANCH_NAME>/intent-architecture-notes.md` (replace `/` with `-`)
+
+Requirements:
+
+- Create the directory first if it does not exist.
+- The top of the document must contain the review target metadata (PR number/branch name, commit range, generation time).
+
+### Mermaid visualization requirements (at least 2 diagrams)
+
+The understanding document must include Mermaid diagrams to support learning and communication. Pick according to the change, and output at least 2 diagrams covering 2 of the following types:
+
+- Architecture diagram (module relations / dependency boundaries)
+- Flowchart (key execution paths)
+- Sequence diagram (component interaction, call order, async/retry behavior)
+- Data structure diagram (key state objects, queues, checkpoint primary/secondary relations)
+
+Suggestions:
+
+- Small changes: at least 2 diagrams (flow + sequence)
+- Medium/large changes: 3-4 diagrams (architecture + flow + sequence + data structure)
+
+Notes:
+
+- Diagrams must be strongly related to the current change; do not draw "encyclopedia diagrams" unrelated to this PR.
+- Node names must use real component/type names from the code; avoid vague abstractions.
+- For Mermaid syntax, follow `../mermaid/SKILL.md`.
+
+## Evaluation Criteria to Keep in Mind (no output required)
+
+Check every changed file and diff hunk against these 6 groups of criteria:
+
+1. Business & architecture: goal achievement, responsibility boundaries, topology and dependencies, failure propagation.
+2. Correctness & security: boundary checks, type/exception handling, defense against external input, security compliance.
+3. Concurrency & lifecycle: correct shutdown of threads/locks/queues, resource release, state recovery.
+4. Performance & resources: hot-path complexity, copies and allocations, capacity limits, logging overhead.
+5. Stability & observability: completeness and diagnosability of metrics/logs/alerts.
+6. Maintainability, compatibility & docs/tests: readability, backward compatibility, documentation and test coverage.
+
+Note: this is not a "generic advice list"; it must be executed within each sub-agent's area of responsibility (see the responsibility matrix in the next section).
+
+## Phase 4: Sub-agent Review
+
+Launch the specialized sub-agents in parallel (3-4 in parallel is recommended; avoid over-splitting). Each sub-agent independently outputs "issues found + evidence".
+Each sub-agent must reference the corresponding items from the "Evaluation Criteria to Keep in Mind" section; verdicts without criteria references are not allowed.
+Each issue must be tagged with the source criterion number (e.g. `[S3]` for "concurrency & lifecycle").
+
+### Responsibility matrix (primary/secondary)
+
+- Sub-agent A (logic & architecture): primary `S1`, secondary `S6`
+- Sub-agent B (concurrency & lifecycle): primary `S3`, secondary `S5`
+- Sub-agent C (security, stability & performance): primary `S2` + `S4`, secondary `S5`
+- Sub-agent D (reuse, compatibility, docs & tests): primary `S6`, secondary `S1` + `S5`
+
+Rules:
+
+- Primary criteria must be covered in full; secondary criteria only need to cover the parts directly related to this change.
+- If an issue spans multiple criteria, multiple tags are allowed (e.g. `[S2][S4]`).
+- Multiple agents must not report duplicate conclusions for the same issue; if duplicated, keep the one with the more complete evidence.
+
+### Sub-agent A: logic correctness & architectural consistency
+
+- Is the business logic complete; are there unhandled boundaries, inconsistent state, or broken error propagation?
+- Is the change consistent with LoongCollector's architectural constraints (input/processing/output responsibilities, the Runner pattern, the config registration pattern)?
+- Does it introduce implicit dependencies, circular dependencies, or failure propagation that cannot be observed?
+- Primary evaluation criteria: business & architecture, maintainability & compatibility.
+
+### Sub-agent B: concurrency, async & lifecycle
+
+- Are lock granularity, lock ordering, data races, and thread shutdown paths safe?
+- Do callbacks/async flows have races, dangling references, or unhandled failure paths?
+- Can new threads/timed tasks be stopped in a controlled way, and do they follow existing project patterns?
+- Primary evaluation criteria: concurrency & lifecycle, stability & observability.
+- Lifecycle/resource-management checklist (check item by item; the focus is "correct release and state recovery"):
+  - Resource release closure:
+    - For every path (startup failure, hot-reload replacement, config removal, process exit), verify the resource loop is closed:
+      - Threads/futures can exit and are reclaimed
+      - Queue pops no longer hang after being disabled
+      - Plugins/Go pipelines can stop without leftover references
+      - flush/batch/checkpoint persistence semantics and paths stay consistent
+  - Deadlock and hang risks:
+    - Is lock ordering consistent across modules (pipeline manager / queue manager / file server)?
+    - Can `WaitAllItemsInProcessFinished`, queue `Wait/Trigger`, or `HoldOn/Resume` form a circular wait?
+    - Can paths that only alert on long waits without terminating lead to a permanent hang or exit timeout?
+  - State recovery correctness (core):
+    - After a hot reload, does the system return to a consistent "can keep collecting + processing + sending" state, rather than one where only some components recovered?
+    - After file collection `Pause -> Dump -> ClearCache -> Resume`, are handlers, checkpoints, and caches consistent with each other?
+    - On config failure rollback, does the old pipeline/task remain usable, with no half-updated state?
+  - Order checks as supporting evidence (not the sole criterion):
+    - Still verify key ordering (runner init order, pipeline start/stop order), but conclusions must be stated in terms of resource and state outcomes.
+
+### Sub-agent C: security, stability & performance
+
+- Are input validation, exception handling, retry backoff, and resource release (RAII) complete?
+- Rvalues/ownership: verify the full chain [call site - argument passing - consumption point] to prevent unexpected moves or redundant copies.
+- Are there hot-path performance regressions (repeated computation, copies, unbounded container growth, high-frequency log flooding)?
+- Are monitoring metrics/alerts complete and compliant with the self-monitoring guide?
+- Primary evaluation criteria: correctness & security, performance & resources, stability & observability.
+- Checkpoint checklist (pick by change scope):
+  - onetime checkpoint:
+    - `LoadCheckpointFile()` at startup, `DumpCheckpointFile()` after config changes.
+    - Do timeout deletion and `RemoveConfig()` stay consistent with the checkpoint file, so leftover entries cannot cause incorrect recovery?
+  - file checkpoint (v1):
+    - Does `FileServer::Start()` still keep `LoadCheckPoint()` before handler registration?
+    - Do `Pause/Stop` guarantee `DumpCheckPointToLocal()`, and do failure scenarios produce diagnosable logs/alerts?
+  - exactly-once checkpoint (v2):
+    - Are the primary checkpoint and range checkpoints maintained in pairs, avoiding orphan keys?
+    - Can scanning and GC logic wrongly delete active checkpoints or leave recovery in a discontinuous state?
+
+### Sub-agent D: reuse compliance & documentation consistency
+
+- Does it reimplement existing shared capabilities (prefer reusing `core/common` and existing utility functions)?
+- Do comments match code behavior; do TODO/FIXME markers introduce new technical debt?
+- Are plugin config or `GetXxxParam` changes reflected in the corresponding `docs/` documentation?
+- Primary evaluation criteria: maintainability, compatibility & docs/tests.
+
+## Phase 5: Final Report
+
+The Final Report is a practical deliverable that can be used directly for fixes and platform hand-off. It coexists with the Phase 3 "understanding document"; neither replaces the other.
+
+### Phase 5 output requirements (practical orientation)
+
+1. Start with **Findings**, ordered by severity: `Critical` > `High` > `Medium` > `Low`.
+2. Every issue must include locatable evidence and an actionable suggestion.
+3. If no issues are found, state explicitly that "no blocking issues were found" and list residual risks and test gaps.
+4. End with **Highlights** (positive practices); keep it brief.
+5. It must include a **Lifecycle Verdict**:
+   - Resource release: `PASS/FAIL`
+   - Deadlock/hang risk: `PASS/FAIL`
+   - State recovery correctness: `PASS/FAIL`
+   - Each with 1-3 pieces of evidence.
+6. It must include a **Fix Plan** (grouped by priority):
+   - Fix immediately (blocks merge)
+   - Fix before merge
+   - Improve later
+7. It must include a **Validation Plan** (how to verify after the fix):
+   - Which tests to run, which metrics to watch, which alerts and recovery paths to verify.
+
+### Final Report persistence requirements (must be written into the code-review directory)
+
+The Final Report must be written into the repository's `code-review/` directory; output only in chat is forbidden.
+
+Suggested paths (same directory as Phase 3):
+
+- PR review: `code-review/pr-<PR_NUMBER>/final-report.md`
+- Branch review: `code-review/branch-<BRANCH_NAME>/final-report.md` (replace `/` with `-`)
+
+Requirements:
+
+- `final-report.md` must link to the corresponding `intent-architecture-notes.md` (relative-path link).
+- If a platform publish (PR comment/review) was performed, record the links at the end of the document; if it failed, record the reason and the retry command.
+
+Issue output format:
+
+```markdown
+- Severity:
+  - File: [<path>:<start line>](file://./<path>#L<start line>)
+  - Issue: <one-sentence statement of the core problem>
+  - Impact: <possible faulty behavior / risk>
+  - Suggestion: <directly actionable fix; include a minimal code snippet when necessary>
+```
+
+Additional requirements:
+
+- Line numbers must be re-verified before the final output so the links are clickable.
+- Comment only on issues within the actual change scope; do not drown the core defects in drive-by refactoring suggestions.
+- Keep the tone professional, direct, and concise; prefer verifiable conclusions.
+
+### Platform publish (optional but recommended)
+
+If the current review target is a PR/branch and the tooling is available, publish the Final Report automatically once the user asks for it:
+
+- You must wait for the user's explicit confirmation before publishing.
+- Publish structure:
+  1) **Inline Findings**: publish each locatable issue as an inline code comment (not as replies to the main PR comment).
+  2) **PR summary comment**: post the Final Report summary as the main PR comment.
+     - Must include: a count table for Critical/High/Medium/Low, the Lifecycle PASS/FAIL table, evidence for any Lifecycle FAIL, the overall conclusion, and Highlights.
+     - Must not: repeat the full findings list.
+- Publishing tools:
+  - Prefer `gh` to submit structured review results; if a GitHub MCP is available, it may be used equivalently.
+  - For inline comments, `gh api repos/<owner>/<repo>/pulls/<pr_number>/comments` is suggested (must include `commit_id/path/line/side/body`; see the sketch after this section).
+  - For the summary comment, `gh pr comment --body-file <SUMMARY_FILE>` is suggested.
+- If publishing fails, explain the failure in the output and provide copy-pasteable publish content.
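+
+A sketch of posting one inline finding through `gh api` against the `pulls/.../comments` REST endpoint named above; the repository slug and all field values are illustrative:
+
+```python
+#!/usr/bin/env python3
+"""Sketch: publish a single inline finding as a PR review comment via `gh api`."""
+import subprocess
+
+def post_inline_comment(repo: str, pr_number: int, commit_id: str,
+                        path: str, line: int, body: str) -> None:
+    subprocess.run(
+        [
+            "gh", "api", f"repos/{repo}/pulls/{pr_number}/comments",
+            "-f", f"body={body}",
+            "-f", f"commit_id={commit_id}",
+            "-f", f"path={path}",
+            "-F", f"line={line}",   # -F keeps the value typed as an integer
+            "-f", "side=RIGHT",
+        ],
+        check=True,
+    )
+
+if __name__ == "__main__":
+    # Illustrative values only.
+    post_inline_comment("alibaba/loongcollector", 123, "<HEAD_SHA>",
+                        "core/ebpf/protocol/redis/RedisParser.cpp", 42,
+                        "[S2] Missing bounds check before dereferencing the parsed field.")
+```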
diff --git a/.claude/skills/code-review/references/comment-status.template.json b/.claude/skills/code-review/references/comment-status.template.json
new file mode 100644
index 0000000000..1007f4cd19
--- /dev/null
+++ b/.claude/skills/code-review/references/comment-status.template.json
@@ -0,0 +1,23 @@
+{
+ "version": "1.0",
+ "generated_at": "",
+ "review_target": {
+ "type": "pr",
+ "id": ""
+ },
+ "status": [
+ {
+ "comment_id": 0,
+ "path": "",
+ "line": 0,
+ "side": "RIGHT",
+ "body": "",
+ "snippet": "",
+ "snippet_fingerprint": "",
+ "status_flow": "open",
+ "status_tech": "not-fixed",
+ "mapped_finding_id": "",
+ "notes": ""
+ }
+ ]
+}
diff --git a/.claude/skills/code-review/references/failure-playbook.md b/.claude/skills/code-review/references/failure-playbook.md
new file mode 100644
index 0000000000..00c2c19b5b
--- /dev/null
+++ b/.claude/skills/code-review/references/failure-playbook.md
@@ -0,0 +1,65 @@
+# Code-Review Failure Playbook
+
+This file is a failure-recovery decision table. Its goal is to let the agent triage correctly on errors: auto-recover, fall back within the workflow, or ask for manual intervention.
+
+## General principles
+
+- First determine whether this is a first run or a subsequent run.
+- Never hand-stitch JSON; after recovery, return to the standard workflow node and continue from there.
+- Preflight failures default to manual intervention; everything else prefers automatic fallback to a rebuildable node.
+- If a script fails but code reading is unaffected, the review may continue in degraded mode, but a failure feedback record must be produced.
+
+## Scenario 1: Preflight failure (manual intervention)
+
+- Trigger: any of `python3 --version` / `git rev-parse --is-inside-work-tree` / `gh auth status` fails
+- Decision: stop automatic execution and ask the user to check the environment and authentication
+- Action level: `manual_required`
+- Return node: Preflight (enter Phase 1 only after all three checks pass)
+
+## Scenario 2: Missing files on the first run (normal entry point, not a failure)
+
+- Trigger: `code-review/<target-dir>/` does not exist, or `meta.json` / `reviewed_commits.json` / `comments/*` are missing
+- Decision: treat as Bootstrap and run the initialization flow
+- Action level: `auto_recover`
+- Return node: Phase 1 step 1 (initialization), then continue in order
+
+## Scenario 3: Invalid input schema on a subsequent run
+
+- Trigger: `invalid review-comments.json` / `invalid comment-status.json`
+- Decision: discard the corrupted intermediate state and fall back to Bootstrap to rebuild the key inputs
+- Action level: `auto_recover`
+- Return node: Phase 1 step 1 (initialization) -> step 3 (fetch comments) -> step 4 (rebuild status)
+
+## Scenario 4: Missing commit object / failed to build the commit range
+
+- Trigger: `missing base/head commit object` or `failed to build commit range`
+- Decision: try to sync git objects automatically first; if that still fails, escalate to a human to confirm the base/head choice
+- Action level: `auto_then_manual`
+- Return node:
+  - Auto-recovery succeeded: Phase 1 step 6 (incremental mapping)
+  - Auto-recovery failed: rerun step 6 after manual confirmation
+
+## Scenario 5: Flattened snapshot directory
+
+- Trigger: `snapshot/` has no source-relative path hierarchy (only flattened files)
+- Decision: treat as an abnormal snapshot run; wipe and rebuild the snapshot
+- Action level: `auto_recover`
+- Return node: the snapshot generation step (then continue with technical status re-verification)
+
+## Scenario 6: A script fails but the review can continue
+
+- Trigger: any script errors out, but the repository code and basic git/gh capabilities are still readable
+- Decision: allow a degraded review so the workflow is not blocked, and force a failure feedback record for iterating on the skill
+- Action level: `degrade_continue`
+- Required actions:
+  - Write `code-review/<target-dir>/script-failures.md` (script name, command, error, time, compensation action)
+  - Always switch the review strategy to `full`
+  - Add a "Script Failure Feedback" section to `final-report.md`
+- Return node: the current review phase (continue under the degradation strategy)
+
+## Action level definitions
+
+- `manual_required`: a human must intervene before continuing
+- `auto_recover`: the agent can recover automatically and continue the workflow
+- `auto_then_manual`: try automatically first; escalate to a human on failure
+- `degrade_continue`: the review may continue, but the failure must be recorded and fed back
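+
+A sketch of this decision table as data; the trigger strings are assumed to match the error messages emitted by the scripts in this skill:
+
+```python
+"""Sketch: map playbook scenarios to action levels for quick triage."""
+
+PLAYBOOK = {
+    "preflight failed": ("manual_required", "Preflight"),
+    "missing review workspace": ("auto_recover", "Phase 1 step 1 (initialization)"),
+    "invalid review-comments.json": ("auto_recover", "Phase 1 step 1 -> 3 -> 4"),
+    "invalid comment-status.json": ("auto_recover", "Phase 1 step 1 -> 3 -> 4"),
+    "missing base/head commit object": ("auto_then_manual", "Phase 1 step 6 (incremental mapping)"),
+    "failed to build commit range": ("auto_then_manual", "Phase 1 step 6 (incremental mapping)"),
+    "flattened snapshot layout": ("auto_recover", "snapshot generation step"),
+}
+
+def triage(error_message: str) -> tuple[str, str]:
+    for trigger, decision in PLAYBOOK.items():
+        if trigger in error_message:
+            return decision
+    # Any other script failure: degrade but keep reviewing, and record feedback.
+    return ("degrade_continue", "current review phase")
+
+if __name__ == "__main__":
+    print(triage("invalid review-comments.json: root must be object"))
+```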
diff --git a/.claude/skills/code-review/references/meta.template.json b/.claude/skills/code-review/references/meta.template.json
new file mode 100644
index 0000000000..f1b824918f
--- /dev/null
+++ b/.claude/skills/code-review/references/meta.template.json
@@ -0,0 +1,19 @@
+{
+ "version": "1.0",
+ "repo": "",
+ "review_target": {
+ "type": "pr",
+ "id": "",
+ "base_ref": "",
+ "head_ref": "",
+ "base_sha": "",
+ "head_sha": ""
+ },
+ "strategy": {
+ "commit_map_threshold": 0.9,
+ "hunk_match_threshold": 0.8,
+ "fallback_on_low_confidence": true
+ },
+ "review_round": 1,
+ "generated_at": ""
+}
diff --git a/.claude/skills/code-review/references/review-plan.template.md b/.claude/skills/code-review/references/review-plan.template.md
new file mode 100644
index 0000000000..81d4da2d60
--- /dev/null
+++ b/.claude/skills/code-review/references/review-plan.template.md
@@ -0,0 +1,30 @@
+# Review Plan
+
+- Review Target: `<pr-<PR_NUMBER> | branch-<BRANCH_NAME>>`
+- Base SHA: `<BASE_SHA>`
+- Head SHA: `<HEAD_SHA>`
+- Strategy: `<incremental | partial | full>`
+- Current Phase: `<phase name + in_progress/done>`
+
+## Work Items
+
+> Fill this in according to the actual tasks of the current round; do not list only phase names.
+> Suggested format: 2-5 sub-items under each major item, tracked with checkboxes.
+
+### <Major item 1>
+
+- [ ] <Sub-item>
+- [ ] <Sub-item>
+
+### <Major item 2>
+
+- [ ] <Sub-item>
+- [ ] <Sub-item>
+
+## Risks / Blockers
+
+-
+
+## Notes
+
+- If the strategy changes (e.g. `incremental -> full`), update this file first, then continue.
diff --git a/.claude/skills/code-review/references/reviewed_commits.template.json b/.claude/skills/code-review/references/reviewed_commits.template.json
new file mode 100644
index 0000000000..3d069b8334
--- /dev/null
+++ b/.claude/skills/code-review/references/reviewed_commits.template.json
@@ -0,0 +1,19 @@
+{
+ "version": "1.0",
+ "review_rounds": [],
+ "commits": [
+ {
+ "commit_sha": "",
+ "patch_id": "",
+ "review_round": 1,
+ "reviewed_at": "",
+ "hunk_fingerprints": [],
+ "files": [],
+ "mapping": {
+ "method": "direct|patch-id|hunk-similarity|none",
+ "mapped_from_commit": "",
+ "confidence": 0.0
+ }
+ }
+ ]
+}
diff --git a/.claude/skills/code-review/scripts/build_snapshot.py b/.claude/skills/code-review/scripts/build_snapshot.py
new file mode 100755
index 0000000000..f14ea55962
--- /dev/null
+++ b/.claude/skills/code-review/scripts/build_snapshot.py
@@ -0,0 +1,148 @@
+#!/usr/bin/env python3
+import argparse
+import hashlib
+import json
+import re
+import subprocess
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Dict, List, Tuple
+
+
+def utc_now() -> str:
+ return datetime.now(timezone.utc).replace(microsecond=0).isoformat()
+
+
+def sanitize_branch_name(branch_name: str) -> str:
+ return branch_name.replace("/", "-")
+
+
+def resolve_target(args: argparse.Namespace) -> Tuple[str, str]:
+ if args.target_type and args.target_id:
+ target_type = args.target_type
+ target_id = args.target_id
+ elif args.pr_number is not None:
+ target_type = "pr"
+ target_id = str(args.pr_number)
+ elif args.branch_name:
+ target_type = "branch"
+ target_id = args.branch_name
+ else:
+ raise SystemExit("must provide either --target-type/--target-id or --pr-number or --branch-name")
+ if target_type not in {"pr", "branch"}:
+ raise SystemExit("target type must be pr or branch")
+ return target_type, target_id
+
+
+def run_git(repo_root: Path, args: List[str]) -> str:
+ proc = subprocess.run(["git", *args], cwd=repo_root, text=True, capture_output=True, check=True)
+ return proc.stdout
+
+
+def run_git_no_check(repo_root: Path, args: List[str]) -> subprocess.CompletedProcess:
+ return subprocess.run(["git", *args], cwd=repo_root, text=True, capture_output=True, check=False)
+
+
+def normalize_file_content(text: str) -> str:
+ lines = [re.sub(r"\s+", " ", line.strip()) for line in text.splitlines()]
+ return "\n".join(lines)
+
+
+def stable_hash(text: str) -> str:
+ return hashlib.sha256(text.encode("utf-8")).hexdigest()
+
+
+def get_changed_files(repo_root: Path, base_sha: str, head_sha: str) -> List[str]:
+ out = run_git(repo_root, ["diff", "--name-only", f"{base_sha}..{head_sha}"])
+ return sorted({line.strip() for line in out.splitlines() if line.strip()})
+
+
+def get_file_content_at_commit(repo_root: Path, commit_sha: str, path: str) -> str:
+ proc = run_git_no_check(repo_root, ["show", f"{commit_sha}:{path}"])
+ if proc.returncode != 0:
+ return ""
+ return proc.stdout
+
+
+def main() -> None:
+ parser = argparse.ArgumentParser(description="Build snapshot baseline for incremental review.")
+ parser.add_argument("--repo-root", required=True)
+ parser.add_argument("--target-type", choices=["pr", "branch"])
+ parser.add_argument("--target-id")
+ parser.add_argument("--pr-number", type=int, help="PR number (legacy compatible)")
+ parser.add_argument("--branch-name", help="Branch name (legacy compatible)")
+ parser.add_argument("--base", required=True, help="Base commit SHA")
+ parser.add_argument("--head", required=True, help="Head commit SHA")
+ parser.add_argument("--review-round", required=True, type=int)
+ args = parser.parse_args()
+
+ repo_root = Path(args.repo_root).resolve()
+ target_type, target_id_raw = resolve_target(args)
+ target_id_dir = sanitize_branch_name(target_id_raw) if target_type == "branch" else target_id_raw
+ review_dir = repo_root / "code-review" / f"{target_type}-{target_id_dir}"
+
+ snapshot_root = review_dir / "snapshot" / f"round-{args.review_round}"
+ files_root = snapshot_root / "files"
+ files_root.mkdir(parents=True, exist_ok=True)
+
+ changed_files = get_changed_files(repo_root, args.base, args.head)
+ manifest_files: List[Dict[str, object]] = []
+
+ for rel_path in changed_files:
+ content = get_file_content_at_commit(repo_root, args.head, rel_path)
+ if content == "":
+ # Deleted file at head; keep entry for audit but no content snapshot.
+ manifest_files.append(
+ {"path": rel_path, "exists_in_head": False, "raw_hash": "", "normalized_hash": "", "size": 0}
+ )
+ continue
+
+ out_path = files_root / rel_path
+ out_path.parent.mkdir(parents=True, exist_ok=True)
+ out_path.write_text(content, encoding="utf-8")
+ manifest_files.append(
+ {
+ "path": rel_path,
+ "exists_in_head": True,
+ "raw_hash": stable_hash(content),
+ "normalized_hash": stable_hash(normalize_file_content(content)),
+ "size": len(content.encode("utf-8")),
+ }
+ )
+
+ manifest = {
+ "version": "1.0",
+ "review_target": {"type": target_type, "id": target_id_raw},
+ "review_round": args.review_round,
+ "base_sha": args.base,
+ "head_sha": args.head,
+ "generated_at": utc_now(),
+ "files": manifest_files,
+ }
+ manifest_path = snapshot_root / "manifest.json"
+ manifest_path.write_text(json.dumps(manifest, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
+
+ latest = {
+ "latest_round": args.review_round,
+ "manifest": str(manifest_path.relative_to(review_dir)),
+ "updated_at": utc_now(),
+ }
+ latest_path = review_dir / "snapshot" / "latest.json"
+ latest_path.parent.mkdir(parents=True, exist_ok=True)
+ latest_path.write_text(json.dumps(latest, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
+
+ print(
+ json.dumps(
+ {
+ "review_target": {"type": target_type, "id": target_id_raw},
+ "review_round": args.review_round,
+ "files": len(manifest_files),
+ "manifest": str(manifest_path),
+ },
+ ensure_ascii=False,
+ )
+ )
+
+
+if __name__ == "__main__":
+ main()
diff --git a/.claude/skills/code-review/scripts/fetch_review_comments.py b/.claude/skills/code-review/scripts/fetch_review_comments.py
new file mode 100755
index 0000000000..2bca6fca0b
--- /dev/null
+++ b/.claude/skills/code-review/scripts/fetch_review_comments.py
@@ -0,0 +1,204 @@
+#!/usr/bin/env python3
+import argparse
+import json
+import subprocess
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any, Dict, List, Tuple
+
+
+def utc_now() -> str:
+ return datetime.now(timezone.utc).replace(microsecond=0).isoformat()
+
+
+def run_cmd(args: List[str], cwd: Path) -> str:
+ proc = subprocess.run(args, cwd=cwd, text=True, capture_output=True, check=False)
+ if proc.returncode != 0:
+ raise SystemExit(f"command failed: {' '.join(args)}\n{proc.stderr.strip()}")
+ return proc.stdout
+
+
+def sanitize_branch_name(branch_name: str) -> str:
+ return branch_name.replace("/", "-")
+
+
+def resolve_target(args: argparse.Namespace) -> Tuple[str, str]:
+ if args.target_type and args.target_id:
+ target_type = args.target_type
+ target_id = args.target_id
+ elif args.pr_number is not None:
+ target_type = "pr"
+ target_id = str(args.pr_number)
+ elif args.branch_name:
+ target_type = "branch"
+ target_id = args.branch_name
+ else:
+ raise SystemExit("must provide either --target-type/--target-id or --pr-number or --branch-name")
+ if target_type not in {"pr", "branch"}:
+ raise SystemExit("target type must be pr or branch")
+ return target_type, target_id
+
+
+def parse_name_with_owner(repo_root: Path) -> Tuple[str, str]:
+ out = run_cmd(["gh", "repo", "view", "--json", "nameWithOwner", "--jq", ".nameWithOwner"], repo_root).strip()
+ if "/" not in out:
+ raise SystemExit(f"invalid repository nameWithOwner: {out}")
+ owner, name = out.split("/", 1)
+ return owner, name
+
+
+def get_viewer_login(repo_root: Path) -> str:
+ out = run_cmd(["gh", "api", "user", "--jq", ".login"], repo_root).strip()
+ return out
+
+
+def run_graphql(repo_root: Path, owner: str, name: str, pr_number: int, cursor: str) -> Dict[str, Any]:
+ # Query review threads instead of plain review comments so we can
+ # persist thread-level resolution state deterministically.
+ query = """
+query($owner:String!, $name:String!, $number:Int!, $endCursor:String) {
+ repository(owner:$owner, name:$name) {
+ pullRequest(number:$number) {
+ reviewThreads(first:100, after:$endCursor) {
+ pageInfo { hasNextPage endCursor }
+ nodes {
+ isResolved
+ comments(first:100) {
+ nodes {
+ databaseId
+ body
+ path
+ line
+ originalLine
+ createdAt
+ updatedAt
+ author { login }
+ originalCommit { oid }
+ replyTo { databaseId }
+ }
+ }
+ }
+ }
+ }
+ }
+}
+"""
+ cmd = [
+ "gh",
+ "api",
+ "graphql",
+ "-f",
+ f"query={query}",
+ "-F",
+ f"owner={owner}",
+ "-F",
+ f"name={name}",
+ "-F",
+ f"number={pr_number}",
+ ]
+ if cursor:
+ cmd.extend(["-F", f"endCursor={cursor}"])
+ out = run_cmd(cmd, repo_root)
+ return json.loads(out)
+
+
+def main() -> None:
+ parser = argparse.ArgumentParser(description="Fetch PR review comments to stable schema file.")
+ parser.add_argument("--repo-root", required=True)
+ parser.add_argument("--target-type", choices=["pr", "branch"])
+ parser.add_argument("--target-id")
+ parser.add_argument("--pr-number", type=int, help="PR number (legacy compatible)")
+ parser.add_argument("--branch-name", help="Branch name (legacy compatible)")
+ args = parser.parse_args()
+
+ repo_root = Path(args.repo_root).resolve()
+ target_type, target_id_raw = resolve_target(args)
+ target_id_dir = sanitize_branch_name(target_id_raw) if target_type == "branch" else target_id_raw
+ review_dir = repo_root / "code-review" / f"{target_type}-{target_id_dir}"
+ comments_path = review_dir / "comments" / "review-comments.json"
+ comments_path.parent.mkdir(parents=True, exist_ok=True)
+
+ if target_type != "pr":
+ payload = {
+ "version": "1.0",
+ "source": "branch_review_comments",
+ "fetched_at": utc_now(),
+ "review_target": {"type": target_type, "id": target_id_raw},
+ "comments": [],
+ }
+ comments_path.write_text(json.dumps(payload, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
+ print(json.dumps({"target": f"{target_type}-{target_id_raw}", "threads": 0, "comments": 0, "resolved_threads": 0}))
+ return
+
+ owner, name = parse_name_with_owner(repo_root)
+ viewer_login = get_viewer_login(repo_root)
+ pr_number = int(target_id_raw)
+
+ cursor = ""
+ has_next = True
+ comments: List[Dict[str, Any]] = []
+ total_threads = 0
+ resolved_threads = 0
+
+ while has_next:
+ # Paginate until all review threads are collected.
+ data = run_graphql(repo_root, owner, name, pr_number, cursor)
+ threads_obj = data["data"]["repository"]["pullRequest"]["reviewThreads"]
+ page_info = threads_obj["pageInfo"]
+ threads = threads_obj["nodes"] or []
+ total_threads += len(threads)
+ for thread in threads:
+ is_resolved = bool(thread.get("isResolved", False))
+ if is_resolved:
+ resolved_threads += 1
+ thread_comments = thread.get("comments", {}).get("nodes", []) or []
+ for c in thread_comments:
+ author = (c.get("author") or {}).get("login", "")
+ original_commit = (c.get("originalCommit") or {}).get("oid", "")
+ reply_to = (c.get("replyTo") or {}).get("databaseId")
+ # Use originalLine as a stable anchor because line can be null
+ # after code evolves on newer commits.
+ original_line = c.get("originalLine")
+ line = original_line if isinstance(original_line, int) else 0
+ comments.append(
+ {
+ "comment_id": c.get("databaseId"),
+ "author": author,
+ "created_at": c.get("createdAt", ""),
+ "updated_at": c.get("updatedAt", ""),
+ "path": c.get("path", ""),
+ "line": line,
+ "side": "RIGHT",
+ "commit_id": original_commit,
+ "in_reply_to_id": reply_to,
+ "body": c.get("body", ""),
+ "thread_resolved": is_resolved,
+ }
+ )
+ has_next = bool(page_info.get("hasNextPage"))
+ cursor = page_info.get("endCursor") if has_next else ""
+
+ payload = {
+ "version": "1.0",
+ "source": "github_pr_review_comments",
+ "fetched_at": utc_now(),
+ "review_target": {"type": "pr", "id": target_id_raw},
+ "viewer_login": viewer_login,
+ "comments": comments,
+ }
+ comments_path.write_text(json.dumps(payload, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
+ print(
+ json.dumps(
+ {
+ "target": f"pr-{target_id_raw}",
+ "threads": total_threads,
+ "comments": len(comments),
+ "resolved_threads": resolved_threads,
+ },
+ ensure_ascii=False,
+ )
+ )
+
+
+if __name__ == "__main__":
+ main()
diff --git a/.claude/skills/code-review/scripts/generate_comment_status_report.py b/.claude/skills/code-review/scripts/generate_comment_status_report.py
new file mode 100755
index 0000000000..68e3aca8f5
--- /dev/null
+++ b/.claude/skills/code-review/scripts/generate_comment_status_report.py
@@ -0,0 +1,110 @@
+#!/usr/bin/env python3
+import argparse
+import json
+from pathlib import Path
+from typing import Dict, List, Tuple
+
+
+def sanitize_branch_name(branch_name: str) -> str:
+ return branch_name.replace("/", "-")
+
+
+def resolve_target(args: argparse.Namespace) -> Tuple[str, str]:
+ if args.target_type and args.target_id:
+ target_type = args.target_type
+ target_id = args.target_id
+ elif args.pr_number is not None:
+ target_type = "pr"
+ target_id = str(args.pr_number)
+ elif args.branch_name:
+ target_type = "branch"
+ target_id = args.branch_name
+ else:
+ raise SystemExit("must provide either --target-type/--target-id or --pr-number or --branch-name")
+ if target_type not in {"pr", "branch"}:
+ raise SystemExit("target type must be pr or branch")
+ return target_type, target_id
+
+
+def read_json(path: Path) -> Dict:
+ if not path.exists():
+ raise SystemExit(f"missing file: {path}")
+ return json.loads(path.read_text(encoding="utf-8"))
+
+
+def esc_cell(text: str) -> str:
+ return (text or "").replace("\n", " ").replace("|", "\\|").replace("`", "").strip()
+
+
+def build_comment_meta_map(review_comments_payload: Dict) -> Dict[int, Dict]:
+ comments = review_comments_payload.get("comments", [])
+ meta_map: Dict[int, Dict] = {}
+ if not isinstance(comments, list):
+ return meta_map
+ for c in comments:
+ cid = c.get("comment_id")
+ if isinstance(cid, int):
+ meta_map[cid] = c
+ return meta_map
+
+
+def build_markdown(target_type: str, target_id: str, items: List[Dict], comment_meta: Dict[int, Dict]) -> str:
+ lines = []
+ lines.append(f"# Comment Status Report ({target_type}-{target_id})")
+ lines.append("")
+ lines.append(f"- Total: {len(items)}")
+ lines.append("")
+ lines.append("| 评论时间 | File | Line | 作者 | Comment | Flow | Tech |")
+ lines.append("|---|---|---:|---|---|---|---|")
+ for item in items:
+ cid = item.get("comment_id", "")
+ meta = comment_meta.get(cid, {})
+ created_at = esc_cell(str(meta.get("created_at", "")))
+ author = esc_cell(str(meta.get("author", "")))
+ path = esc_cell(str(item.get("path", "")))
+ line = item.get("line", 0)
+ body = esc_cell(str(item.get("body", "")))
+ if len(body) > 160:
+ body = body[:157] + "..."
+ status_flow = esc_cell(str(item.get("status_flow", "")))
+ status_tech = esc_cell(str(item.get("status_tech", "")))
+ lines.append(
+ f"| {created_at} | `{path}` | {line} | {author} | {body} | {status_flow} | {status_tech} |"
+ )
+ lines.append("")
+ return "\n".join(lines)
+
+
+def main() -> None:
+ parser = argparse.ArgumentParser(description="Generate markdown report from comment-status.json.")
+ parser.add_argument("--repo-root", required=True)
+ parser.add_argument("--target-type", choices=["pr", "branch"])
+ parser.add_argument("--target-id")
+ parser.add_argument("--pr-number", type=int, help="PR number (legacy compatible)")
+ parser.add_argument("--branch-name", help="Branch name (legacy compatible)")
+ args = parser.parse_args()
+
+ repo_root = Path(args.repo_root).resolve()
+ target_type, target_id_raw = resolve_target(args)
+ target_id_dir = sanitize_branch_name(target_id_raw) if target_type == "branch" else target_id_raw
+ review_dir = repo_root / "code-review" / f"{target_type}-{target_id_dir}"
+
+ status_path = review_dir / "comments" / "comment-status.json"
+ review_comments_path = review_dir / "comments" / "review-comments.json"
+ report_path = review_dir / "comments" / "comment-status.md"
+
+ payload = read_json(status_path)
+ if not isinstance(payload, dict) or not isinstance(payload.get("status"), list):
+ raise SystemExit("invalid comment-status.json: root must be object and `status` must be list")
+ review_comments_payload = read_json(review_comments_path)
+ if not isinstance(review_comments_payload, dict):
+ raise SystemExit("invalid review-comments.json: root must be object")
+
+ comment_meta = build_comment_meta_map(review_comments_payload)
+ markdown = build_markdown(target_type, target_id_raw, payload["status"], comment_meta)
+ report_path.write_text(markdown + "\n", encoding="utf-8")
+ print(str(report_path))
+
+
+if __name__ == "__main__":
+ main()
diff --git a/.claude/skills/code-review/scripts/incremental_review_mapper.py b/.claude/skills/code-review/scripts/incremental_review_mapper.py
new file mode 100755
index 0000000000..d8d0a558d6
--- /dev/null
+++ b/.claude/skills/code-review/scripts/incremental_review_mapper.py
@@ -0,0 +1,450 @@
+#!/usr/bin/env python3
+import argparse
+import hashlib
+import json
+import re
+import subprocess
+from pathlib import Path
+from dataclasses import dataclass
+from datetime import datetime, timezone
+from typing import Dict, List, Optional, Set, Tuple
+
+
+def run_git(repo_root: Path, args: List[str]) -> str:
+ result = subprocess.run(
+ ["git", *args],
+ cwd=repo_root,
+ text=True,
+ capture_output=True,
+ check=True,
+ )
+ return result.stdout
+
+
+def run_git_no_check(repo_root: Path, args: List[str]) -> subprocess.CompletedProcess:
+ return subprocess.run(
+ ["git", *args],
+ cwd=repo_root,
+ text=True,
+ capture_output=True,
+ check=False,
+ )
+
+
+def utc_now() -> str:
+ return datetime.now(timezone.utc).replace(microsecond=0).isoformat()
+
+
+def normalize_code_line(line: str) -> str:
+ return re.sub(r"\s+", " ", line.strip())
+
+
+def stable_hash(text: str) -> str:
+ return hashlib.sha256(text.encode("utf-8")).hexdigest()
+
+
+def normalize_file_content(text: str) -> str:
+ # Keep line boundaries but normalize whitespace noise for robust matching.
+ lines = [re.sub(r"\s+", " ", line.strip()) for line in text.splitlines()]
+ return "\n".join(lines)
+
+
+def compute_patch_id(repo_root: Path, commit_sha: str) -> str:
+ patch_text = run_git(repo_root, ["show", "--pretty=format:", "--no-color", commit_sha])
+ proc = subprocess.run(
+ ["git", "patch-id", "--stable"],
+ cwd=repo_root,
+ text=True,
+ input=patch_text,
+ capture_output=True,
+ check=True,
+ )
+ output = proc.stdout.strip()
+ return output.split()[0] if output else ""
+
+
+def get_commit_files(repo_root: Path, commit_sha: str) -> List[str]:
+ out = run_git(repo_root, ["show", "--pretty=format:", "--name-only", "--no-color", commit_sha])
+ return sorted({line.strip() for line in out.splitlines() if line.strip()})
+
+
+def get_file_content_at_commit(repo_root: Path, commit_sha: str, path: str) -> Optional[str]:
+ proc = run_git_no_check(repo_root, ["show", f"{commit_sha}:{path}"])
+ if proc.returncode != 0:
+ return None
+ return proc.stdout
+
+
+def load_latest_snapshot_map(review_dir: Path) -> Dict[str, str]:
+ latest_path = review_dir / "snapshot" / "latest.json"
+ if not latest_path.exists():
+ return {}
+ try:
+ latest = json.loads(latest_path.read_text(encoding="utf-8"))
+ except Exception:
+ return {}
+ manifest_rel = latest.get("manifest")
+ if not isinstance(manifest_rel, str) or not manifest_rel:
+ return {}
+ manifest_path = review_dir / manifest_rel
+ if not manifest_path.exists():
+ return {}
+ try:
+ manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
+ except Exception:
+ return {}
+ mapping: Dict[str, str] = {}
+ for item in manifest.get("files", []):
+ path = item.get("path")
+ n_hash = item.get("normalized_hash")
+ if isinstance(path, str) and isinstance(n_hash, str):
+ mapping[path] = n_hash
+ return mapping
+
+
+def compute_snapshot_match_rate(
+ repo_root: Path, head_sha: str, changed_files: Set[str], snapshot_map: Dict[str, str]
+) -> Optional[float]:
+ if not snapshot_map or not changed_files:
+ return None
+ overlap = [p for p in changed_files if p in snapshot_map]
+ if not overlap:
+ return None
+ matched = 0
+ for path in overlap:
+ content = get_file_content_at_commit(repo_root, head_sha, path)
+ if content is None:
+ continue
+ current_hash = stable_hash(normalize_file_content(content))
+ if current_hash == snapshot_map[path]:
+ matched += 1
+ return matched / len(overlap)
+
+
+def parse_hunk_fingerprints(repo_root: Path, commit_sha: str) -> List[str]:
+ patch = run_git(repo_root, ["show", "--pretty=format:", "--no-color", "-U3", commit_sha])
+ lines = patch.splitlines()
+ file_path = ""
+ hunk_header = ""
+ hunk_lines: List[str] = []
+ fps: List[str] = []
+
+ def flush() -> None:
+ nonlocal hunk_lines, hunk_header
+ if not hunk_lines:
+ return
+ key = file_path + "\n" + hunk_header + "\n" + "\n".join(hunk_lines)
+ fps.append(stable_hash(key))
+ hunk_lines = []
+ hunk_header = ""
+
+ for line in lines:
+ if line.startswith("diff --git "):
+ flush()
+ m = re.search(r" b/(.+)$", line)
+ file_path = m.group(1) if m else ""
+ continue
+ if line.startswith("@@"):
+ flush()
+ hunk_header = line
+ continue
+ if line.startswith("+") or line.startswith("-"):
+ if line.startswith("+++") or line.startswith("---"):
+ continue
+ hunk_lines.append(normalize_code_line(line[1:]))
+
+ flush()
+ return sorted(set(fps))
+
+
+def jaccard(a: Set[str], b: Set[str]) -> float:
+ if not a and not b:
+ return 1.0
+ if not a or not b:
+ return 0.0
+ return len(a & b) / len(a | b)
+
+
+@dataclass
+class CommitRecord:
+ commit_sha: str
+ patch_id: str
+ hunk_fingerprints: List[str]
+ review_round: int
+ reviewed_at: str
+ mapping: Dict[str, object]
+
+
+def load_json(path: Path) -> Dict:
+ if not path.exists():
+ return {}
+ return json.loads(path.read_text(encoding="utf-8"))
+
+
+def save_json(path: Path, payload: Dict) -> None:
+ path.write_text(json.dumps(payload, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
+
+
+def sanitize_branch_name(branch_name: str) -> str:
+ return branch_name.replace("/", "-")
+
+
+def resolve_target(args: argparse.Namespace) -> Tuple[str, str]:
+ if args.target_type and args.target_id:
+ target_type = args.target_type
+ target_id = args.target_id
+ elif args.pr_number is not None:
+ target_type = "pr"
+ target_id = str(args.pr_number)
+ elif args.branch_name:
+ target_type = "branch"
+ target_id = args.branch_name
+ else:
+ raise SystemExit("must provide either --target-type/--target-id or --pr-number or --branch-name")
+ if target_type not in {"pr", "branch"}:
+ raise SystemExit("target type must be pr or branch")
+ return target_type, target_id
+
+
+def ensure_commit_exists(repo_root: Path, sha: str, target_type: str, target_id: str) -> bool:
+ exists = run_git_no_check(repo_root, ["cat-file", "-e", f"{sha}^{{commit}}"])
+ if exists.returncode == 0:
+ return True
+
+ # First generic fetch to cover normal branch updates.
+ run_git_no_check(repo_root, ["fetch", "--all", "--prune", "--tags"])
+ exists = run_git_no_check(repo_root, ["cat-file", "-e", f"{sha}^{{commit}}"])
+ if exists.returncode == 0:
+ return True
+
+ # Then PR-specific fetch for detached PR heads.
+ if target_type == "pr":
+ run_git_no_check(repo_root, ["fetch", "origin", f"pull/{target_id}/head"])
+ exists = run_git_no_check(repo_root, ["cat-file", "-e", f"{sha}^{{commit}}"])
+ if exists.returncode == 0:
+ return True
+ return False
+
+
+def main() -> None:
+ parser = argparse.ArgumentParser(description="Map reviewed commits for incremental PR/branch review.")
+ parser.add_argument("--repo-root", required=True)
+ parser.add_argument("--target-type", choices=["pr", "branch"])
+ parser.add_argument("--target-id")
+ parser.add_argument("--pr-number", type=int, help="PR number (legacy compatible)")
+ parser.add_argument("--branch-name", help="Branch name (legacy compatible)")
+ parser.add_argument("--base", required=True, help="Base commit SHA for comparison")
+ parser.add_argument("--head", required=True, help="Head commit SHA for comparison")
+ parser.add_argument("--review-round", required=True, type=int)
+ parser.add_argument("--commit-map-threshold", type=float, default=0.9)
+ parser.add_argument("--hunk-match-threshold", type=float, default=0.8)
+ parser.add_argument("--snapshot-match-threshold", type=float, default=0.9)
+ args = parser.parse_args()
+
+ repo_root = Path(args.repo_root).resolve()
+ target_type, target_id_raw = resolve_target(args)
+ target_id_dir = sanitize_branch_name(target_id_raw) if target_type == "branch" else target_id_raw
+ review_dir = repo_root / "code-review" / f"{target_type}-{target_id_dir}"
+ reviewed_path = review_dir / "reviewed_commits.json"
+ reviewed = load_json(reviewed_path) or {"version": "1.0", "review_rounds": [], "commits": []}
+ old_commits = reviewed.get("commits", [])
+ snapshot_map = load_latest_snapshot_map(review_dir)
+
+ if not ensure_commit_exists(repo_root, args.base, target_type, target_id_raw):
+ raise SystemExit(
+ f"missing base commit object: {args.base}. "
+ "Please fetch the base branch history, then retry."
+ )
+ if not ensure_commit_exists(repo_root, args.head, target_type, target_id_raw):
+ raise SystemExit(
+ f"missing head commit object: {args.head}. "
+            f"For PR review, try: git fetch origin pull/{target_id_raw}/head"
+ )
+
+ try:
+ rev_list_output = run_git(repo_root, ["rev-list", "--reverse", f"{args.base}..{args.head}"])
+ except subprocess.CalledProcessError as e:
+ stderr = (e.stderr or "").strip()
+ raise SystemExit(
+ f"failed to build commit range {args.base}..{args.head}: {stderr or 'unknown git error'}"
+ )
+
+ current_commits = [sha for sha in rev_list_output.splitlines() if sha]
+ current_set = set(current_commits)
+ commit_files_map: Dict[str, List[str]] = {}
+ current_changed_files: Set[str] = set()
+ for sha in current_commits:
+ files = get_commit_files(repo_root, sha)
+ commit_files_map[sha] = files
+ current_changed_files.update(files)
+
+ old_by_sha = {c.get("commit_sha"): c for c in old_commits if c.get("commit_sha")}
+ old_by_patch_id: Dict[str, Dict] = {}
+ for c in old_commits:
+ pid = c.get("patch_id")
+ if pid and pid not in old_by_patch_id:
+ old_by_patch_id[pid] = c
+
+ mapped: Dict[str, CommitRecord] = {}
+ unchanged_by_sha = 0
+ for sha in current_commits:
+ if sha in old_by_sha:
+ oc = old_by_sha[sha]
+ mapped[sha] = CommitRecord(
+ commit_sha=sha,
+ patch_id=oc.get("patch_id", ""),
+ hunk_fingerprints=oc.get("hunk_fingerprints", []),
+ review_round=oc.get("review_round", args.review_round),
+ reviewed_at=oc.get("reviewed_at", utc_now()),
+ mapping={"method": "direct", "mapped_from_commit": sha, "confidence": 1.0},
+ )
+ unchanged_by_sha += 1
+
+ for sha in current_commits:
+ if sha in mapped:
+ continue
+ pid = compute_patch_id(repo_root, sha)
+ if pid and pid in old_by_patch_id:
+ oc = old_by_patch_id[pid]
+ mapped[sha] = CommitRecord(
+ commit_sha=sha,
+ patch_id=pid,
+ hunk_fingerprints=oc.get("hunk_fingerprints", []),
+ review_round=oc.get("review_round", args.review_round),
+ reviewed_at=oc.get("reviewed_at", utc_now()),
+ mapping={"method": "patch-id", "mapped_from_commit": oc.get("commit_sha", ""), "confidence": 0.98},
+ )
+
+ old_unmapped = [c for c in old_commits if c.get("commit_sha") not in current_set]
+ old_hunk_sets = {
+ c.get("commit_sha", ""): set(c.get("hunk_fingerprints", [])) for c in old_unmapped if c.get("commit_sha")
+ }
+
+ for sha in current_commits:
+ if sha in mapped:
+ continue
+ new_hunks = set(parse_hunk_fingerprints(repo_root, sha))
+ best_score = 0.0
+ best_old = ""
+ for old_sha, old_hunks in old_hunk_sets.items():
+ score = jaccard(new_hunks, old_hunks)
+ if score > best_score:
+ best_score = score
+ best_old = old_sha
+ if best_old and best_score >= args.hunk_match_threshold:
+ mapped[sha] = CommitRecord(
+ commit_sha=sha,
+ patch_id=compute_patch_id(repo_root, sha),
+ hunk_fingerprints=sorted(new_hunks),
+ review_round=args.review_round,
+ reviewed_at=utc_now(),
+ mapping={"method": "hunk-similarity", "mapped_from_commit": best_old, "confidence": round(best_score, 4)},
+ )
+
+ need_review: List[str] = [sha for sha in current_commits if sha not in mapped]
+
+ # commit_map_rate measures "how many OLD commits are accounted for in the
+ # new commit set", NOT "what fraction of current commits are mapped".
+ # Denominator = old commit count (the baseline we reviewed before).
+ # This way appending new commits doesn't penalise the rate, while rebase
+ # that loses old commits correctly lowers it.
+ old_commits_covered: Set[str] = set()
+ for rec in mapped.values():
+ from_sha = rec.mapping.get("mapped_from_commit", "")
+ if from_sha:
+ old_commits_covered.add(from_sha)
+ old_commit_count = len(old_commits)
+ commit_map_rate = (len(old_commits_covered) / old_commit_count) if old_commit_count > 0 else 1.0
+
+ if need_review:
+ hunk_scores: List[float] = []
+ for sha in need_review:
+ new_hunks = set(parse_hunk_fingerprints(repo_root, sha))
+ best = 0.0
+ for old_hunks in old_hunk_sets.values():
+ best = max(best, jaccard(new_hunks, old_hunks))
+ hunk_scores.append(best)
+ hunk_match_rate = (sum(hunk_scores) / len(hunk_scores)) if hunk_scores else 1.0
+ else:
+ hunk_match_rate = 1.0
+
+ if commit_map_rate >= args.commit_map_threshold:
+ recommendation = "incremental"
+ elif hunk_match_rate >= args.hunk_match_threshold:
+ recommendation = "partial"
+ else:
+ recommendation = "full"
+
+ snapshot_match_rate = compute_snapshot_match_rate(repo_root, args.head, current_changed_files, snapshot_map)
+ if recommendation == "full" and snapshot_match_rate is not None and snapshot_match_rate >= args.snapshot_match_threshold:
+ # For squash/rebase-conflict scenarios, snapshot evidence can safely
+ # downgrade from full to partial.
+ recommendation = "partial"
+
+ round_record = {
+ "review_round": args.review_round,
+ "generated_at": utc_now(),
+ "base": args.base,
+ "head": args.head,
+ "stats": {
+ "total_commits": len(current_commits),
+ "mapped_commits": len(mapped),
+ "direct_sha_hits": unchanged_by_sha,
+ "commit_map_rate": round(commit_map_rate, 4),
+ "hunk_match_rate": round(hunk_match_rate, 4),
+ "snapshot_match_rate": round(snapshot_match_rate, 4) if snapshot_match_rate is not None else None,
+ "recommendation": recommendation,
+ },
+ "need_review_commits": need_review,
+ }
+
+ merged_commits = [c for c in old_commits if c.get("commit_sha") not in current_set]
+ for sha in current_commits:
+ if sha in mapped:
+ c = mapped[sha]
+ merged_commits.append(
+ {
+ "commit_sha": c.commit_sha,
+ "patch_id": c.patch_id,
+ "review_round": c.review_round,
+ "reviewed_at": c.reviewed_at,
+ "hunk_fingerprints": c.hunk_fingerprints,
+ "files": commit_files_map.get(sha, []),
+ "mapping": c.mapping,
+ }
+ )
+ else:
+ merged_commits.append(
+ {
+ "commit_sha": sha,
+ "patch_id": compute_patch_id(repo_root, sha),
+ "review_round": args.review_round,
+ "reviewed_at": "",
+ "hunk_fingerprints": parse_hunk_fingerprints(repo_root, sha),
+ "files": commit_files_map.get(sha, []),
+ "mapping": {"method": "none", "mapped_from_commit": "", "confidence": 0.0},
+ }
+ )
+
+ reviewed["commits"] = merged_commits
+ reviewed.setdefault("review_rounds", []).append(round_record)
+ save_json(reviewed_path, reviewed)
+
+ print(
+ json.dumps(
+ {
+ "review_target": {"type": target_type, "id": target_id_raw},
+ "total_commits": len(current_commits),
+ "need_review_commits": need_review,
+ "commit_map_rate": round(commit_map_rate, 4),
+ "hunk_match_rate": round(hunk_match_rate, 4),
+ "snapshot_match_rate": round(snapshot_match_rate, 4) if snapshot_match_rate is not None else None,
+ "recommendation": recommendation,
+ },
+ ensure_ascii=False,
+ )
+ )
+
+
+if __name__ == "__main__":
+ main()
diff --git a/.claude/skills/code-review/scripts/init_review_workspace.py b/.claude/skills/code-review/scripts/init_review_workspace.py
new file mode 100755
index 0000000000..806c3ea481
--- /dev/null
+++ b/.claude/skills/code-review/scripts/init_review_workspace.py
@@ -0,0 +1,127 @@
+#!/usr/bin/env python3
+import argparse
+import json
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any, Dict, Tuple
+
+
+SCRIPT_DIR = Path(__file__).resolve().parent
+SKILL_DIR = SCRIPT_DIR.parent
+REF_DIR = SKILL_DIR / "references"
+
+
+def utc_now() -> str:
+ return datetime.now(timezone.utc).replace(microsecond=0).isoformat()
+
+
+def read_json(path: Path) -> Dict[str, Any]:
+ return json.loads(path.read_text(encoding="utf-8"))
+
+
+def write_json(path: Path, data: Dict[str, Any]) -> None:
+ path.parent.mkdir(parents=True, exist_ok=True)
+ path.write_text(json.dumps(data, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
+
+
+def ensure_file_from_template(target: Path, template: Path, mutate=None) -> None:
+ if target.exists():
+ return
+ payload = read_json(template)
+ if mutate:
+ mutate(payload)
+ write_json(target, payload)
+
+
+def sanitize_branch_name(branch_name: str) -> str:
+ return branch_name.replace("/", "-")
+
+
+def resolve_target(args: argparse.Namespace) -> Tuple[str, str]:
+ if args.target_type and args.target_id:
+ target_type = args.target_type
+ target_id = args.target_id
+ elif args.pr_number is not None:
+ target_type = "pr"
+ target_id = str(args.pr_number)
+ elif args.branch_name:
+ target_type = "branch"
+ target_id = args.branch_name
+ else:
+ raise SystemExit("must provide either --target-type/--target-id or --pr-number or --branch-name")
+ if target_type not in {"pr", "branch"}:
+ raise SystemExit("target type must be pr or branch")
+ return target_type, target_id
+
+
+def main() -> None:
+ parser = argparse.ArgumentParser(description="Initialize code-review workspace for PR or branch.")
+ parser.add_argument("--repo-root", required=True, help="Repository root path")
+ parser.add_argument("--target-type", choices=["pr", "branch"], help="Review target type")
+ parser.add_argument("--target-id", help="Review target id (PR number or branch name)")
+ parser.add_argument("--pr-number", type=int, help="PR number (legacy compatible)")
+ parser.add_argument("--branch-name", help="Branch name (legacy compatible)")
+ parser.add_argument("--base-ref", default="", help="PR base ref")
+ parser.add_argument("--head-ref", default="", help="PR head ref")
+ parser.add_argument("--base-sha", default="", help="PR base sha")
+ parser.add_argument("--head-sha", default="", help="PR head sha")
+ args = parser.parse_args()
+
+ repo_root = Path(args.repo_root).resolve()
+ target_type, target_id_raw = resolve_target(args)
+ target_id_dir = sanitize_branch_name(target_id_raw) if target_type == "branch" else target_id_raw
+ review_dir = repo_root / "code-review" / f"{target_type}-{target_id_dir}"
+ comments_dir = review_dir / "comments"
+
+ comments_dir.mkdir(parents=True, exist_ok=True)
+
+ meta_path = review_dir / "meta.json"
+ reviewed_commits_path = review_dir / "reviewed_commits.json"
+ review_comments_path = comments_dir / "review-comments.json"
+ comment_status_path = comments_dir / "comment-status.json"
+
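+    # Templates are only materialized when the target file is missing, so re-running init is idempotent.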
+ def mutate_meta(payload: Dict[str, Any]) -> None:
+ payload["repo"] = str(repo_root)
+ payload["review_target"]["type"] = target_type
+ payload["review_target"]["id"] = target_id_raw
+ payload["review_target"]["base_ref"] = args.base_ref
+ payload["review_target"]["head_ref"] = args.head_ref
+ payload["review_target"]["base_sha"] = args.base_sha
+ payload["review_target"]["head_sha"] = args.head_sha
+ payload["generated_at"] = utc_now()
+
+ def create_review_comments_payload() -> Dict[str, Any]:
+ return {
+ "version": "1.0",
+ "source": "github_pr_review_comments" if target_type == "pr" else "branch_review_comments",
+ "fetched_at": utc_now(),
+ "review_target": {"type": target_type, "id": target_id_raw},
+ "viewer_login": "",
+ "comments": [],
+ }
+
+ def mutate_comment_status(payload: Dict[str, Any]) -> None:
+ payload["review_target"]["type"] = target_type
+ payload["review_target"]["id"] = target_id_raw
+ payload["generated_at"] = utc_now()
+ payload["status"] = []
+
+ def mutate_reviewed_commits(payload: Dict[str, Any]) -> None:
+ payload["review_rounds"] = []
+ payload["commits"] = []
+
+ ensure_file_from_template(meta_path, REF_DIR / "meta.template.json", mutate_meta)
+ ensure_file_from_template(
+ reviewed_commits_path, REF_DIR / "reviewed_commits.template.json", mutate_reviewed_commits
+ )
+ if not review_comments_path.exists():
+ write_json(review_comments_path, create_review_comments_payload())
+ ensure_file_from_template(
+ comment_status_path, REF_DIR / "comment-status.template.json", mutate_comment_status
+ )
+
+ print(str(review_dir))
+
+
+if __name__ == "__main__":
+ main()
diff --git a/.claude/skills/code-review/scripts/update_comment_status.py b/.claude/skills/code-review/scripts/update_comment_status.py
new file mode 100755
index 0000000000..007490eb76
--- /dev/null
+++ b/.claude/skills/code-review/scripts/update_comment_status.py
@@ -0,0 +1,195 @@
+#!/usr/bin/env python3
+import argparse
+import hashlib
+import json
+import re
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Dict, List, Tuple
+
+
+def utc_now() -> str:
+ return datetime.now(timezone.utc).replace(microsecond=0).isoformat()
+
+
+def stable_hash(text: str) -> str:
+ return hashlib.sha256(text.encode("utf-8")).hexdigest()
+
+
+def normalize_text(text: str) -> str:
+ text = re.sub(r"\s+", " ", text.strip())
+ return text
+
+
+def infer_flow_status_from_comment(comment: Dict) -> str:
+ # Single deterministic rule from upstream schema.
+ return "resolved" if comment.get("thread_resolved") is True else "open"
+
+
+def read_json(path: Path) -> Dict:
+ if not path.exists():
+ return {}
+ return json.loads(path.read_text(encoding="utf-8"))
+
+
+def write_json(path: Path, payload: Dict) -> None:
+ path.parent.mkdir(parents=True, exist_ok=True)
+ path.write_text(json.dumps(payload, ensure_ascii=False, indent=2) + "\n", encoding="utf-8")
+
+
+def default_status(comment: Dict) -> Dict:
+ body = comment.get("body", "")
+ snippet = normalize_text(body)[:300]
+ fingerprint_seed = "|".join(
+ [
+ str(comment.get("path", "")),
+ str(comment.get("line", 0)),
+ str(comment.get("side", "RIGHT")),
+ snippet,
+ ]
+ )
+ return {
+ "comment_id": comment.get("comment_id"),
+ "path": comment.get("path", ""),
+ "line": comment.get("line", 0),
+ "side": comment.get("side", "RIGHT"),
+ "body": body,
+ "snippet": snippet,
+ "snippet_fingerprint": stable_hash(fingerprint_seed),
+ "status_flow": infer_flow_status_from_comment(comment),
+ # status_tech is owned by model review in later phase.
+ "status_tech": "not-fixed",
+ "mapped_finding_id": "",
+ "notes": "",
+ }
+
+
+def sanitize_branch_name(branch_name: str) -> str:
+ return branch_name.replace("/", "-")
+
+
+def resolve_target(args: argparse.Namespace) -> Tuple[str, str]:
+ if args.target_type and args.target_id:
+ target_type = args.target_type
+ target_id = args.target_id
+ elif args.pr_number is not None:
+ target_type = "pr"
+ target_id = str(args.pr_number)
+ elif args.branch_name:
+ target_type = "branch"
+ target_id = args.branch_name
+ else:
+ raise SystemExit("must provide either --target-type/--target-id or --pr-number or --branch-name")
+ if target_type not in {"pr", "branch"}:
+ raise SystemExit("target type must be pr or branch")
+ return target_type, target_id
+
+
+def validate_payload(raw: Dict) -> List[Dict]:
+ if not isinstance(raw, dict):
+ raise SystemExit("invalid review-comments.json: root must be object")
+ comments = raw.get("comments")
+ if not isinstance(comments, list):
+ raise SystemExit("invalid review-comments.json: `comments` must be list")
+ return comments
+
+
+def validate_comment(comment: Dict) -> None:
+ required = ["comment_id", "path", "line", "side", "body", "thread_resolved"]
+ missing = [k for k in required if k not in comment]
+ if missing:
+ raise SystemExit(
+ "invalid review comment record: missing required fields "
+ + ",".join(missing)
+ )
+
+
+def infer_manual_tech_override(replies: List[Dict], viewer_login: str) -> str:
+ if not viewer_login:
+ return ""
+ # Prefer the latest explicit override from current reviewer account.
+ for reply in reversed(replies):
+ if str(reply.get("author", "")).lower() != viewer_login.lower():
+ continue
+ text = normalize_text(str(reply.get("body", ""))).lower()
+ if "false-positive" in text or "false positive" in text or "假阳性" in text or "误判" in text:
+ return "false-positive"
+ if re.search(r"\bfixed\b", text) or "已修复" in text:
+ return "fixed"
+ return ""
+
+
+def main() -> None:
+ parser = argparse.ArgumentParser(description="Build comment-status.json from review comments.")
+ parser.add_argument("--repo-root", required=True)
+ parser.add_argument("--target-type", choices=["pr", "branch"])
+ parser.add_argument("--target-id")
+ parser.add_argument("--pr-number", type=int, help="PR number (legacy compatible)")
+ parser.add_argument("--branch-name", help="Branch name (legacy compatible)")
+ args = parser.parse_args()
+
+ repo_root = Path(args.repo_root).resolve()
+ target_type, target_id_raw = resolve_target(args)
+ target_id_dir = sanitize_branch_name(target_id_raw) if target_type == "branch" else target_id_raw
+ review_dir = repo_root / "code-review" / f"{target_type}-{target_id_dir}"
+ comments_path = review_dir / "comments" / "review-comments.json"
+ status_path = review_dir / "comments" / "comment-status.json"
+
+ comments_payload = read_json(comments_path)
+ if not comments_payload:
+ raise SystemExit(f"missing comments file: {comments_path}")
+ comments = validate_payload(comments_payload)
+ viewer_login = str(comments_payload.get("viewer_login", "")).strip()
+
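+    # Group reply comments under their root comment; replies drive the manual status_tech overrides below.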
+ replies_by_parent: Dict[int, List[Dict]] = {}
+ root_comments: List[Dict] = []
+ for c in comments:
+ parent = c.get("in_reply_to_id")
+ if parent is None:
+ root_comments.append(c)
+ else:
+ replies_by_parent.setdefault(parent, []).append(c)
+
+ previous = read_json(status_path)
+ previous_map = {item.get("comment_id"): item for item in previous.get("status", [])}
+
+ status: List[Dict] = []
+ seen_fp = set()
+ for comment in root_comments:
+ validate_comment(comment)
+ cid = comment.get("comment_id")
+ if cid in previous_map:
+ item = previous_map[cid]
+ # Preserve manual/model edits on status_tech/notes, always sync flow status from source.
+ item["status_flow"] = infer_flow_status_from_comment(comment)
+ else:
+ item = default_status(comment)
+
+ fp = item.get("snippet_fingerprint", "")
+ if fp and fp in seen_fp:
+ item["notes"] = (item.get("notes", "") + " duplicate-fingerprint").strip()
+ manual_override = infer_manual_tech_override(replies_by_parent.get(cid, []), viewer_login)
+ if manual_override:
+ item["status_tech"] = manual_override
+ item["notes"] = (item.get("notes", "") + f" manual-tech-override:{manual_override}").strip()
+ seen_fp.add(fp)
+ status.append(item)
+
+ payload = {
+ "version": "1.0",
+ "generated_at": utc_now(),
+ "review_target": {"type": target_type, "id": target_id_raw},
+ "status": status,
+ }
+ write_json(status_path, payload)
+
+ print(
+ json.dumps(
+ {"review_target": {"type": target_type, "id": target_id_raw}, "status_count": len(status)},
+ ensure_ascii=False,
+ )
+ )
+
+
+if __name__ == "__main__":
+ main()
diff --git a/.claude/skills/commit/SKILL.md b/.claude/skills/commit/SKILL.md
new file mode 100644
index 0000000000..046d2af45b
--- /dev/null
+++ b/.claude/skills/commit/SKILL.md
@@ -0,0 +1,40 @@
+---
+name: commit
+description: Write commit messages that follow the Conventional Commits specification.
+---
+# Commit Skill
+
+Generate commit messages that follow the Conventional Commits specification.
+
+## Format
+
+```
+type(scope): verb + object
+
+{why is this change needed, what user/system impact it brings}
+
+Fixes #{ISSUE_ID}
+```
+
+## Fields
+
+- **type**: `feat | fix | docs | style | refactor | perf | test | chore | revert`
+- **scope**: Optional. File/module/subsystem, e.g. `api`, `ui`, `auth`, `deps`
+- **subject**: <= 50 characters, imperative mood, lowercase first letter, no period
+- **body**: Each line <= 72 characters. Explain "what" and "why"
+- **footer**: Optional. Link Issue / PR / Breaking Change
+
+## Steps
+
+1. Collect information by reading `git diff`. Skip if user already provided context.
+2. Determine the commit type based on changes.
+3. If changes span multiple scopes, use the core module as scope.
+4. Extract added/modified/deleted functions, classes, interfaces for the subject.
+5. If breaking change, add `BREAKING CHANGE:` to footer.
+6. Present the complete commit message for user confirmation before executing `git commit`.
+
+## Prohibited
+
+- No meaningless descriptions like "update code", "fix bug", "wip"
+- No subject or body lines exceeding 72 characters
+- No issue links in the subject line
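+
+## Example
+
+A hypothetical message for a small parser fix (scope and issue reference are placeholders):
+
+```
+fix(parser): handle empty apsara log lines
+
+Empty lines in rotated apsara logs made the parser emit a spurious error
+event on every rotation, flooding the self-monitor alarm queue.
+
+Fixes #{ISSUE_ID}
+```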
diff --git a/.claude/skills/compile/SKILL.md b/.claude/skills/compile/SKILL.md
new file mode 100644
index 0000000000..26d6bc8405
--- /dev/null
+++ b/.claude/skills/compile/SKILL.md
@@ -0,0 +1,83 @@
+---
+name: compile
+description: Building LoongCollector C++ and Go components. Use when compiling any part of the project.
+---
+# Compile Skill
+
+## How to Compile This Project
+
+This project has both C++ and Go components. Use the appropriate build method based on what you modified.
+
+### C++ Build
+
+**IMPORTANT: All CMake and make commands must run from inside the `build/` directory.** Running from repo root will reconfigure incorrectly.
+
+**Prerequisites** — Git submodules must be populated before first build:
+```bash
+git submodule update --init --recursive
+```
+Two submodules live under `core/_thirdparty/`:
+- `DCGM` — NVIDIA DCGM headers (`dcgm_agent.h` etc.)
+- `coolbpf` — eBPF framework
+
+If either is empty, compilation fails with `No such file or directory`.
+
+#### Build Steps
+
+```bash
+mkdir -p build && cd build
+cmake -DCMAKE_BUILD_TYPE=Debug -DLOGTAIL_VERSION=0.0.1 \
+ -DCMAKE_EXPORT_COMPILE_COMMANDS=1 \
+ -DCMAKE_CXX_FLAGS="-I/opt/rh/devtoolset-9/root/usr/lib/gcc/x86_64-redhat-linux/9/include -I/opt/logtail -I/opt/logtail_spl" \
+ -DBUILD_LOGTAIL=ON -DBUILD_LOGTAIL_UT=ON -DWITHOUTGDB=ON -DENABLE_STATIC_LINK_CRT=ON -DWITHSPL=OFF ../core
+make -sj$(nproc)
+```
+
+**Key CMake flags:**
+| Flag | Purpose |
+|------|---------|
+| `BUILD_LOGTAIL` | Build LoongCollector binary. Required. |
+| `BUILD_LOGTAIL_UT` | Build unit tests. Enable when modifying tests. |
+| `WITHSPL` | SPL support. Set `OFF` unless working on SPL files. |
+
+#### C++ Unit Tests
+
+Each test directory under `core/unittest/*/` produces its own executable.
+
+**Build tests** (from inside `build/`):
+```bash
+make yaml_util_unittest app_config_unittest safe_queue_unittest -j$(nproc)
+```
+
+**Run tests** (from inside `build/`):
+```bash
+./unittest/common/yaml_util_unittest
+./unittest/app_config/app_config_unittest
+```
+
+Tests must run from `build/` because some rely on relative paths for config files and temporary output.
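+
+To run a single case inside a test binary, the standard Google Test filter flag can be used (the suite and case names below are illustrative, not taken from this repo):
+
+```bash
+./unittest/common/yaml_util_unittest --gtest_filter="YamlUtilUnittest.*"
+```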
+
+### Go Plugin Build
+
+```bash
+make plugin_local
+```
+
+### Docker Build
+
+```bash
+make image
+```
+
+### Cross-Compilation
+
+For ARM64:
+```bash
+make image ARCH=arm64
+```
+
+### Common Issues
+
+- If CMake complains about missing dependencies, install them via `apt` or `yum`
+- If linking fails, try `make clean` then rebuild
+- For SPL-related builds, change `WITHSPL=OFF` to `WITHSPL=ON` in the cmake command
diff --git a/.claude/skills/design-document/SKILL.md b/.claude/skills/design-document/SKILL.md
new file mode 100644
index 0000000000..d62cebdfa9
--- /dev/null
+++ b/.claude/skills/design-document/SKILL.md
@@ -0,0 +1,98 @@
+---
+name: design-document
+description: Design document writing conventions. Use when writing or reviewing technical design documents.
+---
+# Design Document Conventions
+
+## 1. Background / Problem Statement
+
+### 1.1 Background and Pain Points
+- Describe current system/module limitations and deficiencies
+- List specific scenarios, metrics, or incident cases that triggered this design
+
+### 1.2 Impact Scope
+- Affected modules, microservices, APIs, data stores, third-party dependencies
+- Potential impact on performance, reliability, cost, maintainability
+- Forward/backward compatibility analysis
+
+### 1.3 Constraints
+- Compliance/security/performance/resource restrictions
+- External system or infrastructure dependencies
+
+---
+
+## 2. Design Goals
+
+### 2.1 Functional Goals
+- List Must/Should/Could core capabilities by priority
+
+### 2.2 Non-Functional Goals
+- Performance (throughput, latency, concurrency, resource usage)
+- Scalability, maintainability, testability, observability
+- Reliability (fault tolerance, HA, degradation, rollback strategies)
+
+### 2.3 Constraint Goals
+- Backward compatibility, API stability
+- Security and compliance requirements
+
+---
+
+## 3. Technical Design
+
+### 3.1 Architecture Diagram
+- Use Mermaid for high-level component diagrams with data/control flow
+
+### 3.2 Detailed Flowcharts
+- Key business flows, exception flows, retry/compensation with timing and triggers
+
+### 3.3 Thread/Concurrency Model
+- Thread lifecycle, inter-thread communication (locks, condition variables, queues, Actor patterns)
+- Sequence diagrams for concurrency interactions
+
+### 3.4 Core Classes and Data Structures
+- Class diagrams showing main classes, interfaces, inheritance/composition relationships
+- Key data structure fields, lifecycle, thread-safety strategy
+
+### 3.5 Key Algorithms or Protocols
+- Pseudocode or flow for pub/sub, load balancing, retry backoff, etc.
+- State machine / protocol state transition diagrams
+
+### 3.6 Error Handling and Recovery
+- Error classification, exception stack, retry strategies, degradation plans
+- Monitoring metrics, alert trigger conditions and levels
+
+### 3.7 Deployment and Operations
+- Configuration items, hot-update mechanisms, canary and rollback strategies
+- CI/CD, container, Service Mesh, Kubernetes resource considerations
+
+---
+
+## 4. Unit Testing
+
+### 4.1 Test Scope and Goals
+- Cover core logic, boundary conditions, concurrency scenarios, exception paths
+
+### 4.2 Test Environment and Tools
+- Google Test/Mock version, necessary third-party stubs/fakes
+
+### 4.3 Test Scenarios and Cases
+| Case ID | Scenario | Input | Expected Output/Behavior | Mock Dependencies |
+|---------|----------|-------|--------------------------|-------------------|
+| TC-01 | Normal single log push | Single valid LogRecord | Returns SUCCESS, buffer size +1 | None |
+| TC-02 | Buffer full | capacity=N filled | Throws BufferOverflowException | None |
+| TC-03 | Concurrent push | Multi-thread simultaneous push | No data loss, order/final consistency matches design | MutexMock |
+| TC-04 | flush clears | M items exist, then flush | Returns M items, buffer size=0 | TimeProviderMock |
+
+### 4.4 Boundary and Exception Testing
+- Empty input, invalid input, extreme capacity, network/disk fault injection
+
+### 4.5 Performance Benchmarking (optional)
+- Throughput, latency, CPU/Memory profile; comparison with baseline
+
+---
+
+## Notes
+
+- **Do not** include project management info (estimates, schedules, milestones, Gantt charts)
+- Code examples must follow team C++ coding standards (see `.claude/skills/project-knowledge/`)
+- Test case naming: `__` for CI coverage tracking
diff --git a/.claude/skills/e2e/SKILL.md b/.claude/skills/e2e/SKILL.md
new file mode 100644
index 0000000000..92a8c4df6b
--- /dev/null
+++ b/.claude/skills/e2e/SKILL.md
@@ -0,0 +1,209 @@
+---
+name: e2e
+description: LoongCollector E2E testing guide covering design, authoring, running, and debugging. Use this skill when writing new E2E tests, running existing ones, or troubleshooting E2E test failures.
+---
+# LoongCollector E2E Testing Guide
+
+> Detailed step templates: [reference.md](reference.md) | Reusable scripts: [scripts/](scripts/)
+
+## Contents
+
+1. [Overview](#1-overview)
+2. [Designing Test Cases](#2-designing-test-cases)
+3. [Writing Test Cases](#3-writing-test-cases)
+4. [Running Locally (docker-compose)](#4-running-locally)
+5. [Debugging](#5-debugging)
+6. [Known Pitfalls](#6-known-pitfalls)
+
+---
+
+## 1 Overview
+
+Tests are built on the **BDD Godog** framework: `.feature` files describe scenarios, and the engine matches step functions by regex and passes in the captured arguments.
+
+```
+test/e2e/
+  test_cases//
+    case.feature            # scenario description
+    docker-compose.yaml     # optional, external dependency services
+  engine/
+    steps.go                # all available steps (authoritative source)
+    setup/ control/ trigger/ verify/ cleanup/
+```
+
+**Environment tags**: `@host`, `@k8s`, `@docker-compose` (choose exactly one, plus `@e2e`)
+
+---
+
+## 2 Designing Test Cases
+
+Before writing a feature file, settle the test matrix. Go through the following dimensions one by one and decide whether each needs coverage:
+
+### 2.1 Scenario Dimension Checklist
+
+| Dimension | Typical scenario | When needed |
+|------|----------|----------|
+| **Basic functionality** | Single config, single data type, end to end | Always |
+| **Multiple data types** | Verify logs / metrics / traces separately | When the plugin supports multiple types |
+| **Multiple coexisting configs** | Load several pipeline configs at once | When port/resource contention is possible |
+| **Config hot reload** | Add/remove/modify configs at runtime | Continuously running input plugins |
+| **Config type change** | Switch from type A to type B | When the plugin supports multiple protocols/formats |
+| **Backpressure and recovery** | Downstream unreachable, then no data loss after recovery | Flusher plugins |
+| **External dependency failure** | Dependent service restarts or becomes unreachable | When external dependencies exist |
+| **High data volume** | No OOM and no data loss under high throughput | Performance-sensitive paths |
+
+### 2.2 Design Output
+
+Once the scenarios to cover are decided, make the following explicit for each Scenario:
+- **Input**: what data, in what format, how many records
+- **Path**: the specific input → processor → flusher plugins the data flows through
+- **Expected output**: where to verify and what to verify
+- **External dependencies**: which auxiliary services are needed (OTel Collector, Kafka, etc.)
+
+---
+
+## 3 Writing Test Cases
+
+### 3.1 Directory Layout
+
+```
+test/e2e/test_cases/my_feature/
+├── case.feature
+├── docker-compose.yaml         # external dependencies
+└── otel-collector-config.yaml  # if an OTel Collector is used
+```
+
+### 3.2 Feature File Template
+
+```gherkin
+@flusher
+Feature: my feature name
+ Brief description
+
+ @e2e @docker-compose
+ Scenario: TestMyFeatureLogs
+ Given {docker-compose} environment
+ Given {my-config} local config as below
+ """
+ enable: true
+ inputs:
+ - Type: input_forward
+ Protocol: OTLP
+ Endpoint: "0.0.0.0:4320"
+ flushers:
+ - Type: flusher_otlp_native
+ Endpoint: "otel-collector:4317"
+ """
+ When start docker-compose {my_feature}
+ Then wait {10} seconds
+ When generate {1} OTLP {logs} via otelgen to endpoint {loongcollectorC:4320}, protocol {grpc}
+ Then wait {5} seconds
+ Then otlp collector received at least {1} logs from file {/tmp/otel-export/logs.json}
+```
+
+### 3.3 Mandatory Rules
+
+- The config must contain `enable: true`
+- **Only use** steps already registered in `test/engine/steps.go`
+- `wait {N} seconds` is a **Then** step, not a When
+- Naming format: `Test${FeatureName}${ScenarioDescription}`
+- **Do not** use `global.ExcutionTimeout` in configs for continuously running plugins (see §6.1)
+
+### 3.4 Adding New Steps
+
+If a new step is needed, follow the development and registration flow in [reference.md](reference.md) under "Adding New Steps".
+
+---
+
+## 4 Running Locally
+
+### 4.1 Prerequisites
+
+```bash
+docker --version && docker compose version
+```
+
+If C++ code was modified, recompile and update the image first. Two options:
+
+**Option 1: full build** (slow, but guaranteed consistent)
+```bash
+make e2e_image # build the full Docker image aliyun/loongcollector:0.0.1 from source
+```
+
+**Option 2: incremental update** (fast, suited to iterative debugging)
+```bash
+cd build && make -sj$(nproc) && cd ..
+# replace the binary inside the image
+docker create --name tmp-lc aliyun/loongcollector:0.0.1
+docker cp build/loongcollector tmp-lc:/usr/local/loongcollector/loongcollector
+docker commit tmp-lc aliyun/loongcollector:0.0.1
+docker rm tmp-lc
+```
+
+### 4.2 Run
+
+```bash
+cd test/e2e
+
+# run the whole test case (all Scenarios)
+TEST_CASE=flusher_otlp_native go test -v -run "TestE2EOnDockerCompose$" \
+ -timeout 600s -count=1 ./...
+
+# run only a specific Scenario
+TEST_CASE=flusher_otlp_native go test -v \
+ -run "TestE2EOnDockerCompose/TestFlusherOTLPNativeLogs$" \
+ -timeout 600s -count=1 ./...
+```
+
+### 4.3 Cleanup (mandatory after a failed test)
+
+Either run the script directly with `bash .claude/skills/e2e/scripts/e2e-cleanup.sh`, or execute the steps manually:
+
+```bash
+docker rm -f $(docker ps -aq) 2>/dev/null
+docker network prune -f
+rm -rf test/e2e/config test/e2e/onetime_pipeline_config
+sudo rm -rf test/e2e/report
+rm -f test/e2e/test_cases//testcase-compose.yaml
+```
+
+---
+
+## 5 Debugging
+
+```bash
+# 1. inspect container logs
+docker ps | grep loongcollectorC
+docker exec cat /usr/local/loongcollector/log/loongcollector.LOG
+
+# 2. check whether the config was loaded
+docker exec ls /usr/local/loongcollector/conf/continuous_pipeline_config/local/
+
+# 3. check whether the port is listening
+docker exec ss -tlnp | grep
+
+# 4. reproduce the compose environment manually
+cd test/e2e/test_cases/
+docker compose -f testcase-compose.yaml up -d
+docker compose -f testcase-compose.yaml logs -f loongcollectorC
+```
+
+---
+
+## 6 Known Pitfalls
+
+### 6.1 ExcutionTimeout turns a config into a onetime config
+
+**Never** use `global.ExcutionTimeout` in configs for continuously running plugins such as `input_forward` or `input_file`.
+
+It makes `IsOnetime()` return true, so `IsValidNativeInputPlugin(name, true)` looks the plugin up in the onetime registry; most inputs are registered only as continuous, and the load fails with `unsupported input plugin`.
+
+See `.claude/rules/project-knowledge/config-pitfalls.mdc` for details.
+
+### 6.2 FlusherFile must be a file
+
+The e2e template bind-mounts `report/default_flusher.json` into the container. If the host path does not exist, Docker creates it as a **directory**. This is handled automatically in `BootController.Start()`.
+
+### 6.3 Leftovers between tests
+
+Multiple Scenarios share one process, and `Clean()` removes config/report. After an abnormal exit, clean up manually (§4.3).
diff --git a/.claude/skills/e2e/reference.md b/.claude/skills/e2e/reference.md
new file mode 100644
index 0000000000..d1c662ae35
--- /dev/null
+++ b/.claude/skills/e2e/reference.md
@@ -0,0 +1,134 @@
+# E2E Testing Detailed Reference
+
+## Available Steps Quick Reference
+
+> Authoritative source: `test/engine/steps.go`
+
+### Given (environment preparation)
+
+| Step template | Description |
+|----------|------|
+| `{docker-compose} environment` | Initialize the docker-compose environment |
+| `{host} environment` | Initialize the host environment |
+| `{daemonset} environment` | Initialize the K8s environment |
+| `{name} local config as below` | Write a continuous collection config |
+| `{name} onetime pipeline local config as below` | Write a onetime collection config |
+| `subcribe data from {sls} with config` | Subscribe to the SLS data source |
+| `loongcollector depends on containers {name}` | Declare container dependencies |
+| `loongcollector container mount {src} to {dst}` | Mount a volume |
+| `loongcollector expose port {host} to {container}` | Expose a port |
+| `docker-compose boot type {type}` | Set the boot type |
+| `mkdir {path}` | Create a directory |
+
+### When (trigger actions)
+
+| Step template | Description |
+|----------|------|
+| `start docker-compose {case_name}` | Start the docker-compose environment |
+| `begin trigger` | Mark the trigger start time (must be called before generating logs) |
+| `generate {N} regex logs to file {path}, with interval {M}ms` | Generate regex-format logs |
+| `generate {N} json logs to file {path}, with interval {M}ms` | Generate JSON logs |
+| `generate {N} apsara logs to file {path}, with interval {M}ms` | Generate Apsara logs |
+| `generate {N} OTLP {logs\|metrics\|traces} via otelgen to endpoint {ep}, protocol {grpc\|http}` | Generate OTLP data |
+| `generate {N} http logs, with interval {M}ms, url: {url}, method: {method}, body:` | Generate HTTP logs |
+| `execute {N} commands {cmd} in sequence` | Run commands sequentially |
+| `execute {N} commands {cmd} in parallel` | Run commands in parallel |
+| `create the shell script file {name} with the following content` | Create a shell script |
+| `execute {N} the shell script file {name} in parallel` | Run the shell script in parallel |
+| `restart agent` | Restart the agent |
+| `force restart agent` | Force-restart the agent |
+
+### Then (result verification)
+
+| Step template | Description |
+|----------|------|
+| `there is {N} logs` | Verify the exact log count (capped at 100) |
+| `there is at least {N} logs` | Verify a minimum log count |
+| `there is less than {N} logs` | Verify a maximum log count |
+| `the log fields match kv` | KV field matching (expected content follows in a `"""..."""` doc string) |
+| `the log fields match as below` | Log field pattern matching |
+| `the log tags match kv` | Tag KV matching |
+| `the log is in order` | Verify log ordering |
+| `wait {N} seconds` | Wait N seconds |
+| `otlp collector received at least {N} (logs\|metrics\|traces) from file {path}` | Verify data received by the OTel Collector |
+
+> Note: log-count verification is capped at 100. For more than 100 logs, use `When query through` + `Then the log fields match kv` instead.
+
+---
+
+## Adding New Steps
+
+### 1. Write the function
+
+In the matching subdirectory under `test/engine/`:
+
+```go
+func MyVerification(ctx context.Context, expected int) (context.Context, error) {
+	// implementation logic goes here
+	return ctx, nil
+}
+```
+
+Signature requirements: the first parameter must be `context.Context`, and the function must return `(context.Context, error)`.
+
+### 2. Register
+
+In `test/engine/steps.go`:
+
+```go
+ctx.Then(`^my verification expects \{(\d+)\}$`, verify.MyVerification)
+```
+
+### 3. Use
+
+```gherkin
+Then my verification expects {42}
+```
+
+---
+
+## docker-compose.yaml Example
+
+### OTel Collector (for OTLP tests)
+
+```yaml
+services:
+ otel-collector:
+ image: otel/opentelemetry-collector-contrib:latest
+ hostname: otel-collector
+ user: "0:0"
+ ports:
+ - "4317"
+ volumes:
+ - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
+ - ./otel-export:/tmp/otel-export
+ healthcheck:
+ test: ["CMD", "wget", "--spider", "-q", "http://localhost:13133/"]
+ interval: 5s
+ timeout: 3s
+ retries: 5
+ start_period: 10s
+```
+
+---
+
+## eBPF Process Security Test Example
+
+```gherkin
+@e2e @host @ebpf_input
+Scenario: TestEBPFProcessSecurityByNormalStart
+ Given {host} environment
+ Given subcribe data from {sls} with config
+ """
+ """
+ Given {ebpf_process_security_default} local config as below
+ """
+ enable: true
+ inputs:
+ - Type: input_process_security
+ """
+ When begin trigger
+ When execute {1} commands {/bin/echo 1} in sequence
+ When query through {* | select * from e2e where call_name = 'execve' and binary = '/bin/echo' and arguments = '1'}
+ Then there is {1} logs
+```
diff --git a/.claude/skills/e2e/scripts/e2e-cleanup.sh b/.claude/skills/e2e/scripts/e2e-cleanup.sh
new file mode 100755
index 0000000000..ceba2870be
--- /dev/null
+++ b/.claude/skills/e2e/scripts/e2e-cleanup.sh
@@ -0,0 +1,32 @@
+#!/usr/bin/env bash
+# E2E test environment cleanup script
+# Usage: bash .claude/skills/e2e/scripts/e2e-cleanup.sh [case_name]
+set -euo pipefail
+
+REPO_ROOT="$(git rev-parse --show-toplevel)"
+E2E_DIR="$REPO_ROOT/test/e2e"
+CASE_NAME="${1:-}"
+
+echo "==> 停止并删除所有 Docker 容器..."
+docker rm -f $(docker ps -aq) 2>/dev/null || true
+
+echo "==> 清理 Docker 网络..."
+docker network prune -f 2>/dev/null || true
+
+echo "==> 清理运行时目录..."
+rm -rf "$E2E_DIR/config" "$E2E_DIR/onetime_pipeline_config"
+sudo rm -rf "$E2E_DIR/report" 2>/dev/null || rm -rf "$E2E_DIR/report" 2>/dev/null || true
+
+if [[ -n "$CASE_NAME" ]]; then
+ CASE_DIR="$E2E_DIR/test_cases/$CASE_NAME"
+ if [[ -d "$CASE_DIR" ]]; then
+ echo "==> 清理测试用例 $CASE_NAME..."
+ rm -f "$CASE_DIR/testcase-compose.yaml"
+ rm -f "$CASE_DIR/otel-export/"*.json 2>/dev/null || true
+ fi
+else
+ echo "==> 清理所有测试用例的 testcase-compose.yaml..."
+ find "$E2E_DIR/test_cases" -name "testcase-compose.yaml" -delete 2>/dev/null || true
+fi
+
+echo "==> 清理完成"
diff --git a/.claude/skills/mermaid/SKILL.md b/.claude/skills/mermaid/SKILL.md
new file mode 100644
index 0000000000..3cccf610d8
--- /dev/null
+++ b/.claude/skills/mermaid/SKILL.md
@@ -0,0 +1,42 @@
+---
+name: mermaid
+description: Mermaid diagram conventions. Use whenever diagrams are needed in documentation or code review.
+---
+# Mermaid Diagram Conventions
+
+## Rules for Creating Mermaid Diagrams
+
+1. **Use Correct Fenced Code Block**: Always put the diagram in a fenced code block tagged `mermaid`
+
+2. **Stick to Well-Supported Diagram Types**:
+ - `graph` (flowcharts, `TD` preferred for readability)
+ - `sequenceDiagram`
+ - `classDiagram`
+ - `stateDiagram-v2` (prefer v2)
+ - `erDiagram`
+ - `pie`, `gantt`, `mindmap` (basic only)
+ - Avoid very new or uncommon types
+
+3. **Simple Standard Syntax**:
+ - **Node IDs**: Use simple alphanumeric IDs (`node1`, `processA`). No spaces or special chars.
+ - **Labels**: **Use quotes** for labels with spaces/punctuation/keywords.
+ - Good: `A["User Input"] --> B["Validate Data"];`
+ - Bad: `A[User Input] --> B[Validate Data];`
+ - Use standard arrows (`-->`, `---`, `==>`)
+ - Comments: `%%`
+
+4. **Mindmap (GitHub compatible)**:
+ - Use basic indentation structure only
+ - NO `::icon()` syntax (causes rendering errors)
+ - Each node on its own line with correct indentation
+
+5. **Prefer Vertical Layouts**: `graph TD` or `graph TB` for flowcharts (easier to read in Markdown)
+
+6. **Let GitHub Handle Styling**:
+ - DO NOT set themes (`%%{init: ...}`)
+ - DO NOT use `classDef` or `style`
+ - GitHub auto-adapts to light/dark mode
+
+7. **Keep Diagrams Focused**: Break complex diagrams into multiple simpler ones
+
+8. **Always Review Automated Edits**: Tools may break Mermaid syntax, especially with indentation-heavy formats like mindmap
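+
+A minimal flowchart that follows these conventions (node IDs and labels are illustrative):
+
+```mermaid
+graph TD;
+    inputA["Input Plugin"] --> procA["Processor"];
+    procA --> flusherA["Flusher"];
+```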
diff --git a/.claude/skills/omc-reference/SKILL.md b/.claude/skills/omc-reference/SKILL.md
new file mode 100644
index 0000000000..cc02915c07
--- /dev/null
+++ b/.claude/skills/omc-reference/SKILL.md
@@ -0,0 +1,141 @@
+---
+name: omc-reference
+description: OMC agent catalog, available tools, team pipeline routing, commit protocol, and skills registry. Auto-loads when delegating to agents, using OMC tools, orchestrating teams, making commits, or invoking skills.
+user-invocable: false
+---
+
+# OMC Reference
+
+Use this built-in reference when you need detailed OMC catalog information that does not need to live in every `CLAUDE.md` session.
+
+## Agent Catalog
+
+Prefix: `oh-my-claudecode:`. See `agents/*.md` for full prompts.
+
+- `explore` (haiku) — fast codebase search and mapping
+- `analyst` (opus) — requirements clarity and hidden constraints
+- `planner` (opus) — sequencing and execution plans
+- `architect` (opus) — system design, boundaries, and long-horizon tradeoffs
+- `debugger` (sonnet) — root-cause analysis and failure diagnosis
+- `executor` (sonnet) — implementation and refactoring
+- `verifier` (sonnet) — completion evidence and validation
+- `tracer` (sonnet) — trace gathering and evidence capture
+- `security-reviewer` (sonnet) — trust boundaries and vulnerabilities
+- `code-reviewer` (opus) — comprehensive code review
+- `test-engineer` (sonnet) — testing strategy and regression coverage
+- `designer` (sonnet) — UX and interaction design
+- `writer` (haiku) — documentation and concise content work
+- `qa-tester` (sonnet) — runtime/manual validation
+- `scientist` (sonnet) — data analysis and statistical reasoning
+- `document-specialist` (sonnet) — SDK/API/framework documentation lookup
+- `git-master` (sonnet) — commit strategy and history hygiene
+- `code-simplifier` (opus) — behavior-preserving simplification
+- `critic` (opus) — plan/design challenge and review
+
+## Model Routing
+
+- `haiku` — quick lookups, lightweight inspection, narrow docs work
+- `sonnet` — standard implementation, debugging, and review
+- `opus` — architecture, deep analysis, consensus planning, and high-risk review
+
+## Tools Reference
+
+### External AI / orchestration
+- `/team N:executor "task"`
+- `omc team N:codex|gemini "..."`
+- `omc ask `
+- `/ccg`
+
+### OMC state
+- `state_read`, `state_write`, `state_clear`, `state_list_active`, `state_get_status`
+
+### Team runtime
+- `TeamCreate`, `TeamDelete`, `SendMessage`, `TaskCreate`, `TaskList`, `TaskGet`, `TaskUpdate`
+
+### Notepad
+- `notepad_read`, `notepad_write_priority`, `notepad_write_working`, `notepad_write_manual`
+
+### Project memory
+- `project_memory_read`, `project_memory_write`, `project_memory_add_note`, `project_memory_add_directive`
+
+### Code intelligence
+- LSP: `lsp_hover`, `lsp_goto_definition`, `lsp_find_references`, `lsp_diagnostics`, and related helpers
+- AST: `ast_grep_search`, `ast_grep_replace`
+- Utility: `python_repl`
+
+## Skills Registry
+
+Invoke built-in workflows via `/oh-my-claudecode:`.
+
+### Workflow skills
+- `autopilot` — full autonomous execution from idea to working code
+- `ralph` — persistence loop until completion with verification
+- `ultrawork` — high-throughput parallel execution
+- `visual-verdict` — structured visual QA verdicts
+- `team` — coordinated team orchestration
+- `ccg` — Codex + Gemini + Claude synthesis lane
+- `ultraqa` — QA cycle: test, verify, fix, repeat
+- `omc-plan` — planning workflow and `/plan`-safe alias
+- `ralplan` — consensus planning workflow
+- `sciomc` — science/research workflow
+- `external-context` — external docs/research workflow
+- `deepinit` — hierarchical AGENTS.md generation
+- `deep-interview` — Socratic ambiguity-gated requirements workflow
+- `ai-slop-cleaner` — regression-safe cleanup workflow
+
+### Utility skills
+- `ask`, `cancel`, `note`, `learner`, `omc-setup`, `mcp-setup`, `hud`, `omc-doctor`, `trace`, `release`, `project-session-manager`, `skill`, `writer-memory`, `configure-notifications`
+
+### Keyword triggers kept compact in CLAUDE.md
+- `"autopilot"→autopilot`
+- `"ralph"→ralph`
+- `"ulw"→ultrawork`
+- `"ccg"→ccg`
+- `"ralplan"→ralplan`
+- `"deep interview"→deep-interview`
+- `"deslop" / "anti-slop"→ai-slop-cleaner`
+- `"deep-analyze"→analysis mode`
+- `"tdd"→TDD mode`
+- `"deepsearch"→codebase search`
+- `"ultrathink"→deep reasoning`
+- `"cancelomc"→cancel`
+- Team orchestration is explicit via `/team`.
+
+## Team Pipeline
+
+Stages: `team-plan` → `team-prd` → `team-exec` → `team-verify` → `team-fix` (loop).
+
+- Use `team-fix` for bounded remediation loops.
+- `team ralph` links the team pipeline with Ralph-style sequential verification.
+- Prefer team mode when independent parallel lanes justify the coordination overhead.
+
+## Commit Protocol
+
+Use git trailers to preserve decision context in every commit message.
+
+### Format
+- Intent line first: why the change was made
+- Optional body with context and rationale
+- Structured trailers when applicable
+
+### Common trailers
+- `Constraint:` active constraint shaping the decision
+- `Rejected:` alternative considered | reason for rejection
+- `Directive:` forward-looking warning or instruction
+- `Confidence:` `high` | `medium` | `low`
+- `Scope-risk:` `narrow` | `moderate` | `broad`
+- `Not-tested:` known verification gap
+
+### Example
+```text
+feat(docs): reduce always-loaded OMC instruction footprint
+
+Move reference-only orchestration content into a native Claude skill so
+session-start guidance stays small while detailed OMC reference remains available.
+
+Constraint: Preserve CLAUDE.md marker-based installation flow
+Rejected: Sync all built-in skills in legacy install | broader behavior change than issue requires
+Confidence: high
+Scope-risk: narrow
+Not-tested: End-to-end plugin marketplace install in a fresh Claude profile
+```
diff --git a/.claude/skills/project-knowledge/SKILL.md b/.claude/skills/project-knowledge/SKILL.md
new file mode 100644
index 0000000000..44127422e6
--- /dev/null
+++ b/.claude/skills/project-knowledge/SKILL.md
@@ -0,0 +1,220 @@
+---
+name: project-knowledge
+description: "LoongCollector project knowledge: architecture, terminology, codebase map, and coding standards (C++/Go)."
+---
+# LoongCollector Project Knowledge
+
+## Architecture Overview
+
+The LoongCollector architecture is based on a plugin system with the following key components:
+
+1. **Core Application**: Main entry point in `core/logtail.cpp`, initializes `Application` class in `core/application/Application.cpp`. Follows singleton pattern, manages overall lifecycle.
+
+2. **Plugin System**: Supports plugins for data collection, processing, and flushing:
+ - **Inputs**: Collect data from various sources (files, network, system metrics, etc.)
+ - **Processors**: Transform and process collected data
+ - **Flushers**: Send processed data to various backends
+
+3. **Pipeline Management**: Collection pipelines managed by `CollectionPipelineManager` handle data flow from inputs through processors to flushers.
+
+4. **Configuration**: Supports both local and remote configuration management with watchers that monitor for configuration changes.
+
+5. **Queuing System**: Implements various queue types including bounded queues, circular queues, and exactly-once delivery queues for reliable data transmission.
+
+6. **Monitoring**: Built-in monitoring and metrics collection for tracking the collector's own performance and health.
+
+## Project Structure
+
+```
+core/ # Core C++ code
+ plugin/ # Plugin system
+ input/ # Data collection input plugins
+ processor/ # Data processing plugins
+ flusher/ # Data output plugins (SLS, file, etc.)
+ collection_pipeline/ # Main pipeline flow (queue, batch, serialization)
+ config/ # Configuration (loading, providers, feedback)
+ provider/ # Config providers (Enterprise, Legacy)
+ common/ # Common utilities, data structures, network, string, crypto
+ monitor/ # Monitoring, metrics collection, alerting
+ logger/ # Logging system
+ checkpoint/ # Checkpoint, state management
+ app_config/ # Global configuration
+ models/ # Core data structures (events, logs, metrics)
+ parser/ # Log parsers
+ task_pipeline/ # Task scheduling
+ go_pipeline/ # Go plugin integration
+ ebpf/ # eBPF collection and plugins
+ host_monitor/ # Host-level monitoring
+ shennong/ # Shennong metrics
+ prometheus/ # Prometheus collection
+ file_server/ # File collection and management
+ container_manager/ # Container environment management
+ application/ # Main application entry
+ protobuf/ # Protobuf protocol definitions
+ metadata/ # K8s and other metadata collection
+ constants/ # Constants
+ tools/ # Internal utility scripts
+ unittest/ # Unit tests
+ legacy_test/ # Historical test cases
+
+pkg/ # Go packages
+ helper/ # Go helper functions
+ containercenter/ # Go container-related functions
+
+plugin_main/ # Plugin main entry
+pluginmanager/ # Go plugin manager (lifecycle, registration)
+plugins/ # Go plugin packages
+ input/ # Go input plugins (docker, etc.)
+ processor/ # Go processor plugins
+ flusher/ # Go flusher plugins
+ aggregator/ # Go aggregator plugins
+ extension/ # Go extension plugins
+ all/ # Plugin registration and init
+ test/ # Go plugin tests
+
+test/ # Integration tests
+e2e/ # E2E test cases (open source Go plugins)
+e2e_enterprise/ # E2E enterprise test cases (host + K8s)
+docs/ # Project documentation
+scripts/ # Build, deploy, test scripts
+docker/ # Docker-related files
+rpm/ # RPM packaging
+external/ # External dependencies
+```
+
+## Key Dependencies
+
+### Header-Only Libraries
+- `spdlog` - Logging
+- `rapidjson` - JSON parsing
+
+### Compiled Libraries
+- **Testing**: `gtest`, `gmock`
+- **Serialization**: `protobuf`
+- **Regex**: `re2`
+- **Hash**: `cityhash`
+- **Config**: `jsoncpp`, `yamlcpp`
+- **Compression**: `lz4`, `zlib`, `zstd`
+- **Network**: `curl`, `ssl`, `crypto`
+- **System**: `boost`, `gflags`, `leveldb`, `uuid`
+- **Memory**: `tcmalloc` (optional)
+
+### Tech Stack
+- C++ (main implementation, C++17/20)
+- Protobuf (data serialization)
+- eBPF (kernel-level data collection)
+- Prometheus (metrics collection)
+- Go (plugin adaptation)
+- Shell/Python (build and test scripts)
+
+## Terminology Glossary
+
+| Term | Description |
+|------|-------------|
+| LoongCollector | The observability data collection agent (formerly iLogtail) |
+| Pipeline | A data processing chain: Input -> Processor(s) -> Flusher |
+| Plugin | A modular component that performs specific data operations |
+| Input Plugin | Collects data from a source (file, network, metric, etc.) |
+| Processor Plugin | Transforms data (parse, filter, enrich, etc.) |
+| Flusher Plugin | Sends data to a destination (SLS, stdout, Prometheus, etc.) |
+| Config | Collection configuration defining pipeline behavior |
+| Checkpoint | Persistent state tracking for exactly-once delivery |
+| Runner | Execution wrapper for a specific plugin instance |
+| Queue | Data buffer between pipeline stages |
+| Batch | Group of events processed/sent together |
+| SLS | Alibaba Cloud Simple Log Service |
+| eBPF | Extended Berkeley Packet Filter (kernel tracing) |
+| SPL | Structured Processing Language |
+
+## Codebase Map
+
+### Key Entry Points and Core Flows
+
+| Path | Purpose |
+|------|---------|
+| `core/logtail.cpp` | Main entry point |
+| `core/application/Application.cpp` | Application singleton, lifecycle management |
+| `core/collection_pipeline/CollectionPipelineManager.cpp` | Pipeline lifecycle |
+| `core/collection_pipeline/CollectionPipeline.cpp` | Pipeline execution |
+| `core/runner/ProcessorRunner.cpp` | Processor execution |
+| `core/runner/FlusherRunner.cpp` | Flusher execution |
+| `core/config/watcher/PipelineConfigWatcher.cpp` | Config change detection |
+| `core/file_server/FileServer.cpp` | File collection management |
+| `core/file_server/checkpoint/CheckpointManagerV2.cpp` | Exactly-once checkpoint |
+
+### Invariant Rules
+
+- **Lifecycle**: All plugins follow Init -> Start -> Stop -> Close lifecycle
+- **Resource Release**: Every thread/future/queue must be properly cleaned up on stop
+- **Config**: Environment variables are case-insensitive with default fallbacks
+- **Queue**: Bounded queue with backpressure; pop on disabled queue should not hang
+- **Hot Reload**: After config change, system must return to consistent "collect+process+send" state
+
+### Common Patterns
+
+- RAII for resource management
+- Smart pointers over raw pointers
+- Singleton pattern for managers (Application, AlarmManager, WriteMetrics)
+- Thread-safe queues with condition variables
+- Plugin registration via static initialization
+
+## C++ Coding Standards
+
+### Naming
+- **PascalCase** for class names, global functions, public methods
+- **camelCase** for variable names and private methods
+- **SCREAMING_SNAKE_CASE** for macros and constants
+- **m** prefix for member variables (e.g., `mUserId`)
+- **k** prefix for constants (e.g., `kMaxSendBufferSize`)
+
+### Modern C++
+- Prefer C++17/20 features (auto, range-based loops, smart pointers)
+- Use `std::unique_ptr` / `std::shared_ptr` for memory management
+- Prefer `std::optional`, `std::variant`, `std::any` for type-safe alternatives
+- Use `constexpr` and `const` for compile-time computations
+- Use `std::string_view` for read-only string operations
+
+### Error Handling
+- Use exceptions for error handling (`std::runtime_error`, `std::invalid_argument`)
+- RAII for resource management to avoid memory leaks
+- Validate inputs at function boundaries
+- Log errors using spdlog
+
+### Performance
+- Avoid unnecessary heap allocations; prefer stack-based objects
+- Use `std::move` for move semantics
+- Optimize loops with `<algorithm>` utilities (e.g., `std::sort`, `std::for_each`)
+- Use `std::array` or `std::vector` over raw arrays
+
+### Security
+- Avoid C-style casts; use `static_cast`, `dynamic_cast`, `reinterpret_cast`
+- Enforce const-correctness
+- Avoid global variables; use singletons sparingly
+- Use `enum class` for strongly typed enumerations
+
+### Testing
+- Unit tests using Google Test (GTest) / Google Mock
+- Integration tests for system components
+
+## Go Coding Standards
+
+### Naming
+- **PascalCase** for exported types and functions
+- **camelCase** for unexported types and functions
+- **snake_case** for variables and constants
+- Package names use lowercase
+
+### Error Handling
+- Return errors explicitly, do not panic
+- Use `fmt.Errorf` with `%w` for error wrapping
+- Check errors at every call site
+
+### Concurrency
+- Use goroutines for concurrent operations
+- Use channels or sync primitives for communication
+- Avoid goroutine leaks; always provide exit paths
+
+### Testing
+- Use standard `testing` package
+- Table-driven tests for function coverage
+- Integration tests via E2E framework
diff --git a/.claude/skills/review-standards/SKILL.md b/.claude/skills/review-standards/SKILL.md
new file mode 100644
index 0000000000..3fafef5dd8
--- /dev/null
+++ b/.claude/skills/review-standards/SKILL.md
@@ -0,0 +1,255 @@
+---
+name: review-standards
+description: Code review behavioral standards. Reference during code review to ensure consistent quality checks from a QA perspective.
+---
+# Code Review Rule
+
+You are a senior code review assistant. Review code from a QA perspective, examine it with a critical eye, and aim to uncover potential problems.
+
+To avoid false-positive findings, keep the following in mind:
+
+* Include enough context when analyzing a specific code snippet; do not judge from local information alone.
+
+* Do not analyze code from memory; base every finding on code you have actually inspected.
+
+* Before flagging an issue, understand the complete business-logic flow and consider whether the existing design is reasonable and necessary.
+
+
+Perform the code review by following the steps below.
+
+## 1. Obtain the review content (no output required)
+
+The user provides a branch or a PR. Obtain the list of files under review and their contents as follows.
+
+1. If two branch names are given (for example "fork/feature" and "main"), obtain the file list and contents like this:
+
+   * Run `git branch` and `git remote` to determine whether the branch is on origin or on another remote.
+
+   * Use `git fetch` to fetch the branches (e.g. `fork/feature`, `origin/main`) so the latest code is available.
+
+   * Run `git checkout fork/feature && git pull` to bring the content local, so the full context can be consulted during review.
+
+   * Run `git diff --name-only --diff-filter=M origin/main...fork/feature` to list the modified files.
+
+   * For each file in that list, run `git diff origin/main...fork/feature -- ` to obtain its change content.
+
+2. If only one branch name is given (for example "fork/feature"), the other branch is "main"; then proceed as in the two-branch case.
+
+3. If a PR number is given, the two branches are "origin/pull/{PR number}/head" and "main"; then proceed as in the two-branch case.
+
+
+## 2. High-level summary (output required)
+
+Describe the review content in 2–3 sentences:
+
+* **Product impact**: what value does this change bring to users or customers?
+
+* **Engineering approach**: which key data structures, algorithms, patterns, frameworks, or best practices does it use?
+
+## 3. Understanding the PR code
+
+Explain to the reviewer, from the code author's point of view, what this PR is trying to do. Where helpful, use mermaid to draw the key logic, data structures, and interaction sequence diagrams.
+First, from a global perspective, lay out the overall collect/process/send flow that this PR touches (skip the parts it does not touch), how data flows through the key components, and which data structures are used.
+Then explain what the PR intends to extend and how that extension should be made.
+Finally, describe in detail how the PR actually does it, covering key logic such as parsing, error handling, and retries.
+
+## 4. Keep the evaluation criteria in mind (no output required)
+
+For every changed file and each of its diff hunks, evaluate whether the changed lines meet the requirements below:
+
+1. **Deep understanding of business logic**
+
+    * Analyze the component's actual role and expected behavior
+
+    * Identify edge cases that could break the functionality
+
+    * Question whether the existing design meets the business goal
+
+    * Consider failure modes and fault-tolerance mechanisms
+
+2. **Design and architecture**
+
+    * Module responsibility: enforce the single-responsibility principle, check the design against SOLID, and treat testability as a first-class criterion
+
+    * Dependency management: identify call chains and dependencies between components, and check for circular or hidden dependencies
+
+    * Analyze failure propagation paths and make sure failure context is carried correctly
+
+    * Input and Flusher use the bus-style Runner pattern: configs are applied via registration, and the thread count must not grow with the number of configs
+
+    * Self-monitor features that involve restarts should be managed centrally by LogtailMonitor
+
+3. **Correctness and safety**
+
+    * Bounds checks: validate indexes before array/container access, e.g. `if (index < container.size())`
+
+    * Null-pointer guards: public methods must check pointer parameters, e.g. `if (!ptr) return false;`
+
+    * Type safety: validate types before JSON parsing, e.g. `if (json.isString()) value = json.asString();`
+
+    * Resource management: use RAII and smart pointers to avoid leaks, e.g. `std::unique_ptr`, `std::shared_ptr`. Prefer existing RAII wrappers; for custom cleanup logic, a unique_ptr with a lambda deleter works.
+
+    * Error handling: program defensively against external input, including config reads (e.g. `std::ios_base::failure`, `std::filesystem::filesystem_error`, `boost::regex_error`), files, databases, and the network; exception handling and complete logging are mandatory
+
+    * Error propagation: check that errors propagate to the caller correctly instead of failing silently
+
+    * Fault tolerance for external calls: external API calls, network requests, and similar failure-prone operations must retry with exponential backoff so transient faults do not overload the external interface.
+
+    * Type conversions: check the safety of conversions, especially narrowing conversions
+
+4. **Performance and efficiency**
+
+    * Memory optimization:
+
+        * Pre-size containers, e.g. `vector.reserve(expected_size)`
+
+        * Avoid unnecessary copies; prefer move semantics and pass-by-reference, e.g. `map.emplace(args)`, `auto& val = map[key]`.
+
+        * For string work, prefer the `StringView` data structure to avoid copies, and prefer the existing helpers in core/common/StringTools.h, such as `StringViewSplitter` for splitting, `Trim` for trimming, and `StringTo` for parsing.
+
+        * Cap container sizes to prevent memory blow-ups, e.g. `if (queue.size() > MAX_QUEUE_SIZE)`
+
+    * Computational efficiency:
+
+        * Cache repeated computation results and avoid redundant work on hot paths; for example, values obtained via sysconf only need to be read once at initialization.
+
+        * Make sure best-in-class data structures and algorithms are used, and avoid non-linear performance degradation
+
+        * Batch operations to reduce system and network call overhead, e.g. batched sends
+
+    * Hot-path performance review:
+
+        * Pay special attention to performance changes inside loops and event-processing loops
+
+        * Compare the time complexity of the new implementation against the old one
+
+        * Question any change that introduces extra data-structure lookups on a high-frequency path
+
+    * Host monitoring metrics: when adding a metric, also add a cache in SystemInterface so metrics read at the same point in time stay consistent.
+
+5. **Concurrency and thread safety**
+
+    * Locking strategy: minimize lock scope and prefer lock-free data structures such as `boost::concurrent_flat_map`
+
+    * Deadlock prevention: acquire multiple locks in a consistent order and avoid nested locks
+
+    * Thread reuse: use thread pools instead of creating threads frequently
+
+    * Event-driven IO: prefer event-driven designs over extra threads for IO operations
+
+    * Data races: shared data must be protected by synchronization; atomics are preferable to locks
+
+    * Pass asynchronous data efficiently, e.g. prefer carrying context directly via epoll's `event.data.ptr` or curl's `CURLOPT_PRIVATE`.
+
+    * New threads: use the `std::future` / `std::mutex` / `std::condition_variable` pattern so threads can be stopped quickly; see core/common/timer/Timer.h.
+
+6. **Dynamic libraries**
+
+    * Load dynamic libraries with the helpers defined in core/common/DynamicLibHelper.cpp to avoid compatibility problems from direct dependencies.
+
+    * Code inside a dynamic library must not allocate its own thread resources; the main program must stay in control.
+
+    * Memory allocation and deallocation in a dynamic library must be paired; never allocate in the main program and free in the library, or vice versa.
+
+7. **Readability and conventions**
+
+    * Standards:
+
+        * Reuse the C++17 standard library; do not reinvent wheels
+
+        * Use `constexpr`, `auto`, and range-based for loops (`for (auto& elem : container) {}`) wherever possible
+
+        * Use `std::optional` to represent possibly-empty return values safely, and `std::variant` for a fixed set of alternative types.
+
+        * Run the linter and flag newly added code that violates the conventions
+
+    * Naming conventions:
+
+        * Class names in PascalCase: `InputContainerStdio`
+
+        * Member variables with an m prefix: `mProject`, `mLogstore`
+
+        * Constants with a k prefix: `kMaxSendLogGroupSize`
+
+    * Code organization:
+
+        * Keep control flow simple, reduce cyclomatic complexity, factor out repeated logic (DRY), and refactor dense logic into testable helper methods
+
+        * Remove dead or unreachable code completely, including commented-out leftovers.
+
+        * Extract magic numbers into constants or gflags.
+
+        * Prefer an array of structs over several parallel arrays.
+
+        * Declare variables and methods in the header file and implement them in the cpp file, unless the class is a template or inlining is strongly needed.
+
+        * Avoid global variables; scope things with classes and namespaces instead.
+
+    * Comment quality:
+
+        * Explain "why" rather than "what"; complex algorithms must be commented
+
+        * For comments near modified code, check whether the comments need to be updated in step
+
+    * Unsafe C functions are forbidden, e.g. `strcpy`, `strcat`, `strcmp`, `strlen`, `strchr`, `strrchr`, `strstr`, `sprintf`, `strtok`, `sscanf`, `strspn`, `strcspn`, `strpbrk`, `strncat`, `strncmp`, `strncpy`, `strcoll`, `strxfrm`, `strdup`, `strndup`
+
+8. **Stability and monitoring**
+
+    * Capacity control: every buffer/queue must have an upper bound, e.g. `INT32_FLAG(max_send_log_group_size)`
+
+    * Observability: record key metrics such as cache sizes, latency, and drop counts; log abnormal situations; report critical exceptions that cause latency or data loss to the remote server via SendAlarm. Also check whether `LOG_INFO`/`LOG_WARNING`/`LOG_ERROR` calls risk flooding the log on high-frequency paths.
+
+    * Self-monitor metrics and alarms: check against the content and conventions in `../selfmonitor/SKILL.md`.
+
+9. **Compatibility and deployment**
+
+    * Platform compatibility: handle differences in path separators, byte order, and system calls
+
+    * Backward compatibility: config format changes must keep old configs working, and new parameters should not change existing default behavior
+
+    * Config defaults: every new config item must have a sensible default and be documented
+
+    * Local state compatibility: never use Protobuf `TextFormat`, to avoid dumps becoming unreadable after new fields are added. State files dumped by an old version must load and restore correctly in the new version.
+
+10. **Testing and quality**
+
+    * Coverage strategy: unit tests should cover both success and failure paths, core logic must reach 100% coverage, and boundary conditions are mandatory
+
+    * Test names must describe the behavior accurately.
+
+    * Performance tests: provide benchmarks for performance-sensitive code paths
+
+11. **Security and compliance**:
+
+    * Check configuration and input validation/sanitization to prevent injection attacks.
+
+    * Check whether each newly added dependency is necessary; when adding one, its license must be added to the licenses directory.
+
+    * New files must carry the Copyright and Apache License header.
+
+    * Leaked credentials are strictly forbidden in code.
+
+12. **Documentation:**
+
+    * For newly added input, processor, or flusher plugins, check that the corresponding usage documentation was created.
+
+    * For modified input, processor, or flusher plugins, if GetXxxParam parameters changed, update the usage documentation accordingly.
+
+
+## 5. Report issues against the evaluation criteria (output required)
+
+For each issue found, output one nested item in the following format:
+
+```markdown
+- File: [<path>:<start line>](file://./<path>#L<start line>)
+  - Issue: <one-sentence summary of the essence of the problem>
+  - Suggestion: <concise fix suggestion or code example>
+```
+
+Before emitting line numbers, re-check the code under review so the line numbers are exact and the IDE can jump straight to them.
+
+## 6. Highlight summary (output required)
+
+After the report, use a short list to summarize the positive practices or good implementations you observed in the diff.
+
+Throughout the whole process, keep a polite and professional tone; keep comments as concise as possible without losing clarity; and make sure you analyze only files that actually changed.
\ No newline at end of file
diff --git a/.claude/skills/riper5-protocol/SKILL.md b/.claude/skills/riper5-protocol/SKILL.md
new file mode 100644
index 0000000000..aea0984866
--- /dev/null
+++ b/.claude/skills/riper5-protocol/SKILL.md
@@ -0,0 +1,71 @@
+---
+name: riper5-protocol
+description: "RIPER-5 workflow protocol for complex software engineering tasks: Research, Innovate, Plan, Execute, Review."
+---
+# RIPER-5 Protocol
+
+RIPER-5 is a 5-phase workflow designed for complex software engineering tasks: system design, architectural refactoring, bug diagnosis, performance optimization, multi-component integration.
+
+## Core Principle
+
+Start every new conversation in RESEARCH mode. Do not jump to solutions. Progress through phases only with explicit signals.
+
+## Modes
+
+### Mode 1: RESEARCH `[MODE: RESEARCH]`
+**Purpose**: Information collection and deep understanding
+**Allowed**: Read files, ask clarifying questions, analyze architecture, identify constraints, create task files
+**Forbidden**: Suggestions, implementation, planning, any solution hints
+**Output**: Start with `[MODE: RESEARCH]`, then only observations and questions.
+
+### Mode 2: INNOVATE `[MODE: INNOVATE]`
+**Purpose**: Brainstorm potential approaches
+**Allowed**: Discuss solution ideas, evaluate pros/cons, explore alternatives, document findings
+**Forbidden**: Specific planning, implementation details, writing code, committing to solutions
+**Output**: Start with `[MODE: INNOVATE]`, then only possibilities and considerations.
+
+### Mode 3: PLAN `[MODE: PLAN]`
+**Purpose**: Create exhaustive technical specification
+**Allowed**: Detailed plans with file paths, function signatures, data structure changes, error handling, dependency management, test approach
+**Forbidden**: Any implementation or code writing, even "example code" that could be executed
+**Required**: Convert entire plan into a numbered sequential checklist
+**Output**: Start with `[MODE: PLAN]`, then only specifications and implementation details.
+
+### Mode 4: EXECUTE `[MODE: EXECUTE]`
+**Purpose**: Implement exactly what was planned in Mode 3
+**Allowed**: Only implement what the approved plan explicitly details, follow checklist exactly, mark completed items, update task progress
+**Forbidden**: Any deviation from plan, un-planned improvements, creative additions
+**Quality**: Always show full code context, specify language and path, proper error handling
+**Deviation**: If any deviation needed, immediately return to PLAN mode
+**Entry**: Only enter on explicit "ENTER EXECUTE MODE" command
+
+### Mode 5: REVIEW `[MODE: REVIEW]`
+**Purpose**: Ruthlessly verify implementation matches plan
+**Required**: Line-by-line comparison, technical verification, check for bugs/unexpected behavior, verify against original requirements
+**Report**: Must state if implementation matches plan exactly or deviates
+**Format**: `Detected deviation: [exact description]` or `Implementation matches plan exactly`
+**Output**: Start with `[MODE: REVIEW]`, then systematic comparison and clear judgment.
+
+## Critical Rules
+
+- Cannot transition between modes without explicit permission
+- Must declare current mode at start of every response
+- In EXECUTE: must follow plan 100% faithfully
+- In REVIEW: must mark even the smallest deviation
+- No independent decision authority outside declared mode
+- Disable emoji output unless specifically requested
+- If no explicit mode transition signal, stay in current mode
+- Default: Start in RESEARCH mode
+
+## Mode Transition Signals
+
+Only transition on exact signals:
+- "ENTER RESEARCH MODE"
+- "ENTER INNOVATE MODE"
+- "ENTER PLAN MODE"
+- "ENTER EXECUTE MODE"
+- "ENTER REVIEW MODE"
+
+**Auto-transitions**:
+- If EXECUTE needs plan deviation -> return to PLAN mode
+- After all implementation confirmed by user -> move to REVIEW mode
diff --git a/.claude/skills/security-check/SKILL.md b/.claude/skills/security-check/SKILL.md
new file mode 100644
index 0000000000..e9115105bb
--- /dev/null
+++ b/.claude/skills/security-check/SKILL.md
@@ -0,0 +1,44 @@
+---
+name: security-check
+description: Security scanning before commit/push. Checks for sensitive information like API keys and tokens.
+---
+# Security Check Rules
+
+Before committing or pushing code, must check for sensitive information, especially API Keys and access tokens.
+
+## What to Check
+
+### API Keys and Access Tokens
+- API Keys starting with `sk-` (OpenAI, Anthropic, Alibaba Cloud, etc.)
+- Google API Keys starting with `AIzaSy`
+- Public keys starting with `pk_`
+- Other common API token formats
+
+## Before Commit
+
+### Run Check First
+Run `bash .claude/skills/security-check/scripts/security_check.sh commit` to check the staging area for sensitive information. If it does NOT output `staging area is clear`, sensitive information was found.
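+
+As a sketch, one way to gate a commit on that output (the commit command itself is illustrative):
+
+```bash
+bash .claude/skills/security-check/scripts/security_check.sh commit | grep -q "staging area is clear" \
+  && git commit \
+  || echo "refusing to commit: sensitive information found in the staging area"
+```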
+
+### If Sensitive Information Found
+1. **Immediately delete or replace**: Replace real API Keys with placeholders
+2. **Use environment variables**: Move sensitive info to environment variables
+3. **Add to .gitignore**: Ensure files with sensitive info are not committed
+4. **Must refuse the commit/push action**
+
+## Before Push
+
+### Run Check First
+Run `bash .claude/skills/security-check/scripts/security_check.sh push` to check each commit for sensitive information. If it does NOT output `all commits are clear`, sensitive information was found. The commit hashes are written to `task/sensitive_commits.txt`.
+
+### If Sensitive Information Found
+1. **Immediately delete or replace**: Replace real API Keys with placeholders
+2. **Use environment variables**: Move sensitive info to environment variables
+3. **Add to .gitignore**: Ensure files with sensitive info are not committed
+4. **Must use the script below to clean history**
+
+   ```bash
+   # Reset based on results in task/sensitive_commits.txt to avoid leaking commits
+   bash .claude/skills/security-check/scripts/security_reset.sh
+   ```
+
+5. **Must refuse the commit/push action**
diff --git a/.claude/skills/security-check/scripts/security_check.sh b/.claude/skills/security-check/scripts/security_check.sh
new file mode 100755
index 0000000000..03848884fc
--- /dev/null
+++ b/.claude/skills/security-check/scripts/security_check.sh
@@ -0,0 +1,40 @@
+#!/bin/bash
+set -euo pipefail
+
+SENSITIVE_PATTERNS="(sk-[a-zA-Z0-9]{20,}|AIzaSy[a-zA-Z0-9_-]{30,}|pk_[a-zA-Z0-9]{10,}|ghp_[a-zA-Z0-9]{36,}|gho_[a-zA-Z0-9]{36,}|ghu_[a-zA-Z0-9]{36,}|ghs_[a-zA-Z0-9]{36,}|ghr_[a-zA-Z0-9]{36,})"
+MODE="${1:-}"
+
+if [ "$MODE" != "commit" ] && [ "$MODE" != "push" ]; then
+ echo "Usage: $0 [commit|push]"
+ exit 2
+fi
+
+if [ "$MODE" == "commit" ]; then
+    # Check the staging area for API keys
+ echo "checking staging area"
+ if git diff --cached --no-prefix | grep '^+' | grep -E "$SENSITIVE_PATTERNS"; then
+ echo "⚠️ staging area contains SENSITIVE information"
+ else
+ echo "✅ staging area is clear"
+ fi
+elif [ "$MODE" == "push" ]; then
+    # Check every commit that is about to be pushed
+ is_clear=true
+ upstream=$(git rev-parse --abbrev-ref --symbolic-full-name @{u} 2>/dev/null) || upstream="origin/main"
+ mkdir -p task
+    > task/sensitive_commits.txt # truncate the file
+
+ while read -r commit; do
+ commit_hash=$(echo "$commit" | cut -d' ' -f1)
+ echo "checking commit: $commit"
+ if git show "$commit_hash" --no-commit-id --unified=0 | grep '^+' | grep -E "$SENSITIVE_PATTERNS"; then
+ echo "⚠️ commit $commit contains SENSITIVE information"
+ echo "$commit_hash" >> task/sensitive_commits.txt
+ is_clear=false
+ fi
+ echo "---"
+ done < <(git log "${upstream}"..HEAD --oneline)
+ if [ "$is_clear" = true ]; then
+ echo "✅ all commits are clear"
+ fi
+fi
\ No newline at end of file
diff --git a/.claude/skills/security-check/scripts/security_reset.sh b/.claude/skills/security-check/scripts/security_reset.sh
new file mode 100755
index 0000000000..7b93706899
--- /dev/null
+++ b/.claude/skills/security-check/scripts/security_reset.sh
@@ -0,0 +1,108 @@
+#!/bin/bash
+# Smart squash script - automatically detects and cleans up commits containing sensitive information
+echo "🔍 Cleaning up commits that contain sensitive information..."
+
+# 1. Check that task/sensitive_commits.txt exists and is not empty
+if [ ! -f "task/sensitive_commits.txt" ] || [ ! -s "task/sensitive_commits.txt" ]; then
+ echo "❌ 未找到敏感commits列表,请先运行push前检查"
+ exit 1
+fi
+
+# Read the list of sensitive commits
+readarray -t sensitive_commits < task/sensitive_commits.txt
+
+# 2. If sensitive information was found, perform the smart squash
+if [ ${#sensitive_commits[@]} -gt 0 ]; then
+ echo "🚨 发现 ${#sensitive_commits[@]} 个包含敏感信息的commits,开始清理..."
+
+    # Check whether the working tree and index are clean
+    if [ -n "$(git status --porcelain)" ]; then
+        echo "⚠️ Uncommitted changes in the working tree or index, stashing them first..."
+        git stash push -u -m "security-cleanup-backup-$(date +%Y%m%d-%H%M%S)"
+        stashed=true
+    else
+        stashed=false
+    fi
+
+    # Determine the target commit to reset to
+    # The earliest sensitive commit is the last array element; take its parent commit
+ earliest_sensitive="${sensitive_commits[${#sensitive_commits[@]}-1]}"
+    parent_commit=$(git rev-parse --quiet --verify "${earliest_sensitive}^" 2>/dev/null) || parent_commit=""
+
+    # Collect all commits to squash (from the parent of the earliest sensitive commit up to HEAD)
+ if [ -n "$parent_commit" ]; then
+ commits_to_squash=($(git rev-list --reverse "${parent_commit}..HEAD"))
+ else
+        # No parent means the earliest sensitive commit is the root commit
+        echo "⚠️ The earliest sensitive commit is the repository's first commit"
+ commits_to_squash=($(git rev-list --reverse HEAD))
+ fi
+
+ if [ ${#commits_to_squash[@]} -eq 0 ]; then
+ echo "❌ 无法确定要squash的commit范围"
+ if [ "$stashed" = true ]; then
+ git stash pop
+ fi
+ exit 1
+ fi
+
+    # Collect the messages of all commits that will be re-committed
+    echo "📝 Extracting all commit messages..."
+ all_commit_details=""
+ main_subject=""
+
+ for commit_hash in "${commits_to_squash[@]}"; do
+        # Read the commit's subject and body
+ subject=$(git log --format=%s -n 1 "$commit_hash")
+ body=$(git log --format=%b -n 1 "$commit_hash")
+
+        # Use the first commit's subject as the subject line
+ if [ -z "$main_subject" ]; then
+ main_subject="$subject"
+ fi
+
+ subject_marker="$subject"
+
+        # Append commit details in GitHub squash format
+ if [ -n "$body" ]; then
+ all_commit_details="${all_commit_details}* ${subject_marker}\n\n${body}\n\n"
+ else
+ all_commit_details="${all_commit_details}* ${subject_marker}\n\n"
+ fi
+ done
+
+    # Build a GitHub-style squash commit message
+ new_message="${main_subject}\n\n${all_commit_details}"
+
+    # Perform the squash
+    echo "🔄 Running the squash..."
+ if [ -n "$parent_commit" ]; then
+ git reset --soft "$parent_commit"
+ else
+ echo "❌ 检测到最早敏感 commit 为 root commit,自动清理会涉及高风险历史重写,已中止。"
+ echo "请手动执行更安全流程(例如 orphan 分支重建)后再提交。"
+ if [ "$stashed" = true ]; then
+ echo "⚠️ 已为你恢复之前的工作区更改。"
+ git stash pop
+ fi
+ exit 1
+ fi
+
+    # Show the files that still need manual cleanup
+    echo "📋 Files that need manual cleanup:"
+ git status --porcelain | grep '^[AM]' | cut -c4-
+
+ echo ""
+ echo "✅ Squash完成!请执行以下步骤:"
+ echo "1. 手动清理上述文件中的敏感信息"
+ echo "2. 运行: git add ."
+ echo "3. 运行: git commit"
+ if [ "$stashed" = true ]; then
+ echo "4. 如需恢复之前的工作区更改: git stash pop"
+ fi
+ echo ""
+ echo "📝 新的commit message预览:"
+ echo "────────────────────────────────────────"
+ echo -e "$new_message"
+ echo "────────────────────────────────────────"
+else
+ echo "✅ 未发现包含敏感信息的commits"
+fi
diff --git a/.claude/skills/selfmonitor/SKILL.md b/.claude/skills/selfmonitor/SKILL.md
new file mode 100644
index 0000000000..b146df7bdc
--- /dev/null
+++ b/.claude/skills/selfmonitor/SKILL.md
@@ -0,0 +1,138 @@
+---
+name: selfmonitor
+description: Self-monitoring metrics, alarm code standards for LoongCollector. Read when changes involve metrics, alarms, or observability.
+---
+# Self-Monitoring Code Standards
+
+You are a self-monitoring code quality expert, responsible for ensuring that LoongCollector code uses self-monitoring features (metrics and alarms) correctly, with sound code style and implementation logic.
+
+## Metric Naming Conventions
+
+### Format
+
+**Variable name**: `{MODULE}_{METRIC_CONTENT_DESCRIPTION}_{UNIT}` (ALL CAPS)
+**Variable content**: `{metric_content_description}_{unit}` (all lowercase)
+
+Example:
+```cpp
+const string METRIC_RUNNER_FLUSHER_IN_RAW_SIZE_BYTES = "in_raw_size_bytes";
+```
+
+### Module Prefix Categories
+
+- **`agent_`**: Process-level metrics, describing entire Agent state
+- **`pipeline_`**: Pipeline-level metrics, describing data pipeline state
+- **`plugin_`**: Plugin-level metrics, describing specific plugin state
+- **`component_`**: Component-level metrics, describing internal component state
+- **`runner_`**: Runner-level metrics, describing runner state
+
+### Unit Categories
+
+#### Counter metrics
+- **`_total`**: Cumulative count (default), e.g. `input_records_total`, `send_success_total`
+
+#### Size metrics
+- **`_bytes`**: Bytes, e.g. `input_size_bytes`, `memory_used_bytes`
+- **`_mb`**: Megabytes (memory), e.g. `agent_memory_used_mb`
+
+#### Time metrics
+- **`_ms`**: Milliseconds (processing time, latency), e.g. `process_time_ms`
+- **`_s`**: Seconds (long intervals), e.g. `uptime_s`
+
+#### Ratio metrics
+- **`_percent`**: Percentage, e.g. `cpu_usage_percent`
+- **`_ps`**: Per second (rate), e.g. `send_bytes_ps`
+
+#### State metrics
+- **`_flag`**: Flag (0 or 1), e.g. `enabled_flag`
+- **`_state`**: State value, e.g. `register_state`
+
+### Label Naming Conventions
+
+**Label Key format**: `METRIC_LABEL_KEY_{description}`
+
+Common keys: `METRIC_LABEL_KEY_PROJECT`, `METRIC_LABEL_KEY_LOGSTORE`, `METRIC_LABEL_KEY_PIPELINE_NAME`, `METRIC_LABEL_KEY_PLUGIN_TYPE`, `METRIC_LABEL_KEY_PLUGIN_ID`, `METRIC_LABEL_KEY_FILE_NAME`, `METRIC_LABEL_KEY_FILE_DEV`, `METRIC_LABEL_KEY_FILE_INODE`, `METRIC_LABEL_KEY_REGION`, `METRIC_LABEL_KEY_RUNNER_NAME`
+
+## Alarm Level Conventions
+
+Alarm levels follow the design in PR #2319:
+
+| Level | Severity | Description | Typical Scenario |
+|-------|----------|-------------|------------------|
+| 1 | warning | Single point error, doesn't affect overall flow | Data parse failure; single collection/send failure |
+| 2 | error | Affects main flow, risk if not optimized | Queue busy; monitor exceeded; unsuccessful init |
+| 3 | critical | Severe impact: config/module unusable; affects agent stability; causes customer loss | Config load failure; unsuccessful module init; data drop; crash |
+
+### C++ Alarm Usage
+
+**Correct**:
+```cpp
+AlarmManager::GetInstance()->SendAlarmWarning(LOGTAIL_CONFIG_ALARM, "config parse failed");
+AlarmManager::GetInstance()->SendAlarmError(PROCESS_QUEUE_BUSY_ALARM, "process queue busy");
+AlarmManager::GetInstance()->SendAlarmCritical(CATEGORY_CONFIG_ALARM, "config load failed");
+```
+
+**Wrong**: Don't use old `SendAlarm` interface.
+
+### Go Alarm Usage
+
+**Correct**:
+```go
+logger.Warning(ctx, selfmonitor.CategoryConfigAlarm, "config parse failed")
+logger.Error(ctx, selfmonitor.ProcessQueueBusyAlarm, "process queue busy")
+logger.Critical(ctx, selfmonitor.CategoryConfigAlarm, "config load failed")
+```
+
+## Adding New Metrics
+
+### C++ Steps
+
+1. **Define metric constants**: Add to `core/monitor/metric_constants/MetricConstants.h`
+2. **Create MetricsRecordRef** with labels in Init()
+3. **Create metric objects** (CounterPtr, IntGaugePtr) BEFORE commit
+4. **Update values** using macros: `ADD_COUNTER()`, `SET_GAUGE()`, `ADD_GAUGE()`
+
+**Critical**: MetricsRecordRef must create all metric objects BEFORE commit. After commit, no new metrics can be created. Use `IsCommitted()` to check state. If a Gauge default is non-zero, set it once during Init.
+
+### Go Steps
+
+1. **Define constants**: Add to `pkg/selfmonitor/metrics_constants_*.go`
+2. **Register metrics** in `InitMetricRecord()`:
+ ```go
+ p.MetricRecord = p.Config.Context.RegisterMetricRecord(labels)
+ p.metricCounter = selfmonitor.NewCounterMetricAndRegister(p.MetricRecord, selfmonitor.MetricPluginInEventsTotal)
+ ```
+3. **Update values**: Check nil before updating (see the sketch below).
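+
+A minimal sketch of the nil-check pattern (the `CounterMetric` interface and its `Add` method stand in for the real `selfmonitor` types, which may differ):
+
+```go
+// CounterMetric stands in for the real selfmonitor counter type (assumed API).
+type CounterMetric interface {
+    Add(v int64)
+}
+
+type myPlugin struct {
+    metricCounter CounterMetric // nil if metric registration never ran
+}
+
+func (p *myPlugin) onEvent() {
+    // Guard the update so a missing metric never panics the data path.
+    if p.metricCounter != nil {
+        p.metricCounter.Add(1)
+    }
+}
+```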
+
+## Adding New Alarm Types
+
+### C++ Steps
+
+1. Add to `core/monitor/AlarmManager.h` enum `AlarmType`
+2. Add to `mMessageType` vector in `AlarmManager.cpp` constructor
+3. Use leveled interfaces: `SendAlarmWarning`, `SendAlarmError`, `SendAlarmCritical`
+
+### Go Steps
+
+1. Add to `pkg/selfmonitor/alarm_constants.go`
+2. Use leveled interfaces: `logger.Warning`, `logger.Error`, `logger.Critical` (see the sketch below)
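+
+A minimal sketch of both steps (the constant name and message are illustrative, not existing identifiers):
+
+```go
+// In pkg/selfmonitor/alarm_constants.go (illustrative addition):
+const MyPluginParseFailAlarm = "MY_PLUGIN_PARSE_FAIL_ALARM"
+
+// At the call site, pick the leveled interface that matches the severity (level 1 here):
+logger.Warning(ctx, selfmonitor.MyPluginParseFailAlarm, "failed to parse record")
+```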
+
+## Best Practices
+
+1. Create metric objects once during initialization, not per-call
+2. Use safe update macros that check for null
+3. Choose alarm level matching severity
+4. Provide meaningful alarm messages with context
+5. Avoid alarm storms - limit the frequency of repeated alarms (see the throttling sketch below)
+6. Metrics should not impact main flow performance
+
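+One way to limit the frequency of a repeated alarm is a small per-alarm throttle; a hypothetical sketch (not an existing LoongCollector API):
+
+```go
+import (
+    "sync"
+    "time"
+)
+
+// alarmThrottle suppresses repeats of the same alarm key within a minimum interval.
+type alarmThrottle struct {
+    mu       sync.Mutex
+    lastSent map[string]time.Time
+    minGap   time.Duration
+}
+
+func newAlarmThrottle(minGap time.Duration) *alarmThrottle {
+    return &alarmThrottle{lastSent: make(map[string]time.Time), minGap: minGap}
+}
+
+// allow reports whether the alarm identified by key may be sent now.
+func (t *alarmThrottle) allow(key string) bool {
+    t.mu.Lock()
+    defer t.mu.Unlock()
+    now := time.Now()
+    if last, ok := t.lastSent[key]; ok && now.Sub(last) < t.minGap {
+        return false
+    }
+    t.lastSent[key] = now
+    return true
+}
+```
+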
+## Checklist
+
+Before submitting self-monitoring code:
+- [ ] Metric names follow naming convention with correct module prefix and unit
+- [ ] Labels follow naming convention
+- [ ] Correct alarm level interface used, matching severity
+- [ ] No deprecated interfaces used
+- [ ] Metrics created once, updated safely
+- [ ] Alarm storms avoided
+- [ ] Error handling complete
diff --git a/.claude/skills/testing-standards/SKILL.md b/.claude/skills/testing-standards/SKILL.md
new file mode 100644
index 0000000000..f1c3618e89
--- /dev/null
+++ b/.claude/skills/testing-standards/SKILL.md
@@ -0,0 +1,99 @@
+---
+name: testing-standards
+description: "Testing standards for LoongCollector: unit tests, e2e tests, benchmarks. Reference when writing or reviewing tests."
+---
+# LoongCollector Testing Standards
+
+## Test Categories
+
+### 1. Unit Tests (C++)
+- Use Google Test (GTest) / Google Mock
+- Place in `core/unittest/`
+- Cover success and failure paths
+- Core logic must have 100% coverage
+- Test boundary conditions explicitly
+- Test naming: accurately describe behavior being tested
+- Each `core/unittest/*/` directory produces one executable
+- Build and run tests from inside `build/` to ensure relative paths and temp files work correctly
+- See `.claude/skills/compile/SKILL.md` for build & run instructions
+
+### 2. Unit Tests (Go)
+- Use standard `testing` package
+- Table-driven tests for function coverage (see the sketch below)
+- Integration tests via E2E framework
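+
+A minimal table-driven test sketch (`clampBatchSize` is a hypothetical function under test; put the test in a `_test.go` file with `import "testing"`):
+
+```go
+func clampBatchSize(n int) int { // hypothetical function under test
+    if n < 1 {
+        return 1
+    }
+    if n > 4096 {
+        return 4096
+    }
+    return n
+}
+
+func TestClampBatchSize(t *testing.T) {
+    cases := []struct {
+        name string
+        in   int
+        want int
+    }{
+        {"below minimum", 0, 1},
+        {"within range", 128, 128},
+        {"above maximum", 10000, 4096},
+    }
+    for _, tc := range cases {
+        t.Run(tc.name, func(t *testing.T) {
+            if got := clampBatchSize(tc.in); got != tc.want {
+                t.Fatalf("clampBatchSize(%d) = %d, want %d", tc.in, got, tc.want)
+            }
+        })
+    }
+}
+```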
+
+### 3. E2E Tests
+- BDD Godog framework
+- Configuration-driven via `.feature` files
+- See `.claude/skills/e2e/SKILL.md` for complete guide (design → write → run → debug)
+
+### 4. Benchmarks
+- Required for performance-sensitive code paths
+- Compare against baseline versions
+- Measure throughput, latency, CPU/Memory usage
+
+## E2E Test Quick Reference
+
+### Feature File Structure
+```
+@input
+Feature: input file
+ Test input file
+
+ @e2e @host
+ Scenario: TestInputFileWithRegexSingle
+ Given {host} environment
+ Given subcribe data from {sls} with config
+ """
+ enable: true
+ inputs:
+ - Type: input_file
+ """
+ When generate {100} regex logs to file {/tmp/loongcollector/regex_single.log}, with interval {100}ms
+ Then there is {100} logs
+```
+
+### Behavior Types
+| Type | Purpose |
+|------|---------|
+| `Given` | Setup/prepare test conditions |
+| `When` | Trigger test actions (e.g., log generation) |
+| `Then` | Verify test results |
+
+### Environment Tags
+- `@host` - Host environment
+- `@k8s` - Kubernetes environment
+- `@docker-compose` - Docker Compose environment
+- `@e2e` - E2E test marker
+- `@regression` - Regression test marker
+
+### Adding New Test Behaviors
+1. Write the Go function in the appropriate directory:
+ - `cleanup/` - Post-test cleanup (auto-executed)
+ - `control/` - Control operations (init, config)
+ - `setup/` - Environment setup
+ - `trigger/` - Data generation
+ - `verify/` - Result verification
+2. Function signature: `func Name(ctx context.Context, params...) (context.Context, error)`
+3. Register in `test/e2e_enterprise/main_test.go` via `scenarioInitializer` (see the sketch below)
+4. Use in feature files with `{param}` syntax
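+
+A minimal sketch of steps 1-3 (the step text, function name, and file paths are illustrative):
+
+```go
+// verify/log_count.go
+func LogCountAtLeast(ctx context.Context, expect int) (context.Context, error) {
+    // query the collected logs and compare the count against expect
+    return ctx, nil
+}
+
+// test/e2e_enterprise/main_test.go, inside scenarioInitializer:
+ctx.Then(`^there is at least \{(\d+)\} verified logs$`, verify.LogCountAtLeast)
+```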
+
+### Strict Rules
+- Do NOT change behavior of the method being tested
+- Do NOT modify existing test behaviors in engine
+- Always run the `When begin trigger` step BEFORE generating logs
+- Only use registered behaviors from `test/engine/steps.go`
+- Verify behavior type matches (Given/When/Then)
+
+### Test Naming
+- Format: `Test${FunctionName}${CaseBriefDescription}`
+- Examples: `TestInputFileWithBlackListDir`, `TestInputFileWithRegexSingle`
+- Must include `@e2e` and environment tags
+
+## Benchmark Testing
+
+For performance-sensitive code:
+1. Provide baseline comparison
+2. Measure: throughput, latency, CPU profile, memory profile
+3. Run under realistic load conditions
+4. Document methodology and results
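+
+A minimal Go benchmark sketch (`parseLine` is a hypothetical stand-in for the code path being measured; place it in a `_test.go` file with `strings` and `testing` imported):
+
+```go
+func parseLine(line string) []string { // hypothetical function under benchmark
+    return strings.Split(line, "\t")
+}
+
+func BenchmarkParseLine(b *testing.B) {
+    line := "2024-01-01T00:00:00Z\tINFO\tworker\tprocessed 128 records"
+    b.ReportAllocs()
+    b.ResetTimer()
+    for i := 0; i < b.N; i++ {
+        _ = parseLine(line)
+    }
+}
+```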
diff --git a/.cursor/rules/project-knowledge/config-pitfalls.mdc b/.cursor/rules/project-knowledge/config-pitfalls.mdc
new file mode 100644
index 0000000000..2aa3c16b0c
--- /dev/null
+++ b/.cursor/rules/project-knowledge/config-pitfalls.mdc
@@ -0,0 +1,41 @@
+---
+description: LoongCollector 采集配置常见陷阱。编写或审查 pipeline config YAML 时参考。
+globs:
+ - "**/*.feature"
+ - "**/case.feature"
+ - "core/config/**"
+ - "test/e2e/**"
+alwaysApply: false
+---
+# LoongCollector 采集配置陷阱
+
+## ExcutionTimeout 使配置变为一次性(onetime)
+
+`global.ExcutionTimeout` 存在于配置中时,**整个配置**被标记为 onetime 类型。
+只有注册了 `RegisterOnetimeInputCreator` 的插件才能在 onetime 配置中使用。
+
+大部分输入插件(`input_forward`, `input_file`, `input_container_stdio`, `input_prometheus` 等)只注册了 `RegisterContinuousInputCreator`,在 onetime 配置中会报错:
+
+```
+failed to parse config:unsupported input plugin module:input_forward
+```
+
+### 判断逻辑
+
+```
+global.ExcutionTimeout 存在
+ → PipelineConfig::GetExpireTimeIfOneTime → mOnetimeExpireTime 被设置
+ → CollectionConfig::IsOnetime() == true
+ → IsValidNativeInputPlugin(name, true) 在 ONETIME 注册表查找
+ → 找不到 → "unsupported input plugin"
+```
+
+### 支持 onetime 的输入插件
+
+查看 `PluginRegistry::LoadStaticPlugins()` 中调用 `RegisterOnetimeInputCreator` 的插件,如 `InputStaticFile`。
+
+### 规则
+
+- **持续运行的输入插件配置中不要使用 `ExcutionTimeout`**
+- E2E 测试不需要 `ExcutionTimeout` 来控制超时,Go test 的 `-timeout` 参数已经提供了保护
+- 如果确实需要一次性采集,使用 `onetime_pipeline_config` 目录 + 支持 onetime 的输入插件
diff --git a/.cursor/skills/compile/SKILL.md b/.cursor/skills/compile/SKILL.md
index 1e7c938e62..a9b9c7af02 100644
--- a/.cursor/skills/compile/SKILL.md
+++ b/.cursor/skills/compile/SKILL.md
@@ -8,44 +8,89 @@ description: Building
### C++ 部分编译方法
-1. 判断是否进行增量编译。如果已有 `build` 目录,并且其中有内容,并且你的修改没有涉及到 CMake 相关文件,那么跳转到第5步进行增量编译。
+**重要:所有 CMake 和 make 命令必须在 `build/` 目录内执行。**
-2. 创建编译目录
-
-``` bash
-mkdir -p build
+**前置条件** — 首次编译前需初始化 Git 子模块:
+```bash
+git submodule update --init --recursive
```
+两个子模块位于 `core/_thirdparty/`:
+- `DCGM` — NVIDIA DCGM 头文件
+- `coolbpf` — eBPF 框架
+
+如果子模块目录为空,编译会报 `No such file or directory` 错误。
+
+#### 编译步骤
-3. 进入编译目录
+1. 判断是否进行增量编译。如果已有 `build` 目录,并且其中有内容,并且你的修改没有涉及到 CMake 相关文件,那么跳转到第 4 步进行增量编译。
-``` bash
-cd build
+2. 创建并进入编译目录
+
+```bash
+mkdir -p build && cd build
```
-4. 构建 CMake 命令
+3. 构建 CMake 命令
-``` bash
+```bash
cmake -DCMAKE_BUILD_TYPE=Debug -DLOGTAIL_VERSION=0.0.1 \
-DCMAKE_EXPORT_COMPILE_COMMANDS=1 \
-DCMAKE_CXX_FLAGS="-I/opt/rh/devtoolset-9/root/usr/lib/gcc/x86_64-redhat-linux/9/include -I/opt/logtail -I/opt/logtail_spl" \
- -DBUILD_LOGTAIL=ON -DBUILD_LOGTAIL_UT=ON -DWITHOUTGDB=ON -DENABLE_STATIC_LINK_CRT=ON -DWITHSPL=ON ../core
+ -DBUILD_LOGTAIL=ON -DBUILD_LOGTAIL_UT=ON -DWITHOUTGDB=ON -DENABLE_STATIC_LINK_CRT=ON -DWITHSPL=OFF ../core
```
-注意其中的几个开关:
- - BUILD_LOGTAIL:表示编译 LoongCollector 二进制。必选
- - BUILD_LOGTAIL_UT:表示编译 LoongCollector 单测。仅当你修改了 LoongCollector 单测时才打开。
- - WITHSPL:表示编译 LoongCollector SPL 相关内容。仅当你修改了 LoongCollector SPL 相关文件时才打开。
+关键 CMake 开关:
+
+| 开关 | 用途 |
+|------|------|
+| `BUILD_LOGTAIL` | 编译 LoongCollector 二进制。必选。 |
+| `BUILD_LOGTAIL_UT` | 编译单元测试。修改了测试代码时打开。 |
+| `WITHSPL` | SPL 支持。除非修改了 SPL 相关文件,否则设为 `OFF`。 |
-5. 编译
+4. 编译
-``` bash
+```bash
make -sj$(nproc)
```
-### Go 部分编译方法
+#### C++ 单元测试
+
+每个 `core/unittest/*/` 下的测试目录会生成独立的可执行文件。
+
+**编译指定测试**(在 `build/` 目录内):
+```bash
+make yaml_util_unittest app_config_unittest safe_queue_unittest -j$(nproc)
+```
+
+**运行测试**(在 `build/` 目录内):
+```bash
+./unittest/common/yaml_util_unittest
+./unittest/app_config/app_config_unittest
+```
-执行
+测试必须在 `build/` 目录内运行,因为部分测试依赖相对路径加载配置文件。
-``` bash
+### Go 部分编译方法
+
+```bash
make plugin_local
```
+
+### Docker 构建
+
+```bash
+make image
+```
+
+### 交叉编译
+
+ARM64 架构:
+```bash
+make image ARCH=arm64
+```
+
+### 常见问题
+
+- 如果 CMake 报缺少依赖,通过 `apt` 或 `yum` 安装
+- 如果链接失败,尝试 `make clean` 后重新构建
+- 需要 SPL 相关功能时,将 `WITHSPL=OFF` 改为 `WITHSPL=ON`
diff --git a/.cursor/skills/e2e/SKILL.md b/.cursor/skills/e2e/SKILL.md
new file mode 100644
index 0000000000..92a8c4df6b
--- /dev/null
+++ b/.cursor/skills/e2e/SKILL.md
@@ -0,0 +1,209 @@
+---
+name: e2e
+description: LoongCollector E2E 测试全流程指南:设计、编写、运行和调试。当需要编写新 E2E 测试、运行现有测试、或排查 E2E 测试失败时使用此 skill。
+---
+# LoongCollector E2E 测试指南
+
+> 详细步骤模板见 [reference.md](reference.md) | 可复用脚本见 [scripts/](scripts/)
+
+## 目录
+
+1. [概览](#1-概览)
+2. [设计测试用例](#2-设计测试用例)
+3. [编写测试用例](#3-编写测试用例)
+4. [本地运行(docker-compose)](#4-本地运行)
+5. [调试](#5-调试)
+6. [已知陷阱](#6-已知陷阱)
+
+---
+
+## 1 概览
+
+基于 **BDD Godog** 框架,通过 `.feature` 文件描述场景,引擎正则匹配步骤函数并传参。
+
+```
+test/e2e/
+  test_cases/<case_name>/
+ case.feature # 场景描述
+ docker-compose.yaml # 可选,外部依赖服务
+ engine/
+ steps.go # 所有可用步骤(权威来源)
+ setup/ control/ trigger/ verify/ cleanup/
+```
+
+**环境 tag**:`@host`、`@k8s`、`@docker-compose`(三选一,加 `@e2e`)
+
+---
+
+## 2 设计测试用例
+
+编写 feature 文件前,先确定测试矩阵。按以下维度逐项评估是否需要覆盖:
+
+### 2.1 场景维度清单
+
+| 维度 | 典型场景 | 何时需要 |
+|------|----------|----------|
+| **基础功能** | 单配置、单数据类型端到端 | 必须 |
+| **多数据类型** | logs / metrics / traces 分别验证 | 插件支持多类型时 |
+| **多配置共存** | 同时加载多个 pipeline 配置 | 涉及端口/资源竞争时 |
+| **配置热加载** | 运行中增/删/改配置 | 持续运行的 input 插件 |
+| **配置类型变更** | 从 A 类型切换到 B 类型 | 插件支持多协议/格式时 |
+| **反压与恢复** | 下游不可达 → 恢复后数据不丢 | flusher 插件 |
+| **外部依赖失效** | 依赖服务重启/不可达 | 有外部依赖时 |
+| **大数据量** | 高吞吐压力下不 OOM/不丢数据 | 性能敏感路径 |
+
+### 2.2 设计产出
+
+确定要覆盖的场景后,明确每个 Scenario 的:
+- **输入**:什么数据、什么格式、多少条
+- **流经路径**:input → processor → flusher 的具体插件
+- **预期输出**:在哪里验证、验证什么
+- **外部依赖**:需要什么辅助服务(OTel Collector、Kafka 等)
+
+---
+
+## 3 编写测试用例
+
+### 3.1 目录结构
+
+```
+test/e2e/test_cases/my_feature/
+├── case.feature
+├── docker-compose.yaml # 外部依赖
+└── otel-collector-config.yaml # 如果用 OTel Collector
+```
+
+### 3.2 Feature 文件模板
+
+```gherkin
+@flusher
+Feature: my feature name
+ Brief description
+
+ @e2e @docker-compose
+ Scenario: TestMyFeatureLogs
+ Given {docker-compose} environment
+ Given {my-config} local config as below
+ """
+ enable: true
+ inputs:
+ - Type: input_forward
+ Protocol: OTLP
+ Endpoint: "0.0.0.0:4320"
+ flushers:
+ - Type: flusher_otlp_native
+ Endpoint: "otel-collector:4317"
+ """
+ When start docker-compose {my_feature}
+ Then wait {10} seconds
+ When generate {1} OTLP {logs} via otelgen to endpoint {loongcollectorC:4320}, protocol {grpc}
+ Then wait {5} seconds
+ Then otlp collector received at least {1} logs from file {/tmp/otel-export/logs.json}
+```
+
+### 3.3 强制规则
+
+- 配置中必须含 `enable: true`
+- **只使用** `test/engine/steps.go` 中已注册的步骤
+- `wait {N} seconds` 是 **Then** 类型,不是 When
+- 命名格式:`Test${功能名}${场景描述}`
+- **不要**在持续运行插件的配置中使用 `global.ExcutionTimeout`(见 §6.1)
+
+### 3.4 扩展步骤
+
+如需新步骤,参考 [reference.md §扩展步骤](reference.md) 中的开发和注册流程。
+
+---
+
+## 4 本地运行
+
+### 4.1 前置条件
+
+```bash
+docker --version && docker compose version
+```
+
+如修改了 C++ 代码,需重新编译并更新镜像。两种方式:
+
+**方式一:完整构建**(慢,但保证一致)
+```bash
+make e2e_image # 从源码构建完整 Docker 镜像 aliyun/loongcollector:0.0.1
+```
+
+**方式二:增量更新**(快,适合迭代调试)
+```bash
+cd build && make -sj$(nproc) && cd ..
+# 替换镜像中的二进制
+docker create --name tmp-lc aliyun/loongcollector:0.0.1
+docker cp build/loongcollector tmp-lc:/usr/local/loongcollector/loongcollector
+docker commit tmp-lc aliyun/loongcollector:0.0.1
+docker rm tmp-lc
+```
+
+### 4.2 运行
+
+```bash
+cd test/e2e
+
+# 运行整个测试用例(所有 Scenario)
+TEST_CASE=flusher_otlp_native go test -v -run "TestE2EOnDockerCompose$" \
+ -timeout 600s -count=1 ./...
+
+# 只运行指定 Scenario
+TEST_CASE=flusher_otlp_native go test -v \
+ -run "TestE2EOnDockerCompose/TestFlusherOTLPNativeLogs$" \
+ -timeout 600s -count=1 ./...
+```
+
+### 4.3 清理(测试失败后必做)
+
+可以直接运行脚本 `bash .cursor/skills/e2e/scripts/e2e-cleanup.sh`,或手动执行:
+
+```bash
+docker rm -f $(docker ps -aq) 2>/dev/null
+docker network prune -f
+rm -rf test/e2e/config test/e2e/onetime_pipeline_config
+sudo rm -rf test/e2e/report
+rm -f test/e2e/test_cases/<case_name>/testcase-compose.yaml
+```
+
+---
+
+## 5 调试
+
+```bash
+# 1. 查看容器日志
+docker ps | grep loongcollectorC
+docker exec <container_id> cat /usr/local/loongcollector/log/loongcollector.LOG
+
+# 2. 检查配置是否加载
+docker exec <container_id> ls /usr/local/loongcollector/conf/continuous_pipeline_config/local/
+
+# 3. 检查端口是否监听
+docker exec <container_id> ss -tlnp | grep <port>
+
+# 4. 手动复现 compose 环境
+cd test/e2e/test_cases/<case_name>
+docker compose -f testcase-compose.yaml up -d
+docker compose -f testcase-compose.yaml logs -f loongcollectorC
+```
+
+---
+
+## 6 已知陷阱
+
+### 6.1 ExcutionTimeout 使配置变为一次性
+
+**绝对不要**在 `input_forward`、`input_file` 等持续插件的配置中使用 `global.ExcutionTimeout`。
+
+它会使 `IsOnetime()` 返回 true,导致 `IsValidNativeInputPlugin(name, true)` 在 onetime 注册表中查找,而大部分 input 只注册了 continuous,结果报 `unsupported input plugin`。
+
+详见 `.cursor/rules/project-knowledge/config-pitfalls.mdc`。
+
+### 6.2 FlusherFile 必须是文件
+
+e2e 模板将 `report/default_flusher.json` bind-mount 到容器。若宿主机路径不存在,Docker 会创建为**目录**。已在 `BootController.Start()` 中自动处理。
+
+### 6.3 测试间残留
+
+多 Scenario 共享进程,`Clean()` 会删除 config/report。异常退出后手动清理(§4.3)。
diff --git a/.cursor/skills/e2e/reference.md b/.cursor/skills/e2e/reference.md
new file mode 100644
index 0000000000..d1c662ae35
--- /dev/null
+++ b/.cursor/skills/e2e/reference.md
@@ -0,0 +1,134 @@
+# E2E 测试详细参考
+
+## 可用步骤速查
+
+> 权威来源:`test/engine/steps.go`
+
+### Given(环境准备)
+
+| 步骤模板 | 说明 |
+|----------|------|
+| `{docker-compose} environment` | 初始化 docker-compose 环境 |
+| `{host} environment` | 初始化主机环境 |
+| `{daemonset} environment` | 初始化 K8s 环境 |
+| `{name} local config as below` | 写入持续采集配置 |
+| `{name} onetime pipeline local config as below` | 写入一次性采集配置 |
+| `subcribe data from {sls} with config` | 订阅 SLS 数据源 |
+| `loongcollector depends on containers {name}` | 设置容器依赖 |
+| `loongcollector container mount {src} to {dst}` | 挂载卷 |
+| `loongcollector expose port {host} to {container}` | 暴露端口 |
+| `docker-compose boot type {type}` | 设置 boot 类型 |
+| `mkdir {path}` | 创建目录 |
+
+### When(触发动作)
+
+| 步骤模板 | 说明 |
+|----------|------|
+| `start docker-compose {case_name}` | 启动 docker-compose 环境 |
+| `begin trigger` | 标记触发开始时间(生成日志前必须调用) |
+| `generate {N} regex logs to file {path}, with interval {M}ms` | 生成正则日志 |
+| `generate {N} json logs to file {path}, with interval {M}ms` | 生成 JSON 日志 |
+| `generate {N} apsara logs to file {path}, with interval {M}ms` | 生成 Apsara 日志 |
+| `generate {N} OTLP {logs\|metrics\|traces} via otelgen to endpoint {ep}, protocol {grpc\|http}` | 生成 OTLP 数据 |
+| `generate {N} http logs, with interval {M}ms, url: {url}, method: {method}, body:` | 生成 HTTP 日志 |
+| `execute {N} commands {cmd} in sequence` | 顺序执行命令 |
+| `execute {N} commands {cmd} in parallel` | 并行执行命令 |
+| `create the shell script file {name} with the following content` | 创建 shell 脚本 |
+| `execute {N} the shell script file {name} in parallel` | 并行执行 shell 脚本 |
+| `restart agent` | 重启 Agent |
+| `force restart agent` | 强制重启 Agent |
+
+### Then(结果验证)
+
+| 步骤模板 | 说明 |
+|----------|------|
+| `there is {N} logs` | 精确验证日志数(上限 100) |
+| `there is at least {N} logs` | 最少日志数验证 |
+| `there is less than {N} logs` | 最多日志数验证 |
+| `the log fields match kv` | KV 字段匹配(文档内容跟 `"""..."""`) |
+| `the log fields match as below` | 日志字段模式匹配 |
+| `the log tags match kv` | Tag KV 匹配 |
+| `the log is in order` | 日志顺序验证 |
+| `wait {N} seconds` | 等待 N 秒 |
+| `otlp collector received at least {N} (logs\|metrics\|traces) from file {path}` | OTel Collector 数据验证 |
+
+> 注意:日志数量验证上限 100。超过 100 用 `When query through` + `Then the log fields match kv` 方式。
+
+---
+
+## 扩展步骤
+
+### 1. 编写函数
+
+在 `test/engine/` 对应子目录下:
+
+```go
+func MyVerification(ctx context.Context, expected int) (context.Context, error) {
+ // 实现逻辑
+ return ctx, nil
+}
+```
+
+签名要求:第一个参数 `context.Context`,返回 `(context.Context, error)`。
+
+### 2. 注册
+
+在 `test/engine/steps.go` 中:
+
+```go
+ctx.Then(`^my verification expects \{(\d+)\}$`, verify.MyVerification)
+```
+
+### 3. 使用
+
+```gherkin
+Then my verification expects {42}
+```
+
+---
+
+## docker-compose.yaml 示例
+
+### OTel Collector(OTLP 测试用)
+
+```yaml
+services:
+ otel-collector:
+ image: otel/opentelemetry-collector-contrib:latest
+ hostname: otel-collector
+ user: "0:0"
+ ports:
+ - "4317"
+ volumes:
+ - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
+ - ./otel-export:/tmp/otel-export
+ healthcheck:
+ test: ["CMD", "wget", "--spider", "-q", "http://localhost:13133/"]
+ interval: 5s
+ timeout: 3s
+ retries: 5
+ start_period: 10s
+```
+
+---
+
+## eBPF 进程安全测试示例
+
+```gherkin
+@e2e @host @ebpf_input
+Scenario: TestEBPFProcessSecurityByNormalStart
+ Given {host} environment
+ Given subcribe data from {sls} with config
+ """
+ """
+ Given {ebpf_process_security_default} local config as below
+ """
+ enable: true
+ inputs:
+ - Type: input_process_security
+ """
+ When begin trigger
+ When execute {1} commands {/bin/echo 1} in sequence
+ When query through {* | select * from e2e where call_name = 'execve' and binary = '/bin/echo' and arguments = '1'}
+ Then there is {1} logs
+```
diff --git a/.cursor/skills/e2e/scripts/e2e-cleanup.sh b/.cursor/skills/e2e/scripts/e2e-cleanup.sh
new file mode 100755
index 0000000000..ff61e91eb1
--- /dev/null
+++ b/.cursor/skills/e2e/scripts/e2e-cleanup.sh
@@ -0,0 +1,32 @@
+#!/usr/bin/env bash
+# E2E 测试环境清理脚本
+# 用法: bash .cursor/skills/e2e/scripts/e2e-cleanup.sh [case_name]
+set -euo pipefail
+
+REPO_ROOT="$(git rev-parse --show-toplevel)"
+E2E_DIR="$REPO_ROOT/test/e2e"
+CASE_NAME="${1:-}"
+
+echo "==> 停止并删除所有 Docker 容器..."
+docker rm -f $(docker ps -aq) 2>/dev/null || true
+
+echo "==> 清理 Docker 网络..."
+docker network prune -f 2>/dev/null || true
+
+echo "==> 清理运行时目录..."
+rm -rf "$E2E_DIR/config" "$E2E_DIR/onetime_pipeline_config"
+sudo rm -rf "$E2E_DIR/report" 2>/dev/null || rm -rf "$E2E_DIR/report" 2>/dev/null || true
+
+if [[ -n "$CASE_NAME" ]]; then
+ CASE_DIR="$E2E_DIR/test_cases/$CASE_NAME"
+ if [[ -d "$CASE_DIR" ]]; then
+ echo "==> 清理测试用例 $CASE_NAME..."
+ rm -f "$CASE_DIR/testcase-compose.yaml"
+ rm -f "$CASE_DIR/otel-export/"*.json 2>/dev/null || true
+ fi
+else
+ echo "==> 清理所有测试用例的 testcase-compose.yaml..."
+ find "$E2E_DIR/test_cases" -name "testcase-compose.yaml" -delete 2>/dev/null || true
+fi
+
+echo "==> 清理完成"
diff --git a/.cursor/skills/security-check/scripts/security_check.sh b/.cursor/skills/security-check/scripts/security_check.sh
index 455023c35f..03848884fc 100644
--- a/.cursor/skills/security-check/scripts/security_check.sh
+++ b/.cursor/skills/security-check/scripts/security_check.sh
@@ -1,7 +1,7 @@
#!/bin/bash
set -euo pipefail
-SENSITIVE_PATTERNS="(sk-[a-zA-Z0-9]|AIzaSy[a-zA-Z0-9]|pk_[a-zA-Z0-9]|ghp_[a-zA-Z0-9]|gho_[a-zA-Z0-9]|ghu_[a-zA-Z0-9]|ghs_[a-zA-Z0-9]|ghr_[a-zA-Z0-9])"
+SENSITIVE_PATTERNS="(sk-[a-zA-Z0-9]{20,}|AIzaSy[a-zA-Z0-9_-]{30,}|pk_[a-zA-Z0-9]{10,}|ghp_[a-zA-Z0-9]{36,}|gho_[a-zA-Z0-9]{36,}|ghu_[a-zA-Z0-9]{36,}|ghs_[a-zA-Z0-9]{36,}|ghr_[a-zA-Z0-9]{36,})"
MODE="${1:-}"
if [ "$MODE" != "commit" ] && [ "$MODE" != "push" ]; then
diff --git a/.devcontainer/Dockerfile b/.devcontainer/Dockerfile
index 7e98b005ca..95f461da61 100644
--- a/.devcontainer/Dockerfile
+++ b/.devcontainer/Dockerfile
@@ -18,9 +18,18 @@ ARG USERNAME=admin
ARG USER_PASSWORD
USER root
+RUN sed -i '/mirrors.aliyuncs.com\|mirrors.cloud.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo
RUN yum -y install openssh-server && \
ssh-keygen -A
+# Feature Docker-in-Docker
+COPY dind-install.sh /tmp/dind-install.sh
+RUN chmod +x /tmp/dind-install.sh && \
+ MOBY=false \
+ DOCKERDASHCOMPOSEVERSION=none \
+ INSTALLDOCKERBUILDX=false \
+ /tmp/dind-install.sh
+
# Create the user
COPY .env /tmp/.env
COPY authorized_keys /tmp/authorized_keys
@@ -51,7 +60,9 @@ RUN cp /opt/logtail/deps/lib/libssl.so.1.0.0 /usr/lib64; \
echo "export PATH=/usr/local/go/bin:/opt/logtail/deps/bin:$PATH" >> /home/$USERNAME/.bashrc; \
su - $USERNAME -c "\
go env -w GO111MODULE=on && \
- go env -w GOPROXY=https://goproxy.cn,direct"
+ go env -w GOPROXY=https://goproxy.cn,direct" && \
+ usermod -aG docker $USERNAME
USER $USERNAME
+# ENTRYPOINT [ "/usr/local/share/docker-init.sh" ]
\ No newline at end of file
diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json
index 76caba5e57..7709d22788 100644
--- a/.devcontainer/devcontainer.json
+++ b/.devcontainer/devcontainer.json
@@ -9,13 +9,15 @@
"privileged": true,
"mounts": [
{ "source": "/sys", "target": "/sys", "type": "bind" },
- { "source": "/", "target": "/logtail_host", "type": "bind" }
+ { "source": "/", "target": "/logtail_host", "type": "bind" },
+ { "source": "loongcollector-dind-data", "target": "/var/lib/docker", "type": "volume" }
],
"runArgs": [
"--cap-add=SYS_PTRACE",
"--security-opt", "seccomp=unconfined"
],
"onCreateCommand": "sudo chown -R $(id -un):$(id -gn) /root",
+ "postStartCommand": "sudo bash ${containerWorkspaceFolder}/.devcontainer/start-dind.sh",
"postCreateCommand": "sudo /usr/sbin/sshd",
"customizations": {
"vscode": {
diff --git a/.devcontainer/dind-install.sh b/.devcontainer/dind-install.sh
new file mode 100644
index 0000000000..d364880676
--- /dev/null
+++ b/.devcontainer/dind-install.sh
@@ -0,0 +1,1022 @@
+#!/usr/bin/env bash
+#-------------------------------------------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See https://go.microsoft.com/fwlink/?linkid=2090316 for license information.
+#-------------------------------------------------------------------------------------------------------------
+#
+# Docs: https://github.com/microsoft/vscode-dev-containers/blob/main/script-library/docs/docker-in-docker.md
+# Maintainer: The Dev Container spec maintainers
+
+
+DOCKER_VERSION="${VERSION:-"latest"}" # The Docker/Moby Engine + CLI should match in version
+USE_MOBY="${MOBY:-"true"}"
+MOBY_BUILDX_VERSION="${MOBYBUILDXVERSION:-"latest"}"
+DOCKER_DASH_COMPOSE_VERSION="${DOCKERDASHCOMPOSEVERSION:-"v2"}" #v1, v2 or none
+AZURE_DNS_AUTO_DETECTION="${AZUREDNSAUTODETECTION:-"true"}"
+DOCKER_DEFAULT_ADDRESS_POOL="${DOCKERDEFAULTADDRESSPOOL:-""}"
+USERNAME="${USERNAME:-"${_REMOTE_USER:-"automatic"}"}"
+INSTALL_DOCKER_BUILDX="${INSTALLDOCKERBUILDX:-"true"}"
+INSTALL_DOCKER_COMPOSE_SWITCH="${INSTALLDOCKERCOMPOSESWITCH:-"false"}"
+MICROSOFT_GPG_KEYS_URI="https://packages.microsoft.com/keys/microsoft.asc"
+MICROSOFT_GPG_KEYS_ROLLING_URI="https://packages.microsoft.com/keys/microsoft-rolling.asc"
+DOCKER_MOBY_ARCHIVE_VERSION_CODENAMES="trixie bookworm buster bullseye bionic focal jammy noble"
+DOCKER_LICENSED_ARCHIVE_VERSION_CODENAMES="trixie bookworm buster bullseye bionic focal hirsute impish jammy noble"
+DISABLE_IP6_TABLES="${DISABLEIP6TABLES:-false}"
+
+# Default: Exit on any failure.
+set -e
+
+# Clean up
+rm -rf /var/lib/apt/lists/*
+
+# Setup STDERR.
+err() {
+ echo "(!) $*" >&2
+}
+
+if [ "$(id -u)" -ne 0 ]; then
+ err 'Script must be run as root. Use sudo, su, or add "USER root" to your Dockerfile before running this script.'
+ exit 1
+fi
+
+###################
+# Helper Functions
+# See: https://github.com/microsoft/vscode-dev-containers/blob/main/script-library/shared/utils.sh
+###################
+
+# Determine the appropriate non-root user
+if [ "${USERNAME}" = "auto" ] || [ "${USERNAME}" = "automatic" ]; then
+ USERNAME=""
+ POSSIBLE_USERS=("vscode" "node" "codespace" "$(awk -v val=1000 -F ":" '$3==val{print $1}' /etc/passwd)")
+ for CURRENT_USER in "${POSSIBLE_USERS[@]}"; do
+ if id -u ${CURRENT_USER} > /dev/null 2>&1; then
+ USERNAME=${CURRENT_USER}
+ break
+ fi
+ done
+ if [ "${USERNAME}" = "" ]; then
+ USERNAME=root
+ fi
+elif [ "${USERNAME}" = "none" ] || ! id -u ${USERNAME} > /dev/null 2>&1; then
+ USERNAME=root
+fi
+
+# Package manager update function
+pkg_mgr_update() {
+ case ${ADJUSTED_ID} in
+ debian)
+ if [ "$(find /var/lib/apt/lists/* | wc -l)" = "0" ]; then
+ echo "Running apt-get update..."
+ apt-get update -y
+ fi
+ ;;
+ rhel)
+ if [ ${PKG_MGR_CMD} = "microdnf" ]; then
+ cache_check_dir="/var/cache/yum"
+ else
+ cache_check_dir="/var/cache/${PKG_MGR_CMD}"
+ fi
+ if [ "$(ls ${cache_check_dir}/* 2>/dev/null | wc -l)" = 0 ]; then
+ echo "Running ${PKG_MGR_CMD} makecache ..."
+ ${PKG_MGR_CMD} makecache
+ fi
+ ;;
+ esac
+}
+
+# Checks if packages are installed and installs them if not
+check_packages() {
+ case ${ADJUSTED_ID} in
+ debian)
+ if ! dpkg -s "$@" > /dev/null 2>&1; then
+ pkg_mgr_update
+ apt-get -y install --no-install-recommends "$@"
+ fi
+ ;;
+ rhel)
+ if ! rpm -q "$@" > /dev/null 2>&1; then
+ pkg_mgr_update
+ ${PKG_MGR_CMD} -y install "$@"
+ fi
+ ;;
+ esac
+}
+
+# Figure out correct version of a three part version number is not passed
+find_version_from_git_tags() {
+ local variable_name=$1
+ local requested_version=${!variable_name}
+ if [ "${requested_version}" = "none" ]; then return; fi
+ local repository=$2
+ local prefix=${3:-"tags/v"}
+ local separator=${4:-"."}
+ local last_part_optional=${5:-"false"}
+ if [ "$(echo "${requested_version}" | grep -o "." | wc -l)" != "2" ]; then
+ local escaped_separator=${separator//./\\.}
+ local last_part
+ if [ "${last_part_optional}" = "true" ]; then
+ last_part="(${escaped_separator}[0-9]+)?"
+ else
+ last_part="${escaped_separator}[0-9]+"
+ fi
+ local regex="${prefix}\\K[0-9]+${escaped_separator}[0-9]+${last_part}$"
+ local version_list="$(git ls-remote --tags ${repository} | grep -oP "${regex}" | tr -d ' ' | tr "${separator}" "." | sort -rV)"
+ if [ "${requested_version}" = "latest" ] || [ "${requested_version}" = "current" ] || [ "${requested_version}" = "lts" ]; then
+ declare -g ${variable_name}="$(echo "${version_list}" | head -n 1)"
+ else
+ set +e
+ declare -g ${variable_name}="$(echo "${version_list}" | grep -E -m 1 "^${requested_version//./\\.}([\\.\\s]|$)")"
+ set -e
+ fi
+ fi
+ if [ -z "${!variable_name}" ] || ! echo "${version_list}" | grep "^${!variable_name//./\\.}$" > /dev/null 2>&1; then
+ err "Invalid ${variable_name} value: ${requested_version}\nValid values:\n${version_list}" >&2
+ exit 1
+ fi
+ echo "${variable_name}=${!variable_name}"
+}
+
+# Use semver logic to decrement a version number then look for the closest match
+find_prev_version_from_git_tags() {
+ local variable_name=$1
+ local current_version=${!variable_name}
+ local repository=$2
+ # Normally a "v" is used before the version number, but support alternate cases
+ local prefix=${3:-"tags/v"}
+ # Some repositories use "_" instead of "." for version number part separation, support that
+ local separator=${4:-"."}
+ # Some tools release versions that omit the last digit (e.g. go)
+ local last_part_optional=${5:-"false"}
+ # Some repositories may have tags that include a suffix (e.g. actions/node-versions)
+ local version_suffix_regex=$6
+ # Try one break fix version number less if we get a failure. Use "set +e" since "set -e" can cause failures in valid scenarios.
+ set +e
+ major="$(echo "${current_version}" | grep -oE '^[0-9]+' || echo '')"
+ minor="$(echo "${current_version}" | grep -oP '^[0-9]+\.\K[0-9]+' || echo '')"
+ breakfix="$(echo "${current_version}" | grep -oP '^[0-9]+\.[0-9]+\.\K[0-9]+' 2>/dev/null || echo '')"
+
+ if [ "${minor}" = "0" ] && [ "${breakfix}" = "0" ]; then
+ ((major=major-1))
+ declare -g ${variable_name}="${major}"
+ # Look for latest version from previous major release
+ find_version_from_git_tags "${variable_name}" "${repository}" "${prefix}" "${separator}" "${last_part_optional}"
+ # Handle situations like Go's odd version pattern where "0" releases omit the last part
+ elif [ "${breakfix}" = "" ] || [ "${breakfix}" = "0" ]; then
+ ((minor=minor-1))
+ declare -g ${variable_name}="${major}.${minor}"
+ # Look for latest version from previous minor release
+ find_version_from_git_tags "${variable_name}" "${repository}" "${prefix}" "${separator}" "${last_part_optional}"
+ else
+ ((breakfix=breakfix-1))
+ if [ "${breakfix}" = "0" ] && [ "${last_part_optional}" = "true" ]; then
+ declare -g ${variable_name}="${major}.${minor}"
+ else
+ declare -g ${variable_name}="${major}.${minor}.${breakfix}"
+ fi
+ fi
+ set -e
+}
+
+# Function to fetch the version released prior to the latest version
+get_previous_version() {
+ local url=$1
+ local repo_url=$2
+ local variable_name=$3
+ prev_version=${!variable_name}
+
+ output=$(curl -s "$repo_url");
+ if echo "$output" | jq -e 'type == "object"' > /dev/null; then
+ message=$(echo "$output" | jq -r '.message')
+
+ if [[ $message == "API rate limit exceeded"* ]]; then
+ echo -e "\nAn attempt to find latest version using GitHub Api Failed... \nReason: ${message}"
+ echo -e "\nAttempting to find latest version using GitHub tags."
+ find_prev_version_from_git_tags prev_version "$url" "tags/v"
+ declare -g ${variable_name}="${prev_version}"
+ fi
+ elif echo "$output" | jq -e 'type == "array"' > /dev/null; then
+ echo -e "\nAttempting to find latest version using GitHub Api."
+ version=$(echo "$output" | jq -r '.[1].tag_name')
+ declare -g ${variable_name}="${version#v}"
+ fi
+ echo "${variable_name}=${!variable_name}"
+}
+
+get_github_api_repo_url() {
+ local url=$1
+ echo "${url/https:\/\/github.com/https:\/\/api.github.com\/repos}/releases"
+}
+
+###########################################
+# Start docker-in-docker installation
+###########################################
+
+# Ensure apt is in non-interactive to avoid prompts
+export DEBIAN_FRONTEND=noninteractive
+
+# Source /etc/os-release to get OS info
+. /etc/os-release
+
+# Determine adjusted ID and package manager
+if [ "${ID}" = "debian" ] || [ "${ID_LIKE}" = "debian" ]; then
+ ADJUSTED_ID="debian"
+ PKG_MGR_CMD="apt-get"
+ # Use dpkg for Debian-based systems
+ architecture="$(dpkg --print-architecture 2>/dev/null || uname -m)"
+elif [[ "${ID}" = "rhel" || "${ID}" = "fedora" || "${ID}" = "azurelinux" || "${ID}" = "mariner" || "${ID_LIKE}" = *"rhel"* || "${ID_LIKE}" = *"fedora"* || "${ID_LIKE}" = *"azurelinux"* || "${ID_LIKE}" = *"mariner"* ]]; then
+ ADJUSTED_ID="rhel"
+ # Determine the appropriate package manager for RHEL-based systems
+ for pkg_mgr in tdnf dnf microdnf yum; do
+ if command -v "$pkg_mgr" >/dev/null 2>&1; then
+ PKG_MGR_CMD="$pkg_mgr"
+ break
+ fi
+ done
+
+ if [ -z "${PKG_MGR_CMD}" ]; then
+ err "Unable to find a supported package manager (tdnf, dnf, microdnf, yum)"
+ exit 1
+ fi
+
+ architecture="$(rpm --eval '%{_arch}' 2>/dev/null || uname -m)"
+else
+ err "Linux distro ${ID} not supported."
+ exit 1
+fi
+
+# Azure Linux specific setup
+if [ "${ID}" = "azurelinux" ]; then
+ VERSION_CODENAME="azurelinux${VERSION_ID}"
+fi
+
+# Prevent attempting to install Moby on Debian trixie (packages removed)
+if [ "${USE_MOBY}" = "true" ] && [ "${ID}" = "debian" ] && [ "${VERSION_CODENAME}" = "trixie" ]; then
+ err "The 'moby' option is not supported on Debian 'trixie' because 'moby-cli' and related system packages have been removed from that distribution."
+ err "To continue, either set the feature option '\"moby\": false' or use a different base image (for example: 'debian:bookworm' or 'ubuntu-24.04')."
+ exit 1
+fi
+
+# Check if distro is supported
+if [ "${USE_MOBY}" = "true" ]; then
+ if [ "${ADJUSTED_ID}" = "debian" ]; then
+ if [[ "${DOCKER_MOBY_ARCHIVE_VERSION_CODENAMES}" != *"${VERSION_CODENAME}"* ]]; then
+ err "Unsupported distribution version '${VERSION_CODENAME}'. To resolve, either: (1) set feature option '\"moby\": false' , or (2) choose a compatible OS distribution"
+ err "Supported distributions include: ${DOCKER_MOBY_ARCHIVE_VERSION_CODENAMES}"
+ exit 1
+ fi
+ echo "(*) ${VERSION_CODENAME} is supported for Moby installation - setting up Microsoft repository"
+ elif [ "${ADJUSTED_ID}" = "rhel" ]; then
+ if [ "${ID}" = "azurelinux" ] || [ "${ID}" = "mariner" ]; then
+ echo " (*) ${ID} ${VERSION_ID} detected - using Microsoft repositories for Moby packages"
+ else
+ echo "RHEL-based system (${ID}) detected - Moby packages may require additional configuration"
+ fi
+ fi
+else
+ if [ "${ADJUSTED_ID}" = "debian" ]; then
+ if [[ "${DOCKER_LICENSED_ARCHIVE_VERSION_CODENAMES}" != *"${VERSION_CODENAME}"* ]]; then
+ err "Unsupported distribution version '${VERSION_CODENAME}'. To resolve, please choose a compatible OS distribution"
+ err "Supported distributions include: ${DOCKER_LICENSED_ARCHIVE_VERSION_CODENAMES}"
+ exit 1
+ fi
+ echo "(*) ${VERSION_CODENAME} is supported for Docker CE installation (supported: ${DOCKER_LICENSED_ARCHIVE_VERSION_CODENAMES}) - setting up Docker repository"
+ elif [ "${ADJUSTED_ID}" = "rhel" ]; then
+
+ echo "RHEL-based system (${ID}) detected - using Docker CE packages"
+ fi
+fi
+
+# Install base dependencies
+base_packages="curl ca-certificates pigz iptables gnupg2 wget jq"
+case ${ADJUSTED_ID} in
+ debian)
+ check_packages apt-transport-https $base_packages dirmngr
+ ;;
+ rhel)
+ check_packages $base_packages tar gawk shadow-utils policycoreutils procps-ng systemd-libs systemd-devel
+
+ ;;
+esac
+
+# Install git if not already present
+if ! command -v git >/dev/null 2>&1; then
+ check_packages git
+fi
+
+# Update CA certificates to ensure HTTPS connections work properly
+# This is especially important for Ubuntu 24.04 (Noble) and Debian Trixie
+# Only run for Debian-based systems (RHEL uses update-ca-trust instead)
+if [ "${ADJUSTED_ID}" = "debian" ] && command -v update-ca-certificates > /dev/null 2>&1; then
+ update-ca-certificates
+fi
+
+# Swap to legacy iptables for compatibility (Debian only)
+if [ "${ADJUSTED_ID}" = "debian" ] && type iptables-legacy > /dev/null 2>&1; then
+ update-alternatives --set iptables /usr/sbin/iptables-legacy
+ update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
+fi
+
+# Set up the necessary repositories
+if [ "${USE_MOBY}" = "true" ]; then
+ # Name of open source engine/cli
+ engine_package_name="moby-engine"
+ cli_package_name="moby-cli"
+
+ case ${ADJUSTED_ID} in
+ debian)
+ # Import key safely and import Microsoft apt repo
+ {
+ curl -sSL ${MICROSOFT_GPG_KEYS_URI}
+ curl -sSL ${MICROSOFT_GPG_KEYS_ROLLING_URI}
+ } | gpg --dearmor > /usr/share/keyrings/microsoft-archive-keyring.gpg
+ echo "deb [arch=${architecture} signed-by=/usr/share/keyrings/microsoft-archive-keyring.gpg] https://packages.microsoft.com/repos/microsoft-${ID}-${VERSION_CODENAME}-prod ${VERSION_CODENAME} main" > /etc/apt/sources.list.d/microsoft.list
+ ;;
+ rhel)
+ echo "(*) ${ID} detected - checking for Moby packages..."
+
+ # Check if moby packages are available in default repos
+ if ${PKG_MGR_CMD} list available moby-engine >/dev/null 2>&1; then
+ echo "(*) Using built-in ${ID} Moby packages"
+ else
+ case "${ID}" in
+ azurelinux)
+ echo "(*) Moby packages not found in Azure Linux repositories"
+ echo "(*) For Azure Linux, Docker CE ('moby': false) is recommended"
+ err "Moby packages are not available for Azure Linux ${VERSION_ID}."
+ err "Recommendation: Use '\"moby\": false' to install Docker CE instead."
+ exit 1
+ ;;
+ mariner)
+ echo "(*) Adding Microsoft repository for CBL-Mariner..."
+ # Add Microsoft repository if packages aren't available locally
+ curl -sSL ${MICROSOFT_GPG_KEYS_URI} | gpg --dearmor > /etc/pki/rpm-gpg/microsoft.gpg
+ cat > /etc/yum.repos.d/microsoft.repo << EOF
+[microsoft]
+name=Microsoft Repository
+baseurl=https://packages.microsoft.com/repos/microsoft-cbl-mariner-2.0-prod-base/
+enabled=1
+gpgcheck=1
+gpgkey=file:///etc/pki/rpm-gpg/microsoft.gpg
+EOF
+ # Verify packages are available after adding repo
+ pkg_mgr_update
+ if ! ${PKG_MGR_CMD} list available moby-engine >/dev/null 2>&1; then
+ echo "(*) Moby packages not found in Microsoft repository either"
+ err "Moby packages are not available for CBL-Mariner ${VERSION_ID}."
+ err "Recommendation: Use '\"moby\": false' to install Docker CE instead."
+ exit 1
+ fi
+ ;;
+ *)
+ err "Moby packages are not available for ${ID}. Please use 'moby': false option."
+ exit 1
+ ;;
+ esac
+ fi
+ ;;
+ esac
+else
+ # Name of licensed engine/cli
+ engine_package_name="docker-ce"
+ cli_package_name="docker-ce-cli"
+ case ${ADJUSTED_ID} in
+ debian)
+ curl -fsSL https://download.docker.com/linux/${ID}/gpg | gpg --dearmor > /usr/share/keyrings/docker-archive-keyring.gpg
+ echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/${ID} ${VERSION_CODENAME} stable" > /etc/apt/sources.list.d/docker.list
+ ;;
+ rhel)
+ # Docker CE repository setup for RHEL-based systems
+ setup_docker_ce_repo() {
+ curl -fsSL https://download.docker.com/linux/centos/gpg > /etc/pki/rpm-gpg/docker-ce.gpg
+ cat > /etc/yum.repos.d/docker-ce.repo << EOF
+[docker-ce-stable]
+name=Docker CE Stable
+baseurl=https://download.docker.com/linux/centos/9/\$basearch/stable
+enabled=1
+gpgcheck=1
+gpgkey=file:///etc/pki/rpm-gpg/docker-ce.gpg
+skip_if_unavailable=1
+module_hotfixes=1
+EOF
+ }
+ install_azure_linux_deps() {
+ echo "(*) Installing device-mapper libraries for Docker CE..."
+ [ "${ID}" != "mariner" ] && ${PKG_MGR_CMD} -y install device-mapper-libs 2>/dev/null || echo "(*) Device-mapper install failed, proceeding"
+ echo "(*) Installing additional Docker CE dependencies..."
+ ${PKG_MGR_CMD} -y install libseccomp libtool-ltdl systemd-libs libcgroup tar xz || {
+ echo "(*) Some optional dependencies could not be installed, continuing..."
+ }
+ }
+ setup_selinux_context() {
+ if command -v getenforce >/dev/null 2>&1 && [ "$(getenforce 2>/dev/null)" != "Disabled" ]; then
+ echo "(*) Creating minimal SELinux context for Docker compatibility..."
+ mkdir -p /etc/selinux/targeted/contexts/files/ 2>/dev/null || true
+ echo "/var/lib/docker(/.*)? system_u:object_r:container_file_t:s0" >> /etc/selinux/targeted/contexts/files/file_contexts.local 2>/dev/null || true
+ fi
+ }
+
+ # Special handling for RHEL Docker CE installation
+ case "${ID}" in
+ azurelinux|mariner)
+ echo "(*) ${ID} detected"
+ echo "(*) Note: Moby packages work better on Azure Linux. Consider using 'moby': true"
+ echo "(*) Setting up Docker CE repository..."
+
+ setup_docker_ce_repo
+ install_azure_linux_deps
+
+ if [ "${USE_MOBY}" != "true" ]; then
+ echo "(*) Docker CE installation for Azure Linux - skipping container-selinux"
+ echo "(*) Note: SELinux policies will be minimal but Docker will function normally"
+ setup_selinux_context
+ else
+ echo "(*) Using Moby - container-selinux not required"
+ fi
+ ;;
+ *)
+ # Standard RHEL/CentOS/Fedora approach
+ if command -v dnf >/dev/null 2>&1; then
+ dnf config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
+ elif command -v yum-config-manager >/dev/null 2>&1; then
+ yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
+ else
+ # Manual fallback
+ setup_docker_ce_repo
+ fi
+ ;;
+ esac
+ ;;
+ esac
+fi
+
+# Refresh package database
+case ${ADJUSTED_ID} in
+ debian)
+ apt-get update
+ ;;
+ rhel)
+ pkg_mgr_update
+ ;;
+esac
+
+# Soft version matching
+if [ "${DOCKER_VERSION}" = "latest" ] || [ "${DOCKER_VERSION}" = "lts" ] || [ "${DOCKER_VERSION}" = "stable" ]; then
+ # Empty, meaning grab whatever "latest" is in apt repo
+ engine_version_suffix=""
+ cli_version_suffix=""
+else
+ case ${ADJUSTED_ID} in
+ debian)
+ # Fetch a valid version from the apt-cache (eg: the Microsoft repo appends +azure, breakfix, etc...)
+ docker_version_dot_escaped="${DOCKER_VERSION//./\\.}"
+ docker_version_dot_plus_escaped="${docker_version_dot_escaped//+/\\+}"
+ # Regex needs to handle debian package version number format: https://www.systutorials.com/docs/linux/man/5-deb-version/
+ docker_version_regex="^(.+:)?${docker_version_dot_plus_escaped}([\\.\\+ ~:-]|$)"
+ set +e # Don't exit if finding version fails - will handle gracefully
+ cli_version_suffix="=$(apt-cache madison ${cli_package_name} | awk -F"|" '{print $2}' | sed -e 's/^[ \t]*//' | grep -E -m 1 "${docker_version_regex}")"
+ engine_version_suffix="=$(apt-cache madison ${engine_package_name} | awk -F"|" '{print $2}' | sed -e 's/^[ \t]*//' | grep -E -m 1 "${docker_version_regex}")"
+ set -e
+ if [ -z "${engine_version_suffix}" ] || [ "${engine_version_suffix}" = "=" ] || [ -z "${cli_version_suffix}" ] || [ "${cli_version_suffix}" = "=" ] ; then
+ err "No full or partial Docker / Moby version match found for \"${DOCKER_VERSION}\" on OS ${ID} ${VERSION_CODENAME} (${architecture}). Available versions:"
+ apt-cache madison ${cli_package_name} | awk -F"|" '{print $2}' | grep -oP '^(.+:)?\K.+'
+ exit 1
+ fi
+ ;;
+rhel)
+ # For RHEL-based systems, use dnf/yum to find versions
+ docker_version_escaped="${DOCKER_VERSION//./\\.}"
+ set +e # Don't exit if finding version fails - will handle gracefully
+ if [ "${USE_MOBY}" = "true" ]; then
+ available_versions=$(${PKG_MGR_CMD} list --available moby-engine 2>/dev/null | grep -v "Available Packages" | awk '{print $2}' | grep -E "^${docker_version_escaped}" | head -1)
+ else
+ available_versions=$(${PKG_MGR_CMD} list --available docker-ce 2>/dev/null | grep -v "Available Packages" | awk '{print $2}' | grep -E "^${docker_version_escaped}" | head -1)
+ fi
+ set -e
+ if [ -n "${available_versions}" ]; then
+ engine_version_suffix="-${available_versions}"
+ cli_version_suffix="-${available_versions}"
+ else
+ echo "(*) Exact version ${DOCKER_VERSION} not found, using latest available"
+ engine_version_suffix=""
+ cli_version_suffix=""
+ fi
+ ;;
+ esac
+fi
+
+# Version matching for moby-buildx
+if [ "${USE_MOBY}" = "true" ]; then
+ if [ "${MOBY_BUILDX_VERSION}" = "latest" ]; then
+ # Empty, meaning grab whatever "latest" is in apt repo
+ buildx_version_suffix=""
+ else
+ case ${ADJUSTED_ID} in
+ debian)
+ buildx_version_dot_escaped="${MOBY_BUILDX_VERSION//./\\.}"
+ buildx_version_dot_plus_escaped="${buildx_version_dot_escaped//+/\\+}"
+ buildx_version_regex="^(.+:)?${buildx_version_dot_plus_escaped}([\\.\\+ ~:-]|$)"
+ set +e
+ buildx_version_suffix="=$(apt-cache madison moby-buildx | awk -F"|" '{print $2}' | sed -e 's/^[ \t]*//' | grep -E -m 1 "${buildx_version_regex}")"
+ set -e
+ if [ -z "${buildx_version_suffix}" ] || [ "${buildx_version_suffix}" = "=" ]; then
+ err "No full or partial moby-buildx version match found for \"${MOBY_BUILDX_VERSION}\" on OS ${ID} ${VERSION_CODENAME} (${architecture}). Available versions:"
+ apt-cache madison moby-buildx | awk -F"|" '{print $2}' | grep -oP '^(.+:)?\K.+'
+ exit 1
+ fi
+ ;;
+ rhel)
+ # For RHEL-based systems, try to find buildx version or use latest
+ buildx_version_escaped="${MOBY_BUILDX_VERSION//./\\.}"
+ set +e
+ available_buildx=$(${PKG_MGR_CMD} list --available moby-buildx 2>/dev/null | grep -v "Available Packages" | awk '{print $2}' | grep -E "^${buildx_version_escaped}" | head -1)
+ set -e
+ if [ -n "${available_buildx}" ]; then
+ buildx_version_suffix="-${available_buildx}"
+ else
+ echo "(*) Exact buildx version ${MOBY_BUILDX_VERSION} not found, using latest available"
+ buildx_version_suffix=""
+ fi
+ ;;
+ esac
+ echo "buildx_version_suffix ${buildx_version_suffix}"
+ fi
+fi
+
+# Install Docker / Moby CLI if not already installed
+if type docker > /dev/null 2>&1 && type dockerd > /dev/null 2>&1; then
+ echo "Docker / Moby CLI and Engine already installed."
+else
+ case ${ADJUSTED_ID} in
+ debian)
+ if [ "${USE_MOBY}" = "true" ]; then
+ # Install engine
+ set +e # Handle error gracefully
+ apt-get -y install --no-install-recommends moby-cli${cli_version_suffix} moby-buildx${buildx_version_suffix} moby-engine${engine_version_suffix}
+ exit_code=$?
+ set -e
+
+ if [ ${exit_code} -ne 0 ]; then
+ err "Packages for moby not available in OS ${ID} ${VERSION_CODENAME} (${architecture}). To resolve, either: (1) set feature option '\"moby\": false' , or (2) choose a compatible OS version (eg: 'ubuntu-24.04')."
+ exit 1
+ fi
+
+ # Install compose
+ apt-get -y install --no-install-recommends moby-compose || err "Package moby-compose (Docker Compose v2) not available for OS ${ID} ${VERSION_CODENAME} (${architecture}). Skipping."
+ else
+ apt-get -y install --no-install-recommends docker-ce-cli${cli_version_suffix} docker-ce${engine_version_suffix}
+ # Install compose
+ apt-mark hold docker-ce docker-ce-cli
+ apt-get -y install --no-install-recommends docker-compose-plugin || echo "(*) Package docker-compose-plugin (Docker Compose v2) not available for OS ${ID} ${VERSION_CODENAME} (${architecture}). Skipping."
+ fi
+ ;;
+ rhel)
+ if [ "${USE_MOBY}" = "true" ]; then
+ set +e # Handle error gracefully
+ ${PKG_MGR_CMD} -y install moby-cli${cli_version_suffix} moby-engine${engine_version_suffix}
+ exit_code=$?
+ set -e
+
+ if [ ${exit_code} -ne 0 ]; then
+ err "Packages for moby not available in OS ${ID} ${VERSION_CODENAME} (${architecture}). To resolve, either: (1) set feature option '\"moby\": false' , or (2) choose a compatible OS version."
+ exit 1
+ fi
+
+ # Install compose
+ if [ "${DOCKER_DASH_COMPOSE_VERSION}" != "none" ]; then
+ ${PKG_MGR_CMD} -y install moby-compose || echo "(*) Package moby-compose not available for ${ID} ${VERSION_CODENAME} (${architecture}). Skipping."
+ fi
+ else
+ # Special handling for Azure Linux Docker CE installation
+ if [ "${ID}" = "azurelinux" ] || [ "${ID}" = "mariner" ]; then
+ echo "(*) Installing Docker CE on Azure Linux (bypassing container-selinux dependency)..."
+
+ # Use rpm with --force and --nodeps for Azure Linux
+ set +e # Don't exit on error for this section
+ ${PKG_MGR_CMD} -y install docker-ce${cli_version_suffix} docker-ce-cli${engine_version_suffix} containerd.io
+ install_result=$?
+ set -e
+
+ if [ $install_result -ne 0 ]; then
+ echo "(*) Standard installation failed, trying manual installation..."
+
+
+ # Create directory for downloading packages
+ mkdir -p /tmp/docker-ce-install
+
+ # Download packages manually using curl since tdnf doesn't support download
+ echo "(*) Downloading Docker CE packages manually..."
+
+ # Get the repository baseurl
+ repo_baseurl="https://download.docker.com/linux/centos/9/x86_64/stable"
+
+ # Download packages directly
+ cd /tmp/docker-ce-install
+
+ # Get package names with versions
+ if [ -n "${cli_version_suffix}" ]; then
+ docker_ce_version="${cli_version_suffix#-}"
+ docker_cli_version="${engine_version_suffix#-}"
+ else
+ # Get latest version from repository
+ docker_ce_version="latest"
+ fi
+
+ echo "(*) Attempting to download Docker CE packages from repository..."
+
+ # Try to download latest packages if specific version fails
+ if ! curl -fsSL "${repo_baseurl}/Packages/docker-ce-${docker_ce_version}.el9.x86_64.rpm" -o docker-ce.rpm 2>/dev/null; then
+ # Fallback: try to get latest available version
+ echo "(*) Specific version not found, trying latest..."
+ latest_docker=$(curl -s "${repo_baseurl}/Packages/" | grep -o 'docker-ce-[0-9][^"]*\.el9\.x86_64\.rpm' | head -1)
+ latest_cli=$(curl -s "${repo_baseurl}/Packages/" | grep -o 'docker-ce-cli-[0-9][^"]*\.el9\.x86_64\.rpm' | head -1)
+ latest_containerd=$(curl -s "${repo_baseurl}/Packages/" | grep -o 'containerd\.io-[0-9][^"]*\.el9\.x86_64\.rpm' | head -1)
+
+ if [ -n "${latest_docker}" ]; then
+ curl -fsSL "${repo_baseurl}/Packages/${latest_docker}" -o docker-ce.rpm
+ curl -fsSL "${repo_baseurl}/Packages/${latest_cli}" -o docker-ce-cli.rpm
+ curl -fsSL "${repo_baseurl}/Packages/${latest_containerd}" -o containerd.io.rpm
+ else
+ echo "(*) ERROR: Could not find Docker CE packages in repository"
+ echo "(*) Please check repository configuration or use 'moby': true"
+ exit 1
+ fi
+ fi
+ # Install systemd libraries required by Docker CE
+ echo "(*) Installing systemd libraries required by Docker CE..."
+ ${PKG_MGR_CMD} -y install systemd-libs || ${PKG_MGR_CMD} -y install systemd-devel || {
+ echo "(*) WARNING: Could not install systemd libraries"
+ echo "(*) Docker may fail to start without these"
+ }
+
+ # Install with rpm --force --nodeps
+ echo "(*) Installing Docker CE packages with dependency override..."
+ rpm -Uvh --force --nodeps *.rpm
+
+ # Cleanup
+ cd /
+ rm -rf /tmp/docker-ce-install
+
+ echo "(*) Docker CE installation completed with dependency bypass"
+ echo "(*) Note: Some SELinux functionality may be limited without container-selinux"
+ fi
+ else
+ # Standard installation for other RHEL-based systems
+ ${PKG_MGR_CMD} -y install docker-ce${engine_version_suffix} docker-ce-cli${cli_version_suffix} containerd.io
+ fi
+ # Install compose
+ if [ "${DOCKER_DASH_COMPOSE_VERSION}" != "none" ]; then
+ ${PKG_MGR_CMD} -y install docker-compose-plugin || echo "(*) Package docker-compose-plugin not available for ${ID} ${VERSION_CODENAME} (${architecture}). Skipping."
+ fi
+ fi
+ ;;
+ esac
+fi
+
+echo "Finished installing docker / moby!"
+
+docker_home="/usr/libexec/docker"
+cli_plugins_dir="${docker_home}/cli-plugins"
+
+# fallback for docker-compose
+fallback_compose(){
+ local url=$1
+ local repo_url=$(get_github_api_repo_url "$url")
+ echo -e "\n(!) Failed to fetch the latest artifacts for docker-compose v${compose_version}..."
+ get_previous_version "${url}" "${repo_url}" compose_version
+ echo -e "\nAttempting to install v${compose_version}"
+ curl -fsSL "https://github.com/docker/compose/releases/download/v${compose_version}/docker-compose-linux-${target_compose_arch}" -o ${docker_compose_path}
+}
+
+# If 'docker-compose' command is to be included
+if [ "${DOCKER_DASH_COMPOSE_VERSION}" != "none" ]; then
+ case "${architecture}" in
+ amd64|x86_64) target_compose_arch=x86_64 ;;
+ arm64|aarch64) target_compose_arch=aarch64 ;;
+ *)
+ echo "(!) Docker in docker does not support machine architecture '$architecture'. Please use an x86-64 or ARM64 machine."
+ exit 1
+ esac
+
+ docker_compose_path="/usr/local/bin/docker-compose"
+ if [ "${DOCKER_DASH_COMPOSE_VERSION}" = "v1" ]; then
+ err "The final Compose V1 release, version 1.29.2, was May 10, 2021. These packages haven't received any security updates since then. Use at your own risk."
+ INSTALL_DOCKER_COMPOSE_SWITCH="false"
+
+ if [ "${target_compose_arch}" = "x86_64" ]; then
+ echo "(*) Installing docker compose v1..."
+ curl -fsSL "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-Linux-x86_64" -o ${docker_compose_path}
+ chmod +x ${docker_compose_path}
+
+ # Download the SHA256 checksum
+ DOCKER_COMPOSE_SHA256="$(curl -sSL "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-Linux-x86_64.sha256" | awk '{print $1}')"
+ echo "${DOCKER_COMPOSE_SHA256} ${docker_compose_path}" > docker-compose.sha256sum
+ sha256sum -c docker-compose.sha256sum --ignore-missing
+ elif [ "${VERSION_CODENAME}" = "bookworm" ]; then
+ err "Docker compose v1 is unavailable for 'bookworm' on Arm64. Kindly switch to use v2"
+ exit 1
+ else
+ # Use pip to get a version that runs on this architecture
+ check_packages python3-minimal python3-pip libffi-dev python3-venv
+ echo "(*) Installing docker compose v1 via pip..."
+ export PYTHONUSERBASE=/usr/local
+ pip3 install --disable-pip-version-check --no-cache-dir --user "Cython<3.0" pyyaml wheel docker-compose --no-build-isolation
+ fi
+ else
+ compose_version=${DOCKER_DASH_COMPOSE_VERSION#v}
+ docker_compose_url="https://github.com/docker/compose"
+ find_version_from_git_tags compose_version "$docker_compose_url" "tags/v"
+ echo "(*) Installing docker-compose ${compose_version}..."
+ curl -fsSL "https://github.com/docker/compose/releases/download/v${compose_version}/docker-compose-linux-${target_compose_arch}" -o ${docker_compose_path} || {
+ echo -e "\n(!) Failed to fetch the latest artifacts for docker-compose v${compose_version}..."
+ fallback_compose "$docker_compose_url"
+ }
+
+ chmod +x ${docker_compose_path}
+
+ # Download the SHA256 checksum
+ DOCKER_COMPOSE_SHA256="$(curl -sSL "https://github.com/docker/compose/releases/download/v${compose_version}/docker-compose-linux-${target_compose_arch}.sha256" | awk '{print $1}')"
+ echo "${DOCKER_COMPOSE_SHA256} ${docker_compose_path}" > docker-compose.sha256sum
+ sha256sum -c docker-compose.sha256sum --ignore-missing
+
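+ # Also expose the binary as a Docker CLI plugin so the v2-style 'docker compose' invocation resolves to it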
+ mkdir -p ${cli_plugins_dir}
+ cp ${docker_compose_path} ${cli_plugins_dir}
+ fi
+fi
+
+# fallback method for compose-switch
+fallback_compose-switch() {
+ local url=$1
+ local repo_url=$(get_github_api_repo_url "$url")
+ echo -e "\n(!) Failed to fetch the latest artifacts for compose-switch v${compose_switch_version}..."
+ get_previous_version "$url" "$repo_url" compose_switch_version
+ echo -e "\nAttempting to install v${compose_switch_version}"
+ curl -fsSL "https://github.com/docker/compose-switch/releases/download/v${compose_switch_version}/docker-compose-linux-${target_switch_arch}" -o /usr/local/bin/compose-switch
+}
+# Install docker-compose switch if not already installed - https://github.com/docker/compose-switch#manual-installation
+if [ "${INSTALL_DOCKER_COMPOSE_SWITCH}" = "true" ] && ! type compose-switch > /dev/null 2>&1; then
+ if type docker-compose > /dev/null 2>&1; then
+ echo "(*) Installing compose-switch..."
+ current_compose_path="$(command -v docker-compose)"
+ target_compose_path="$(dirname "${current_compose_path}")/docker-compose-v1"
+ compose_switch_version="latest"
+ compose_switch_url="https://github.com/docker/compose-switch"
+ # Map architecture for compose-switch downloads (needed before any fallback download)
+ case "${architecture}" in
+ amd64|x86_64) target_switch_arch=amd64 ;;
+ arm64|aarch64) target_switch_arch=arm64 ;;
+ *) target_switch_arch=${architecture} ;;
+ esac
+
+ # Try to get the latest version; fall back to a previous release if the GitHub API call fails
+ set +e
+ find_version_from_git_tags compose_switch_version "$compose_switch_url"
+ if [ $? -ne 0 ] || [ -z "${compose_switch_version}" ] || [ "${compose_switch_version}" = "latest" ]; then
+ echo "(*) GitHub API rate limited or failed, using fallback method"
+ fallback_compose-switch "$compose_switch_url"
+ fi
+ set -e
+ curl -fsSL "https://github.com/docker/compose-switch/releases/download/v${compose_switch_version}/docker-compose-linux-${target_switch_arch}" -o /usr/local/bin/compose-switch || fallback_compose-switch "$compose_switch_url"
+ chmod +x /usr/local/bin/compose-switch
+ # TODO: Verify checksum once available: https://github.com/docker/compose-switch/issues/11
+ # Setup v1 CLI as alternative in addition to compose-switch (which maps to v2)
+ mv "${current_compose_path}" "${target_compose_path}"
+ update-alternatives --install ${docker_compose_path} docker-compose /usr/local/bin/compose-switch 99
+ update-alternatives --install ${docker_compose_path} docker-compose "${target_compose_path}" 1
+ else
+ err "Skipping installation of compose-switch as docker compose is unavailable..."
+ fi
+fi
+
+# If init file already exists, exit
+if [ -f "/usr/local/share/docker-init.sh" ]; then
+ echo "/usr/local/share/docker-init.sh already exists, so exiting."
+ # Clean up
+ rm -rf /var/lib/apt/lists/*
+ exit 0
+fi
+echo "docker-init doesn't exist, adding..."
+
+if ! grep -q "^docker:" /etc/group; then
+ groupadd -r docker
+fi
+
+usermod -aG docker ${USERNAME}
+
+# fallback for docker/buildx
+fallback_buildx() {
+ local url=$1
+ local repo_url=$(get_github_api_repo_url "$url")
+ echo -e "\n(!) Failed to fetch the latest artifacts for docker buildx v${buildx_version}..."
+ get_previous_version "$url" "$repo_url" buildx_version
+ buildx_file_name="buildx-v${buildx_version}.linux-${target_buildx_arch}"
+ echo -e "\nAttempting to install v${buildx_version}"
+ wget https://github.com/docker/buildx/releases/download/v${buildx_version}/${buildx_file_name}
+}
+
+if [ "${INSTALL_DOCKER_BUILDX}" = "true" ]; then
+ buildx_version="latest"
+ docker_buildx_url="https://github.com/docker/buildx"
+ find_version_from_git_tags buildx_version "$docker_buildx_url" "refs/tags/v"
+ echo "(*) Installing buildx ${buildx_version}..."
+
+ # Map architecture for buildx downloads
+ case "${architecture}" in
+ amd64|x86_64) target_buildx_arch=amd64 ;;
+ arm64|aarch64) target_buildx_arch=arm64 ;;
+ *) target_buildx_arch=${architecture} ;;
+ esac
+
+ buildx_file_name="buildx-v${buildx_version}.linux-${target_buildx_arch}"
+
+ cd /tmp
+ wget https://github.com/docker/buildx/releases/download/v${buildx_version}/${buildx_file_name} || fallback_buildx "$docker_buildx_url"
+
+ docker_home="/usr/libexec/docker"
+ cli_plugins_dir="${docker_home}/cli-plugins"
+
+ mkdir -p ${cli_plugins_dir}
+ mv ${buildx_file_name} ${cli_plugins_dir}/docker-buildx
+ chmod +x ${cli_plugins_dir}/docker-buildx
+
+ chown -R "${USERNAME}:docker" "${docker_home}"
+ chmod -R g+r+w "${docker_home}"
+ find "${docker_home}" -type d -print0 | xargs -n 1 -0 chmod g+s
+fi
+
+DOCKER_DEFAULT_IP6_TABLES=""
+if [ "$DISABLE_IP6_TABLES" == true ]; then
+ requested_version=""
+ # checking whether the version requested either is in semver format or just a number denoting the major version
+ # and, extracting the major version number out of the two scenarios
+ semver_regex="^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(-([0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*))?(\+([0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*))?$"
+ if echo "$DOCKER_VERSION" | grep -Eq $semver_regex; then
+ requested_version=$(echo $DOCKER_VERSION | cut -d. -f1)
+ elif echo "$DOCKER_VERSION" | grep -Eq "^[1-9][0-9]*$"; then
+ requested_version=$DOCKER_VERSION
+ fi
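+ # Only pass --ip6tables=false when installing 'latest' or an explicitly requested major version of 27 or newer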
+ if [ "$DOCKER_VERSION" = "latest" ] || [[ -n "$requested_version" && "$requested_version" -ge 27 ]] ; then
+ DOCKER_DEFAULT_IP6_TABLES="--ip6tables=false"
+ echo "(!) As requested, passing '${DOCKER_DEFAULT_IP6_TABLES}'"
+ fi
+fi
+
+if [ ! -d /usr/local/share ]; then
+ mkdir -p /usr/local/share
+fi
+
+tee /usr/local/share/docker-init.sh > /dev/null \
+<< EOF
+#!/bin/sh
+#-------------------------------------------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See https://go.microsoft.com/fwlink/?linkid=2090316 for license information.
+#-------------------------------------------------------------------------------------------------------------
+
+set -e
+
+AZURE_DNS_AUTO_DETECTION=${AZURE_DNS_AUTO_DETECTION}
+DOCKER_DEFAULT_ADDRESS_POOL=${DOCKER_DEFAULT_ADDRESS_POOL}
+DOCKER_DEFAULT_IP6_TABLES=${DOCKER_DEFAULT_IP6_TABLES}
+EOF
+
+tee -a /usr/local/share/docker-init.sh > /dev/null \
+<< 'EOF'
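+# Capture the dockerd bootstrap sequence as a single string so the retry loop below can run it via 'sudo /bin/sh -c' or eval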
+dockerd_start="AZURE_DNS_AUTO_DETECTION=${AZURE_DNS_AUTO_DETECTION} DOCKER_DEFAULT_ADDRESS_POOL=${DOCKER_DEFAULT_ADDRESS_POOL} DOCKER_DEFAULT_IP6_TABLES=${DOCKER_DEFAULT_IP6_TABLES} $(cat << 'INNEREOF'
+ # explicitly remove dockerd and containerd PID file to ensure that it can start properly if it was stopped uncleanly
+ find /run /var/run -iname 'docker*.pid' -delete || :
+ find /run /var/run -iname 'container*.pid' -delete || :
+
+ # -- Start: dind wrapper script --
+ # Maintained: https://github.com/moby/moby/blob/master/hack/dind
+
+ export container=docker
+
+ if [ -d /sys/kernel/security ] && ! mountpoint -q /sys/kernel/security; then
+ mount -t securityfs none /sys/kernel/security || {
+ echo >&2 'Could not mount /sys/kernel/security.'
+ echo >&2 'AppArmor detection and --privileged mode might break.'
+ }
+ fi
+
+ # Mount /tmp (conditionally)
+ if ! mountpoint -q /tmp; then
+ mount -t tmpfs none /tmp
+ fi
+
+ set_cgroup_nesting()
+ {
+ # cgroup v2: enable nesting
+ if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
+ # move the processes from the root group to the /init group,
+ # otherwise writing subtree_control fails with EBUSY.
+ # An error during moving non-existent process (i.e., "cat") is ignored.
+ mkdir -p /sys/fs/cgroup/init
+ xargs -rn1 < /sys/fs/cgroup/cgroup.procs > /sys/fs/cgroup/init/cgroup.procs || :
+ # enable controllers
+ sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers \
+ > /sys/fs/cgroup/cgroup.subtree_control
+ fi
+ }
+
+ # Set cgroup nesting, retrying if necessary
+ retry_cgroup_nesting=0
+
+ until [ "${retry_cgroup_nesting}" -eq "5" ];
+ do
+ set +e
+ set_cgroup_nesting
+
+ if [ $? -ne 0 ]; then
+ echo "(*) cgroup v2: Failed to enable nesting, retrying..."
+ else
+ break
+ fi
+
+ retry_cgroup_nesting=`expr $retry_cgroup_nesting + 1`
+ set -e
+ done
+
+ # -- End: dind wrapper script --
+
+ # Handle DNS
+ set +e
+ cat /etc/resolv.conf | grep -i 'internal.cloudapp.net' > /dev/null 2>&1
+ if [ $? -eq 0 ] && [ "${AZURE_DNS_AUTO_DETECTION}" = "true" ]
+ then
+ echo "Setting dockerd Azure DNS."
+ CUSTOMDNS="--dns 168.63.129.16"
+ else
+ echo "Not setting dockerd DNS manually."
+ CUSTOMDNS=""
+ fi
+ set -e
+
+ if [ -z "$DOCKER_DEFAULT_ADDRESS_POOL" ]
+ then
+ DEFAULT_ADDRESS_POOL=""
+ else
+ DEFAULT_ADDRESS_POOL="--default-address-pool $DOCKER_DEFAULT_ADDRESS_POOL"
+ fi
+
+ # Start docker/moby engine
+ ( dockerd $CUSTOMDNS $DEFAULT_ADDRESS_POOL $DOCKER_DEFAULT_IP6_TABLES > /tmp/dockerd.log 2>&1 ) &
+INNEREOF
+)"
+
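+# Run the given command with sudo only when the current user is not root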
+sudo_if() {
+ COMMAND="$*"
+
+ if [ "$(id -u)" -ne 0 ]; then
+ sudo $COMMAND
+ else
+ $COMMAND
+ fi
+}
+
+retry_docker_start_count=0
+docker_ok="false"
+
+until [ "${docker_ok}" = "true" ] || [ "${retry_docker_start_count}" -eq "5" ];
+do
+ # Start using sudo if not invoked as root
+ if [ "$(id -u)" -ne 0 ]; then
+ sudo /bin/sh -c "${dockerd_start}"
+ else
+ eval "${dockerd_start}"
+ fi
+
+ retry_count=0
+ until [ "${docker_ok}" = "true" ] || [ "${retry_count}" -eq "5" ];
+ do
+ sleep 1s
+ set +e
+ docker info > /dev/null 2>&1 && docker_ok="true"
+ set -e
+
+ retry_count=`expr $retry_count + 1`
+ done
+
+ if [ "${docker_ok}" != "true" ] && [ "${retry_docker_start_count}" != "4" ]; then
+ echo "(*) Failed to start docker, retrying..."
+ set +e
+ sudo_if pkill dockerd
+ sudo_if pkill containerd
+ set -e
+ fi
+
+ retry_docker_start_count=`expr $retry_docker_start_count + 1`
+done
+
+# Execute whatever commands were passed in (if any). This allows us
+# to set this script to ENTRYPOINT while still executing the default CMD.
+exec "$@"
+EOF
+
+chmod +x /usr/local/share/docker-init.sh
+chown ${USERNAME}:root /usr/local/share/docker-init.sh
+
+# Clean up
+rm -rf /var/lib/apt/lists/*
+
+echo 'docker-in-docker-debian script has completed!'
diff --git a/.devcontainer/start-dind.sh b/.devcontainer/start-dind.sh
new file mode 100755
index 0000000000..17a1b64ba2
--- /dev/null
+++ b/.devcontainer/start-dind.sh
@@ -0,0 +1,29 @@
+#!/bin/bash
+# Start Docker-in-Docker with cgroup v1 nesting fix.
+# Usage: sudo bash start-dind.sh
+set -e
+
+# 1. Load iptables kernel modules (required by dockerd networking)
+modprobe ip_tables iptable_nat iptable_filter 2>/dev/null || true
+
+# 2. Fix cgroup v1 nesting for DinD
+# In a privileged container on cgroup v1, each subsystem shows the full
+# host hierarchy. Inner Docker's runc expects the container's own cgroup
+# as root. We bind-mount each subsystem to the container's own cgroup dir.
+if [ ! -f /sys/fs/cgroup/cgroup.controllers ]; then
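+ # Derive this container's cgroup path from PID 1's memory-controller entry (hierarchy:controller:path), stripping the leading /docker/ prefix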
+ SELF_CGROUP_ID=$(grep ':memory:' /proc/1/cgroup | cut -d: -f3 | sed 's|^/docker/||')
+ if [ -n "$SELF_CGROUP_ID" ]; then
+ for subsys_dir in /sys/fs/cgroup/*/; do
+ subsys_name=$(basename "$subsys_dir")
+ [ -L "/sys/fs/cgroup/$subsys_name" ] && continue
+ our_dir="$subsys_dir/docker/$SELF_CGROUP_ID"
+ if [ -d "$our_dir" ]; then
+ mount --bind "$our_dir" "$subsys_dir" 2>/dev/null || true
+ fi
+ done
+ fi
+fi
+
+# 3. Start Docker daemon via the DinD init script
+/usr/local/share/docker-init.sh
+sleep 2
diff --git a/.gitignore b/.gitignore
index 1392b48ba4..7df4b01410 100644
--- a/.gitignore
+++ b/.gitignore
@@ -59,9 +59,7 @@ _deps
# Custom
/build/
core/build/
-core/protobuf/config_server/*/*.pb.*
-core/protobuf/*/*.pb.*
-core/log_pb/*.pb.*
+*.pb.*
core/common/Version.cpp
!/Makefile
# Enterprise
@@ -90,9 +88,11 @@ plugins/all/
*.go.mod.sum
# Custom
plugin_logger.xml
+go_plugin.LOG
### E2E
/*-test/
+testcase-compose.yaml
### License
find_licenses/
@@ -106,7 +106,10 @@ license_coverage.txt
/dist/
/tags/
-### Cursor
+### IDE configs
/.cursor/
+/.claude/settings.local.json
/.claude/
/.gemini/
+.omc/
+/code-review/
diff --git a/docker/Dockerfile_development_part b/docker/Dockerfile_development_part
index 06cb0678fd..8383263a56 100644
--- a/docker/Dockerfile_development_part
+++ b/docker/Dockerfile_development_part
@@ -18,30 +18,36 @@ ARG HOST_OS=Linux
ARG VERSION=0.0.1
USER root
-WORKDIR /loongcollector
+ENV container=docker
-RUN mkdir -p /loongcollector/conf/instance_config/local
-RUN mkdir -p /loongcollector/log
-RUN mkdir -p /loongcollector/data
-RUN mkdir -p /loongcollector/run
+RUN yum update -y && yum -y install systemd initscripts && yum -y clean all && rm -fr /var/cache
-COPY --from=build /src/core/build/loongcollector /loongcollector/
+RUN mkdir -p /usr/local/loongcollector/conf/instance_config/local
+RUN mkdir -p /usr/local/loongcollector/log
+RUN mkdir -p /usr/local/loongcollector/data
+RUN mkdir -p /usr/local/loongcollector/run
+
+COPY --from=build /src/core/build/loongcollector /usr/local/loongcollector/
+COPY ./scripts/loongcollector_control.sh /usr/local/loongcollector/
COPY ./scripts/download_ebpflib.sh /tmp/
-RUN chown -R $(whoami) /loongcollector && \
- chmod 755 /loongcollector/loongcollector && \
- mkdir /loongcollector/data/checkpoint && \
- if [ `uname -m` = "x86_64" ]; then /tmp/download_ebpflib.sh /loongcollector; fi && \
+RUN chown -R $(whoami) /usr/local/loongcollector && \
+ chmod 755 /usr/local/loongcollector/loongcollector && \
+ mkdir -p /usr/local/loongcollector/data/checkpoint && \
+ if [ `uname -m` = "x86_64" ]; then /tmp/download_ebpflib.sh /usr/local/loongcollector; fi && \
rm /tmp/download_ebpflib.sh
-COPY --from=build /src/output/libGoPluginBase.so /loongcollector/
-COPY --from=build /src/example_config/quick_start/loongcollector_config.json /loongcollector/conf/instance_config/local/loongcollector_config.json
-COPY --from=build /src/core/build/go_pipeline/libGoPluginAdapter.so /loongcollector/
-COPY --from=build /src/core/build/ebpf/driver/libeBPFDriver.so /loongcollector/
+COPY --from=build /src/output/libGoPluginBase.so /usr/local/loongcollector/
+COPY --from=build /src/example_config/quick_start/loongcollector_config.json /usr/local/loongcollector/conf/instance_config/local/loongcollector_config.json
+COPY --from=build /src/core/build/go_pipeline/libGoPluginAdapter.so /usr/local/loongcollector/
+COPY --from=build /src/core/build/ebpf/driver/libeBPFDriver.so /usr/local/loongcollector/
-ENV HOST_OS=$HOST_OS
-ENV LOGTAIL_VERSION=$VERSION
+ENV HOST_OS=$HOST_OS \
+ LOONGCOLLECTOR_VERSION=$VERSION \
+ HTTP_PROBE_PORT=7953 \
+ ALIYUN_LOGTAIL_USER_DEFINED_ID=default \
+ docker_file_cache_path=checkpoint/docker_path_config.json
EXPOSE 18689
-ENTRYPOINT ["/loongcollector/loongcollector"]
+CMD ["/usr/local/loongcollector/loongcollector_control.sh", "start_and_block"]