diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md new file mode 100644 index 0000000000..5382f72554 --- /dev/null +++ b/.claude/CLAUDE.md @@ -0,0 +1,65 @@ + + + +# oh-my-claudecode - Intelligent Multi-Agent Orchestration + +You are running with oh-my-claudecode (OMC), a multi-agent orchestration layer for Claude Code. +Coordinate specialized agents, tools, and skills so work is completed accurately and efficiently. + + +- Delegate specialized work to the most appropriate agent. +- Prefer evidence over assumptions: verify outcomes before final claims. +- Choose the lightest-weight path that preserves quality. +- Consult official docs before implementing with SDKs/frameworks/APIs. + + + +Delegate for: multi-file changes, refactors, debugging, reviews, planning, research, verification. +Work directly for: trivial ops, small clarifications, single commands. +Route code to `executor` (use `model=opus` for complex work). Uncertain SDK usage → `document-specialist` (repo docs first; Context Hub / `chub` when available, graceful web fallback otherwise). + + + +`haiku` (quick lookups), `sonnet` (standard), `opus` (architecture, deep analysis). +Direct writes OK for: `~/.claude/**`, `.omc/**`, `.claude/**`, `CLAUDE.md`, `AGENTS.md`. + + + +Invoke via `/oh-my-claudecode:`. Trigger patterns auto-detect keywords. +Tier-0 workflows include `autopilot`, `ultrawork`, `ralph`, `team`, and `ralplan`. +Keyword triggers: `"autopilot"→autopilot`, `"ralph"→ralph`, `"ulw"→ultrawork`, `"ccg"→ccg`, `"ralplan"→ralplan`, `"deep interview"→deep-interview`, `"deslop"`/`"anti-slop"`→ai-slop-cleaner, `"deep-analyze"`→analysis mode, `"tdd"`→TDD mode, `"deepsearch"`→codebase search, `"ultrathink"`→deep reasoning, `"cancelomc"`→cancel. +Team orchestration is explicit via `/team`. +Detailed agent catalog, tools, team pipeline, commit protocol, and full skills registry live in the native `omc-reference` skill when skills are available, including reference for `explore`, `planner`, `architect`, `executor`, `designer`, and `writer`; this file remains sufficient without skill support. + + + +Verify before claiming completion. Size appropriately: small→haiku, standard→sonnet, large/security→opus. +If verification fails, keep iterating. + + + +Broad requests: explore first, then plan. 2+ independent tasks in parallel. `run_in_background` for builds/tests. +Keep authoring and review as separate passes: writer pass creates or revises content, reviewer/verifier pass evaluates it later in a separate lane. +Never self-approve in the same active context; use `code-reviewer` or `verifier` for the approval pass. +Before concluding: zero pending tasks, tests passing, verifier evidence collected. + + + +Hooks inject `` tags. Key patterns: `hook success: Success` (proceed), `[MAGIC KEYWORD: ...]` (invoke skill), `The boulder never stops` (ralph/ultrawork active). +Persistence: `` (7 days), `` (permanent). +Kill switches: `DISABLE_OMC`, `OMC_SKIP_HOOKS` (comma-separated). + + + +`/oh-my-claudecode:cancel` ends execution modes. Cancel when done+verified or blocked. Don't cancel if work incomplete. + + + +State: `.omc/state/`, `.omc/state/sessions/{sessionId}/`, `.omc/notepad.md`, `.omc/project-memory.json`, `.omc/plans/`, `.omc/research/`, `.omc/logs/` + + +## Setup + +Say "setup omc" or run `/oh-my-claudecode:omc-setup`. 
+
+
diff --git a/.claude/rules/project-knowledge/config-pitfalls.mdc b/.claude/rules/project-knowledge/config-pitfalls.mdc
new file mode 100644
index 0000000000..2aa3c16b0c
--- /dev/null
+++ b/.claude/rules/project-knowledge/config-pitfalls.mdc
@@ -0,0 +1,41 @@
+---
+description: Common pitfalls in LoongCollector collection configs. Consult when writing or reviewing pipeline config YAML.
+globs:
+  - "**/*.feature"
+  - "**/case.feature"
+  - "core/config/**"
+  - "test/e2e/**"
+alwaysApply: false
+---
+# LoongCollector Collection Config Pitfalls
+
+## ExcutionTimeout turns a config into a onetime config
+
+When `global.ExcutionTimeout` is present in a config, the **entire config** is marked as a onetime config.
+Only plugins registered via `RegisterOnetimeInputCreator` can be used in a onetime config.
+
+Most input plugins (`input_forward`, `input_file`, `input_container_stdio`, `input_prometheus`, etc.) only register via `RegisterContinuousInputCreator` and fail in a onetime config with:
+
+```
+failed to parse config:unsupported input plugin module:input_forward
+```
+
+### Detection logic
+
+```
+global.ExcutionTimeout present
+  → PipelineConfig::GetExpireTimeIfOneTime → mOnetimeExpireTime is set
+  → CollectionConfig::IsOnetime() == true
+  → IsValidNativeInputPlugin(name, true) looks up the ONETIME registry
+  → not found → "unsupported input plugin"
+```
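+The chain above reduces to a small decision function. A minimal Python sketch; the registry contents and helper name are illustrative stand-ins for the C++ symbols, not a real LoongCollector API:
+
+```python
+# Sketch of the onetime detection chain; names are hypothetical stand-ins.
+ONETIME_REGISTRY = {"input_static_file"}  # plugins via RegisterOnetimeInputCreator
+CONTINUOUS_REGISTRY = {"input_forward", "input_file",
+                       "input_container_stdio", "input_prometheus"}
+
+def validate_input_plugin(config: dict, plugin_name: str) -> None:
+    # global.ExcutionTimeout present -> the whole config becomes onetime.
+    is_onetime = "ExcutionTimeout" in config.get("global", {})
+    registry = ONETIME_REGISTRY if is_onetime else CONTINUOUS_REGISTRY
+    if plugin_name not in registry:
+        raise ValueError(
+            f"failed to parse config:unsupported input plugin module:{plugin_name}"
+        )
+
+# A continuous plugin plus ExcutionTimeout reproduces the error:
+try:
+    validate_input_plugin({"global": {"ExcutionTimeout": 600}}, "input_forward")
+except ValueError as e:
+    print(e)
+```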
+### Input plugins that support onetime
+
+Check the plugins that call `RegisterOnetimeInputCreator` in `PluginRegistry::LoadStaticPlugins()`, e.g. `InputStaticFile`.
+
+### Rules
+
+- **Do not use `ExcutionTimeout` in configs whose input plugins run continuously.**
+- E2E tests do not need `ExcutionTimeout` for timeout control; Go test's `-timeout` flag already provides that protection.
+- If you genuinely need one-shot collection, use the `onetime_pipeline_config` directory plus an input plugin that supports onetime.
diff --git a/.claude/settings.json b/.claude/settings.json
new file mode 100644
index 0000000000..c6802dc44e
--- /dev/null
+++ b/.claude/settings.json
@@ -0,0 +1,5 @@
+{
+  "enabledPlugins": {
+    "oh-my-claudecode@omc": true
+  }
+}
diff --git a/.claude/skills/code-review/SKILL.md b/.claude/skills/code-review/SKILL.md
new file mode 100644
index 0000000000..76aca88ac1
--- /dev/null
+++ b/.claude/skills/code-review/SKILL.md
@@ -0,0 +1,451 @@
+---
+name: code-review
+description: Use this skill during code review to run a security-oriented, architecture-consistency-first deep review of LoongCollector changes.
+metadata:
+  requires:
+    bins:
+      - python3
+      - git
+      - gh
+---
+# Code Review Agent Skill
+
+You are the senior code-review assistant for the LoongCollector project. Your core goal is to find real defects, behavioral regressions, and risk points, not to hand out generic advice.
+
+To avoid false positives, you must:
+
+- Analyze issues with sufficient context; never draw conclusions from a local diff alone.
+- Base conclusions on code and changes you actually read, never on memory or guesses.
+- Understand the author's intent and the end-to-end flow before judging an issue.
+- Follow the steps below so that after code changes you can review incrementally and check how earlier review findings were addressed.
+
+## TOC
+
+- [Preflight (ensure required tools exist)](#preflight-ensure-required-tools-exist)
+- [Local Branch Sync (ensure fresh code)](#local-branch-sync-ensure-fresh-code)
+- [Review Plan (plan before starting to avoid gaps)](#review-plan-plan-before-starting-to-avoid-gaps)
+- [Script Failure Degradation Strategy](#script-failure-degradation-strategy)
+- [Phase 1: Review Workspace & Incremental State](#phase-1-review-workspace--incremental-state)
+- [Phase 2: Context Building](#phase-2-context-building)
+- [Phase 3: Intent Analysis](#phase-3-intent-analysis)
+- [Keep the Evaluation Criteria in Mind (no output needed)](#keep-the-evaluation-criteria-in-mind-no-output-needed)
+- [Phase 4: Sub-agent Review](#phase-4-sub-agent-review)
+- [Phase 5: Final Report](#phase-5-final-report)
+
+## Preflight (ensure required tools exist)
+
+Before entering Phase 1, run the following commands; all must pass:
+
+- `python3 --version`
+- `git rev-parse --is-inside-work-tree`
+- `gh auth status`
+
+If any command fails, stop the review, fix the environment per `references/failure-playbook.md`, and retry.
+
+## Local Branch Sync (ensure fresh code)
+
+When reusing a local PR branch for review, sync the code once before the formal review so you never work on a stale copy:
+
+1. Read the PR's current `headRefOid` (or the branch's current `HEAD` SHA) from the remote.
+2. Sync the corresponding local branch (e.g. `git fetch` + `git pull --ff-only`, or an equivalent flow).
+3. Record the `head` SHA used for this review round at the top of `final-report.md` for traceability.
+
+## Review Plan (plan before starting to avoid gaps)
+
+Before the detailed Phase 1 steps, create and maintain `review-plan.md` in the review directory for "step-by-step execution + checkbox verification":
+
+1. File path:
+   - PR: `code-review/pr-<pr-number>/review-plan.md`
+   - Branch: `code-review/branch-<branch-name>/review-plan.md`
+2. Must at least contain:
+   - This round's review target (PR/branch, base/head SHA)
+   - This round's checklist (checkboxes), broken into **major items + sub-items**
+   - The current phase marker (`in_progress`)
+   - Blockers and degradation records, if any
+3. Execution requirements:
+   - Check off every step as soon as it completes;
+   - if you interrupt or switch strategy (e.g. `incremental -> full`), update the plan first, then continue.
+   - Do not write bare phase names without sub-items (e.g. "Phase 1" must be broken down into fetching comments, updating state, mapping decisions, and so on).
+4. Template usage:
+   - `references/review-plan.template.md` only provides the skeleton;
+   - the agent must fill in this round's major items and sub-items itself.
+
+## Script Failure Degradation Strategy
+
+If a script fails, you may continue the review in degraded mode, but you must take the following actions so the skill keeps improving:
+
+- Record the failure in `code-review/<target>/script-failures.md` (script name, command, error summary, time, fallback strategy).
+- Switch the continuing review to a `full` review and manually verify the key state files.
+- Add a "Script Failure Feedback" section to `final-report.md` describing the failure's blast radius and the manual compensation taken.
+- Feed the failure back to the skill-maintenance channel (use `mcp-feedback-enhanced` when available; otherwise at least persist it in `script-failures.md` for later collection).
+
+## Phase 1: Review Workspace & Incremental State
+
+Before reviewing, initialize or reuse the review workspace under the repository root:
+
+- PR review directory: `code-review/pr-<pr-number>/`
+- Branch review directory: `code-review/branch-<branch-name>/`
+- Create the directory if it does not exist, and keep historical review rounds
+
+The directory contains at least the following files:
+
+- `meta.json`: review target and baseline metadata (repo, base/head, review time, strategy parameters)
+- `review-plan.md`: this round's execution plan and checkbox progress (plan first, then execute)
+- `reviewed_commits.json`: the set of reviewed commits and mapping records
+- `intent-architecture-notes.md`: the code-understanding document (Phase 3)
+- `final-report.md`: the final report (Phase 5)
+- `comments/review-comments.json`: the raw snapshot of PR review comments (the only source)
+- `comments/comment-status.json`: comment status assessments (flow status + tech status)
+
+Input gate:
+
+- First run:
+  - the files above may be missing;
+  - run the initialization script to generate the minimal file skeleton before any later step.
+- Subsequent runs:
+  - key input files must exist with a valid schema;
+  - if invalid, run the "full rebuild / re-fetch" recovery flow per `references/failure-playbook.md`; hand-patching JSON to keep going is not allowed.
+
+Template and script directories (must be used):
+
+- JSON templates: `.claude/skills/code-review/references/`
+- Workflow scripts: `.claude/skills/code-review/scripts/`
+
+Execution steps (must be in order; a driver sketch follows this list):
+
+1. Initialize the review directory and base files:
+   - PR: `python3 .claude/skills/code-review/scripts/init_review_workspace.py --repo-root <repo-root> --target-type pr --target-id <pr-number> --base-ref <base-ref> --head-ref <head-ref> --base-sha <base-sha> --head-sha <head-sha>`
+   - Branch: `python3 .claude/skills/code-review/scripts/init_review_workspace.py --repo-root <repo-root> --target-type branch --target-id <branch-name> --base-ref <base-ref> --head-ref <head-ref> --base-sha <base-sha> --head-sha <head-sha>`
+2. Create/update `review-plan.md` (the `references/review-plan.template.md` skeleton may be used, but this round's major items/sub-items must be filled in), and mark the current phase as `Phase 1 in_progress`.
+3. Fetch review comments into `comments/review-comments.json`:
+   - PR review: must run `python3 .claude/skills/code-review/scripts/fetch_review_comments.py --repo-root <repo-root> --target-type pr --target-id <pr-number>`; only `PR review comments`
+   - Branch review: may be empty, or import a snapshot of branch review comments
+   - `review-comments.json` must use the standard object structure (a root object with a `comments` array whose elements carry `comment_id/path/line/side/body`); anything else is an upstream script bug, and the upstream script must be fixed first.
+   - Every comment entry must carry a boolean `thread_resolved` field; flow status is decided by this field alone (`true -> resolved`, `false -> open`).
+   - `snapshot/` must preserve the source-relative path hierarchy; flattened file names are forbidden. Example: `snapshot/round-2/files/core/ebpf/protocol/redis/RedisParser.cpp`. A flattened result means the snapshot script is broken or was interrupted, and it must be re-run.
+4. Generate/update the comment status file:
+   - PR: `python3 .claude/skills/code-review/scripts/update_comment_status.py --repo-root <repo-root> --target-type pr --target-id <pr-number>`
+   - Branch: `python3 .claude/skills/code-review/scripts/update_comment_status.py --repo-root <repo-root> --target-type branch --target-id <branch-name>`
+   - Note: this step only syncs structure and flow status (`status_flow`) and preserves historical `status_tech`; it does no automatic code re-verification.
+5. Generate the two-dimensional status report (Markdown table):
+   - PR: `python3 .claude/skills/code-review/scripts/generate_comment_status_report.py --repo-root <repo-root> --target-type pr --target-id <pr-number>`
+   - Branch: `python3 .claude/skills/code-review/scripts/generate_comment_status_report.py --repo-root <repo-root> --target-type branch --target-id <branch-name>`
+   - Output is fixed at `comments/comment-status.md` (columns: comment time, file, line, author, comment, flow status, tech status)
+6. Compute the incremental mapping and fallback recommendation (`--base` and `--head` must be commit SHAs):
+   - PR: `python3 .claude/skills/code-review/scripts/incremental_review_mapper.py --repo-root <repo-root> --target-type pr --target-id <pr-number> --base <base-sha> --head <head-sha> --review-round <n>`
+   - Branch: `python3 .claude/skills/code-review/scripts/incremental_review_mapper.py --repo-root <repo-root> --target-type branch --target-id <branch-name> --base <base-sha> --head <head-sha> --review-round <n>`
+   - When `snapshot/latest.json` exists, the mapper also computes `snapshot_match_rate`, which assists incremental decisions after rebase-conflict rewrites or squash merges.
+7. Act on the `recommendation` in the script output:
+   - `incremental`: review only `need_review_commits`
+   - `partial`: review `need_review_commits` first, then re-check low-confidence hunks
+   - `full`: run a full review, but deduplicate against historical findings
+8. Tech status (`status_tech`) must be re-verified entry by entry; guessing is not allowed:
+   - Required inputs (in order):
+     1) `comments/review-comments.json`
+     2) `comments/comment-status.json`
+     3) `reviewed_commits.json`
+     4) the current code of the file matching each comment's `path`
+     5) the historical snapshot of the same path under `snapshot/` (if present)
+   - Per-entry rules (keyed by `comment_id`):
+     - only `status_tech`, `mapped_finding_id`, and `notes` may be updated
+     - `status_tech` may only be: `fixed|not-fixed|false-positive|partially-fixed`
+     - `notes` must state the verdict evidence, at least: files compared, key code changes, reason for the conclusion
+   - Each round must first re-verify entries not yet in a final state (`not-fixed`, `partially-fixed`).
+   - Manual corrections (supported):
+     - if the comment author (the current `gh` login) replies in the thread with text containing `fixed`, sync the status to `fixed`.
+     - if the reply contains `false-positive` (or `false positive`), sync the status to `false-positive`.
+     - manual corrections are absorbed automatically when the script updates `comment-status.json`, and written into `notes`.
+   - Final-state skip rule (on by default):
+     - entries whose current `status_tech` is `fixed` or `false-positive` skip tech re-verification this round.
+     - re-open verification only when:
+       1) the entry's `path` changed again within this round's commit range;
+       2) the entry's `status_flow` moved from `resolved` to non-`resolved`;
+       3) a human explicitly forces re-verification (by a `comment_id` list).
+   - Output requirements:
+     - the updated `comments/comment-status.json`
+     - a regenerated `comments/comment-status.md`
+9. After wrapping up the round, build a snapshot for the next round's incremental decision:
+   - PR: `python3 .claude/skills/code-review/scripts/build_snapshot.py --repo-root <repo-root> --target-type pr --target-id <pr-number> --base <base-sha> --head <head-sha> --review-round <n>`
+   - Branch: `python3 .claude/skills/code-review/scripts/build_snapshot.py --repo-root <repo-root> --target-type branch --target-id <branch-name> --base <base-sha> --head <head-sha> --review-round <n>`
+   - Artifacts: `snapshot/round-<n>/files/*`, `snapshot/round-<n>/manifest.json`, `snapshot/latest.json`
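+As a hypothetical one-shot driver for the script chain above (PR target; the repo root, PR number, SHAs, and round below are placeholders, and step 2's `review-plan.md` is authored by the agent, not scripted):
+
+```python
+# Minimal Phase 1 driver sketch; paths and flags are as documented above.
+import json
+import subprocess
+
+SCRIPTS = ".claude/skills/code-review/scripts"
+ARGS = ["--repo-root", ".", "--target-type", "pr", "--target-id", "123"]  # placeholders
+
+# Steps 1 and 3-5: workspace init, comment fetch, status sync, status report.
+for script in [
+    "init_review_workspace.py",
+    "fetch_review_comments.py",
+    "update_comment_status.py",
+    "generate_comment_status_report.py",
+]:
+    subprocess.run(["python3", f"{SCRIPTS}/{script}", *ARGS], check=True)
+
+# Step 6: the mapper prints a JSON summary including the routing recommendation.
+out = subprocess.run(
+    ["python3", f"{SCRIPTS}/incremental_review_mapper.py", *ARGS,
+     "--base", "<base-sha>", "--head", "<head-sha>", "--review-round", "2"],
+    check=True, capture_output=True, text=True,
+).stdout
+print(json.loads(out)["recommendation"])  # incremental | partial | full
+# Step 9 (wrap-up) would call build_snapshot.py with the same --base/--head/--review-round.
+```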
+State-file field constraints (must be respected):
+
+- `reviewed_commits.json` records:
+  - `commit_sha`
+  - `patch_id` (for precise mapping after a rebase)
+  - `review_round`
+  - `reviewed_at`
+  - `hunk_fingerprints` (array)
+- `comments/comment-status.json` records:
+  - `comment_id`
+  - `path` / `line` / `side`
+  - `body`
+  - `snippet` (readable code snippet)
+  - `snippet_fingerprint` (normalized snippet hash)
+  - `status_flow` (`open|resolved|wont-fix|deferred`)
+  - `status_tech` (`fixed|not-fixed|false-positive|partially-fixed`)
+  - `mapped_finding_id`
+
+Notes:
+
+- `snippet_fingerprint` is a stable hash over "normalized code snippet + file path + comment locator triple (line/side/comment_id)"; never use the line number alone.
+- `status_flow` and `status_tech` may be corrected manually, but historical records must never be deleted.
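+For reference, the fingerprint as actually computed in `scripts/update_comment_status.py` boils down to the sketch below (note the seed there is `path|line|side|snippet`):
+
+```python
+# Sketch of the snippet_fingerprint computation used by update_comment_status.py:
+# a stable sha256 over the path, line, side, and the normalized snippet.
+import hashlib
+import re
+
+def snippet_fingerprint(path: str, line: int, side: str, body: str) -> str:
+    snippet = re.sub(r"\s+", " ", body.strip())[:300]  # normalized snippet
+    seed = "|".join([path, str(line), side, snippet])
+    return hashlib.sha256(seed.encode("utf-8")).hexdigest()
+
+print(snippet_fingerprint("core/foo.cpp", 42, "RIGHT", "null check missing"))
+```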
+Incremental review strategy (mandatory):
+
+1. Read `reviewed_commits.json` first and review only new, uncovered changes.
+2. On detecting a rebase/force-push, do not jump straight to a full re-review; map first, then decide:
+   - L1 (high confidence): map old commit -> new commit by `patch-id`; on a hit, inherit the "reviewed" state.
+   - L2 (medium confidence): fingerprint-match by `path + normalized hunk snippet + hunk context`; re-review only unmatched hunks.
+   - L3 (low confidence): on a low hit rate or obvious conflict rewrites, fall back to a full review.
+3. Default confidence gates (see the sketch below):
+   - `commit_map_rate >= 90%`: incremental passes
+   - `hunk_match_rate >= 80%`: partial re-review
+   - otherwise fall back to full
+4. Even on a full fallback, reuse historical comments and findings for deduplication to avoid repeated feedback.
+
+The snapshot's role in incremental decisions (must be respected):
+
+1. `snapshot` is auxiliary evidence for incremental decisions; it does not replace the git main chain (`patch-id`/`hunk`).
+2. On a rebase with conflict rewrites, when the commit/hunk mapping is insufficient, `snapshot_match_rate` may assist a downgrade from `full` to `partial`.
+3. When a squash merge erases commit boundaries, `snapshot_match_rate` decides whether incremental review can continue.
+4. If `snapshot_match_rate` misses the threshold, a `full` review remains mandatory.
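+The threshold routing above, distilled into a compact sketch (same defaults as `scripts/incremental_review_mapper.py`):
+
+```python
+# Routing logic distilled from incremental_review_mapper.py (default thresholds).
+from typing import Optional
+
+def recommend(commit_map_rate: float, hunk_match_rate: float,
+              snapshot_match_rate: Optional[float] = None) -> str:
+    if commit_map_rate >= 0.9:        # L1 mapping (sha / patch-id) held up
+        return "incremental"
+    if hunk_match_rate >= 0.8:        # L2 hunk fingerprints still match
+        return "partial"
+    # L3: low confidence -> full, unless snapshot evidence rescues a downgrade.
+    if snapshot_match_rate is not None and snapshot_match_rate >= 0.9:
+        return "partial"              # squash / rebase-conflict case
+    return "full"
+
+assert recommend(0.95, 0.50) == "incremental"
+assert recommend(0.40, 0.85) == "partial"
+assert recommend(0.40, 0.30, snapshot_match_rate=0.92) == "partial"
+assert recommend(0.40, 0.30) == "full"
+```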
+## Phase 2: Context Building
+
+Before reviewing, complete the following steps:
+
+1. Read `../project-knowledge/SKILL.md` to build a mental model of the system architecture and module responsibilities.
+2. Read `../project-knowledge/SKILL.md`, prioritizing:
+   - shared-capability entry points (common/helper code that must be reused)
+   - lifecycle and resource-release invariants
+   - config/environment-variable conventions (case compatibility, defaults, deprecated-parameter mapping)
+   - high-frequency issues from past reviews (as a priority checklist)
+3. Read and consult the following specs (choose by the scope of the change):
+   - `../selfmonitor/SKILL.md` (required for self-monitoring and alerting changes)
+   - `../security-check/SKILL.md` (required for security and compliance changes)
+   - `../compile/SKILL.md` (required when the build/compile chain is involved)
+4. Based on the PR/branch change list, read the full context of affected files (covering at least the changed functions, their callers, and their definitions).
+5. If the change touches the pipeline/runner/config system, read the following code before drawing conclusions:
+   - `core/application/Application.cpp` (main loop, config scanning, shutdown order)
+   - `core/collection_pipeline/CollectionPipelineManager.cpp`
+   - `core/collection_pipeline/CollectionPipeline.cpp`
+   - `core/runner/ProcessorRunner.cpp`
+   - `core/runner/FlusherRunner.cpp`
+   - `core/config/watcher/PipelineConfigWatcher.cpp`
+   - `core/config/OnetimeConfigInfoManager.cpp`
+   - `core/file_server/FileServer.cpp`
+   - `core/file_server/checkpoint/CheckPointManager.cpp`
+   - `core/file_server/checkpoint/CheckpointManagerV2.cpp` (when the change touches exactly-once)
+6. Pull review context via MCP/`gh`:
+   - the PR description, commit history, PR review comments, CI status
+   - review comments from roughly the 10 most recent related PRs (distill team preferences)
+7. If historical comments on the code platform are accessible, sample review comments from recently merged PRs first (>=30 suggested) and cross-match patterns:
+   - map historical high-frequency issues onto this change's files and mark them as "high-risk check items"
+   - on conflict with `codebase-map`, update conclusions using "latest code facts + comment evidence"
+8. If you find conflicts with historical constraints or design decisions, record "assumptions and evidence" first and state them explicitly in the report later.
+
+## Phase 3: Intent Analysis
+
+After the context analysis, you must produce an "understanding document" before moving to the issue list. The document exists for developers to learn from and understand the code; it cannot be skipped.
+
+### Phase 3 output requirements (a document is mandatory)
+
+Output a standalone document (suggested title: `Code Review - Intent & Architecture Notes`) containing at least:
+
+- Author intent: what problem this PR/branch solves, and why now.
+- End-to-end flow: which key paths the change actually alters, from entry to exit.
+- Impact scope: affected modules, interfaces, configs, state files, monitoring metrics, alerting chains.
+- Expected-outcome verification: whether the change achieves its goal, with evidence and reasoning.
+
+### Phase 3 persistence requirements (must be written into the code-review directory)
+
+Write the Phase 3 document into the repository's `code-review/` directory; chat-only output is forbidden.
+
+Suggested paths:
+
+- PR review: `code-review/pr-<pr-number>/intent-architecture-notes.md`
+- Branch review: `code-review/branch-<branch-name>/intent-architecture-notes.md` (`/` replaced with `-`)
+
+Requirements:
+
+- Create the directory first if it does not exist.
+- The document must start with review-target metadata (PR number/branch name, commit range, generation time).
+
+### Mermaid visualization requirements (at least 2 diagrams)
+
+The understanding document must include Mermaid diagrams for learning and communication. Pick by the nature of the change; output at least 2 diagrams from the kinds below:
+
+- architecture diagram (module relations / dependency boundaries)
+- flow diagram (key execution paths)
+- sequence diagram (component interaction, call order, async/retry behavior)
+- data-structure diagram (key state objects, queues, checkpoint primary/secondary relations)
+
+Suggestions:
+
+- small changes: at least 2 diagrams (flow + sequence)
+- medium/large changes: 3-4 diagrams (architecture + flow + sequence + data structure)
+
+Notes:
+
+- Diagrams must be strongly related to the current change; encyclopedic "map of everything" diagrams are forbidden.
+- Name nodes with the real component/type names from the code; avoid abstract filler words.
+- Follow `../mermaid/SKILL.md` for Mermaid syntax.
+
+## Keep the Evaluation Criteria in Mind (no output needed)
+
+Check every changed file and diff hunk against these 6 groups of criteria:
+
+1. Business & architecture: goal attainment, responsibility boundaries, topology and dependencies, failure propagation.
+2. Correctness & security: boundary checks, type/exception handling, defense against external input, security compliance.
+3. Concurrency & lifecycle: correct shutdown of threads/locks/queues, resource release, state recovery.
+4. Performance & resources: hot-path complexity, copies and allocations, capacity limits, logging overhead.
+5. Stability & observability: completeness and locatability of metrics/logs/alerts.
+6. Maintainability, compatibility, docs & tests: readability, backward compatibility, documentation and test coverage.
+
+Note: this is not a "generic advice list"; each item must be executed within a sub-agent's area of responsibility (see the responsibility matrix in the next section).
+
+## Phase 4: Sub-agent Review
+
+Launch specialized sub-agents in parallel (3-4 in parallel suggested; avoid over-splitting). Each sub-agent independently reports "findings + evidence".
+Every sub-agent must reference the matching items from "Keep the Evaluation Criteria in Mind"; verbal-only judgments are not allowed.
+Every finding must be tagged with its source criterion (e.g. `[S3]` means "concurrency & lifecycle").
+
+### Responsibility matrix (primary/secondary)
+
+- Sub-agent A (logic & architecture): primary `S1`, secondary `S6`
+- Sub-agent B (concurrency & lifecycle): primary `S3`, secondary `S5`
+- Sub-agent C (security, stability & performance): primary `S2` + `S4`, secondary `S5`
+- Sub-agent D (reuse, compatibility, docs & tests): primary `S6`, secondary `S1` + `S5`
+
+Rules:
+
+- Primary criteria must be covered exhaustively; secondary criteria only where directly relevant to this change.
+- A finding spanning multiple criteria may carry multiple tags (e.g. `[S2][S4]`).
+- Multiple agents must not report duplicate conclusions for the same issue; on duplication, keep the one with the more complete evidence.
+
+### Sub-agent A: logic correctness & architectural consistency
+
+- Is the business logic complete; are there unhandled edge cases, inconsistent state, or broken error propagation.
+- Is the change consistent with LoongCollector's architectural constraints (input/processing/output responsibilities, the Runner pattern, the config registration pattern).
+- Does it introduce implicit dependencies, circular dependencies, or failure propagation that cannot be observed.
+- Focus criteria: business & architecture; maintainability & compatibility.
+
+### Sub-agent B: concurrency, async & lifecycle
+
+- Are lock granularity, lock ordering, data races, and thread shutdown paths safe.
+- Do callbacks/async flows contain races, dangling references, or unhandled failure paths.
+- Can new threads/timed tasks be stopped controllably, and do they match existing project patterns.
+- Focus criteria: concurrency & lifecycle; stability & observability.
+- Mandatory lifecycle/resource checks (verify item by item; the focus is "correct release and state recovery"):
+  - Resource-release closure:
+    - for every path (startup failure, hot-reload replacement, config deletion, process exit), verify the resource loop closes:
+      - threads/futures can exit and are reclaimed
+      - queue pops no longer hang after being disabled
+      - plugins/Go pipelines can stop without lingering references
+      - flush/batch/checkpoint persistence semantics and paths stay consistent
+  - Deadlock and hang risks:
+    - is lock ordering consistent across modules (pipeline manager / queue manager / file server).
+    - can `WaitAllItemsInProcessFinished`, queue `Wait/Trigger`, or `HoldOn/Resume` form circular waits.
+    - can paths that only alert on long waits without terminating cause permanent hangs or shutdown timeouts.
+  - State-recovery correctness (core):
+    - after a hot reload, does the system recover to a consistent "collect + process + send" state, not a partially recovered one.
+    - after file collection `Pause -> Dump -> ClearCache -> Resume`, are handlers, checkpoints, and caches mutually consistent.
+    - on config-failure rollback, do the old pipeline/task remain usable without half-updated state.
+  - Order checks as corroboration (not the sole criterion):
+    - still verify key orderings (runner init order, pipeline start/stop order), but conclusions must land on resource and state outcomes.
+
+### Sub-agent C: security, stability & performance
+
+- Are input validation, exception handling, retry/backoff, and resource release (RAII) complete.
+- Rvalues/ownership: verify the full chain of call site, argument passing, and consumption point to prevent unintended moves or redundant copies.
+- Are there hot-path performance regressions (repeated computation, copies, unbounded container growth, high-frequency log spam).
+- Are monitoring metrics/alerts complete and compliant with the self-monitoring spec.
+- Focus criteria: correctness & security; performance & resources; stability & observability.
+- Mandatory checkpoint checks (choose by change scope):
+  - onetime checkpoint:
+    - `LoadCheckpointFile()` at startup, `DumpCheckpointFile()` after config changes.
+    - do timeout deletion and `RemoveConfig()` stay consistent with the checkpoint file, avoiding stale entries that cause wrong recovery.
+  - file checkpoint (v1):
+    - does `FileServer::Start()` still keep `LoadCheckPoint()` before handler registration.
+    - does `Pause/Stop` guarantee `DumpCheckPointToLocal()`, with locatable logs/alerts on failure.
+  - exactly-once checkpoint (v2):
+    - are the primary checkpoint and range checkpoints maintained in pairs, avoiding orphan keys.
+    - can scan/GC logic mistakenly delete active checkpoints or leave recovery state discontinuous.
+
+### Sub-agent D: reuse compliance & doc consistency
+
+- Does the change re-implement existing shared capabilities (prefer reusing `core/common` and existing utilities).
+- Do comments match code behavior; do TODO/FIXME items introduce new tech debt.
+- Do plugin-config or `GetXxxParam` changes update the matching docs under `docs/`.
+- Focus criteria: maintainability, compatibility, docs & tests.
+
+## Phase 5: Final Report
+
+The Final Report leans toward practical delivery: directly usable for landing fixes and for platform flows. It coexists with the Phase 3 "understanding document"; neither replaces the other.
+
+### Phase 5 output requirements (practicality-oriented)
+
+1. Lead with **Findings**, ordered by severity: `Critical` > `High` > `Medium` > `Low`.
+2. Every finding must include locatable evidence and an actionable suggestion.
+3. If nothing is found, state explicitly that "no blocking issues were found" and list residual risks and test gaps.
+4. Close with **Highlights** (positive practices), kept brief.
+5. Must include a **Lifecycle Verdict**:
+   - resource release: `PASS/FAIL`
+   - deadlock/hang risk: `PASS/FAIL`
+   - state-recovery correctness: `PASS/FAIL`
+   - 1-3 pieces of evidence per item.
+6. Must include a **Fix Plan** (grouped by priority):
+   - fix immediately (blocks merging)
+   - fix before merging
+   - may improve later
+7. Must include a **Validation Plan** (how to verify after fixing):
+   - which tests to run, which metrics to watch, which alerts and recovery paths to verify.
+
+### Final Report persistence requirements (must be written into the code-review directory)
+
+Write the Final Report into the repository's `code-review/` directory; chat-only output is forbidden.
+
+Suggested paths (same directory as in Phase 3):
+
+- PR review: `code-review/pr-<pr-number>/final-report.md`
+- Branch review: `code-review/branch-<branch-name>/final-report.md` (`/` replaced with `-`)
+
+Requirements:
+
+- `final-report.md` must reference the matching `intent-architecture-notes.md` (relative-path link).
+- If platform publishing ran (PR comment/review), record the links at the end of the document; on failure, record the reason and the retry command.
+
+Finding output format:
+
+```markdown
+- Severity: <Critical|High|Medium|Low>
+  - File: [<path>:<start-line>](file://./<path>#L<start-line>)
+  - Problem: <one sentence stating the essence of the problem>
+  - Impact: <the wrong behavior/risk it may cause>
+  - Suggestion: <a directly executable fix; include a minimal code snippet when needed>
+```
+
+Additional requirements:
+
+- Re-verify line numbers before the final output so links jump correctly.
+- Comment only on issues within the real change scope; don't drown core defects in drive-by refactoring advice.
+- Keep the tone professional, direct, and concise; prefer verifiable conclusions.
+
+### Platform publishing (optional but recommended)
+
+If this review targets a PR/branch and the tools are available, publish the Final Report automatically once the user asks for it:
+
+- Publish only after the user's explicit confirmation.
+- Publishing structure:
+  1) **Inline Findings**: publish each locatable issue as an inline code comment (not as replies to the main PR comment).
+  2) **PR summary comment**: reply with the Final Report summary on the main PR thread.
+     - Must include: a Critical/High/Medium/Low count table, the Lifecycle PASS/FAIL table, evidence for any Lifecycle FAIL, the overall conclusion, and Highlights.
+     - Must not: repeat the full findings list.
+- Publishing tools:
+  - prefer `gh` for submitting structured review results; if a GitHub MCP is available, it may be used equivalently.
+  - for inline comments, `gh api repos/<owner>/<repo>/pulls/<pr-number>/comments` is suggested (must include `commit_id/path/line/side/body`).
+  - for the summary comment, `gh pr comment <pr-number> --body-file <summary-file>` is suggested.
+- If publishing fails, state the reason in the output and provide copy-pastable publish content.
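+A minimal publishing sketch in the style of this skill's scripts, assuming `gh` is authenticated; the owner/repo/PR number and the finding fields are placeholders:
+
+```python
+# Hypothetical inline-finding publisher using the gh api endpoint named above.
+import subprocess
+
+def post_inline_finding(owner: str, repo: str, pr: int, commit_id: str,
+                        path: str, line: int, body: str) -> None:
+    # POST /repos/{owner}/{repo}/pulls/{pr}/comments creates a review comment
+    # anchored at path:line on the given commit.
+    subprocess.run(
+        ["gh", "api", f"repos/{owner}/{repo}/pulls/{pr}/comments",
+         "--method", "POST",
+         "-f", f"body={body}",
+         "-f", f"commit_id={commit_id}",
+         "-f", f"path={path}",
+         "-F", f"line={line}",
+         "-f", "side=RIGHT"],
+        check=True,
+    )
+
+post_inline_finding("<owner>", "<repo>", 123, "<head-sha>",
+                    "core/foo.cpp", 42, "Severity: High - missing null check")
+```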
diff --git a/.claude/skills/code-review/references/comment-status.template.json b/.claude/skills/code-review/references/comment-status.template.json
new file mode 100644
index 0000000000..1007f4cd19
--- /dev/null
+++ b/.claude/skills/code-review/references/comment-status.template.json
@@ -0,0 +1,23 @@
+{
+  "version": "1.0",
+  "generated_at": "",
+  "review_target": {
+    "type": "pr",
+    "id": ""
+  },
+  "status": [
+    {
+      "comment_id": 0,
+      "path": "",
+      "line": 0,
+      "side": "RIGHT",
+      "body": "",
+      "snippet": "",
+      "snippet_fingerprint": "",
+      "status_flow": "open",
+      "status_tech": "not-fixed",
+      "mapped_finding_id": "",
+      "notes": ""
+    }
+  ]
+}
diff --git a/.claude/skills/code-review/references/failure-playbook.md b/.claude/skills/code-review/references/failure-playbook.md
new file mode 100644
index 0000000000..00c2c19b5b
--- /dev/null
+++ b/.claude/skills/code-review/references/failure-playbook.md
@@ -0,0 +1,65 @@
+# Code-Review Failure Playbook
+
+This file is the failure-recovery decision table. Its goal is to route the agent correctly on exceptions: auto-recover, fall back in the workflow, or request human intervention.
+
+## General Principles
+
+- First decide whether this is a first run or a subsequent run.
+- Never hand-patch JSON; after recovery, return to the standard workflow node and continue.
+- Preflight exceptions default to human intervention; everything else prefers automatic fallback to a rebuildable node.
+- If a script fails but code reading is unaffected, degraded review may continue, but a failure-feedback record is mandatory.
+
+## Scenario 1: Preflight failure (human intervention)
+
+- Trigger signals: any of `python3 --version` / `git rev-parse --is-inside-work-tree` / `gh auth status` fails
+- Decision: stop automatic execution and ask the user to check the environment and authentication
+- Action level: `manual_required`
+- Return node: Preflight (enter Phase 1 only after all three checks pass)
+
+## Scenario 2: Missing files on first run (normal entry, not a failure)
+
+- Trigger signals: `code-review/<target>/` does not exist, or `meta.json` / `reviewed_commits.json` / `comments/*` are missing
+- Decision: treat as Bootstrap and run the initialization flow
+- Action level: `auto_recover`
+- Return node: Phase 1 step 1 (initialization), then continue in order
+
+## Scenario 3: Invalid input schema on a subsequent run
+
+- Trigger signals: `invalid review-comments.json` / `invalid comment-status.json`
+- Decision: discard the corrupted intermediate state and fall back to Bootstrap to rebuild the key inputs
+- Action level: `auto_recover`
+- Return node: Phase 1 step 1 (initialization) -> step 2 (fetch comments) -> step 3 (rebuild status)
+
+## Scenario 4: Missing commit object / commit range build failure
+
+- Trigger signals: `missing base/head commit object` or `failed to build commit range`
+- Decision: first sync git objects automatically; if that still fails, escalate to a human to confirm the base/head choice
+- Action level: `auto_then_manual`
+- Return node:
+  - auto recovery succeeded: Phase 1 step 5 (incremental mapping)
+  - auto recovery failed: re-run step 5 after human confirmation
+
+## Scenario 5: Flattened snapshot directory
+
+- Trigger signals: `snapshot/` lacks the source-relative path hierarchy (files are flattened)
+- Decision: treat as a snapshot-process fault; wipe and rebuild the snapshot
+- Action level: `auto_recover`
+- Return node: the snapshot-build step (then continue with tech-status re-verification)
+
+## Scenario 6: Script error, but review can continue
+
+- Trigger signals: any script errors out, but repository code and basic git/gh capabilities remain readable
+- Decision: allow degraded review to avoid blocking the flow, and force a failure-feedback record so the skill can be iterated on
+- Action level: `degrade_continue`
+- Required actions:
+  - write `code-review/<target>/script-failures.md` (script name, command, error, time, compensation)
+  - switch the review strategy to a `full` review
+  - add a "Script Failure Feedback" section to `final-report.md`
+- Return node: the current review phase (continue under the degradation strategy)
+
+## Action Level Definitions
+
+- `manual_required`: human intervention required before continuing
+- `auto_recover`: the agent can recover automatically and continue
+- `auto_then_manual`: try automatically first; escalate to a human on failure
+- `degrade_continue`: review may continue, but the failure must be recorded and fed back
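+Read as code, this playbook is a small dispatch table. A sketch; the scenario keys are shorthand for the trigger signals above, not identifiers from the scripts:
+
+```python
+# Sketch of the playbook as a dispatch table: scenario -> (action level, return node).
+PLAYBOOK = {
+    "preflight_failed":         ("manual_required",  "Preflight"),
+    "first_run_missing_files":  ("auto_recover",     "Phase 1 step 1"),
+    "invalid_input_schema":     ("auto_recover",     "Phase 1 steps 1-3"),
+    "missing_commit_object":    ("auto_then_manual", "Phase 1 step 5 (mapping)"),
+    "flat_snapshot_layout":     ("auto_recover",     "snapshot rebuild"),
+    "script_error_recoverable": ("degrade_continue", "current phase, full review"),
+}
+
+def dispatch(scenario: str) -> str:
+    action, node = PLAYBOOK[scenario]
+    return f"{action} -> resume at: {node}"
+
+print(dispatch("missing_commit_object"))
+```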
diff --git a/.claude/skills/code-review/references/meta.template.json b/.claude/skills/code-review/references/meta.template.json
new file mode 100644
index 0000000000..f1b824918f
--- /dev/null
+++ b/.claude/skills/code-review/references/meta.template.json
@@ -0,0 +1,19 @@
+{
+  "version": "1.0",
+  "repo": "",
+  "review_target": {
+    "type": "pr",
+    "id": "",
+    "base_ref": "",
+    "head_ref": "",
+    "base_sha": "",
+    "head_sha": ""
+  },
+  "strategy": {
+    "commit_map_threshold": 0.9,
+    "hunk_match_threshold": 0.8,
+    "fallback_on_low_confidence": true
+  },
+  "review_round": 1,
+  "generated_at": ""
+}
diff --git a/.claude/skills/code-review/references/review-plan.template.md b/.claude/skills/code-review/references/review-plan.template.md
new file mode 100644
index 0000000000..81d4da2d60
--- /dev/null
+++ b/.claude/skills/code-review/references/review-plan.template.md
@@ -0,0 +1,30 @@
+# Review Plan
+
+- Review Target: `<pr-number or branch-name>`
+- Base SHA: `<base-sha>`
+- Head SHA: `<head-sha>`
+- Strategy: `<incremental|partial|full>`
+- Current Phase: `<phase name>`
+
+## Work Items
+
+> Fill in according to this round's actual tasks; do not write bare phase names.
+> Suggested format: 2-5 sub-items under each major item, tracked with checkboxes.
+
+### <major item 1>
+
+- [ ]
+- [ ]
+
+### <major item 2>
+
+- [ ]
+- [ ]
+
+## Risks / Blockers
+
+-
+
+## Notes
+
+- If the strategy switches (e.g. `incremental -> full`), update this file first, then continue.
diff --git a/.claude/skills/code-review/references/reviewed_commits.template.json b/.claude/skills/code-review/references/reviewed_commits.template.json
new file mode 100644
index 0000000000..3d069b8334
--- /dev/null
+++ b/.claude/skills/code-review/references/reviewed_commits.template.json
@@ -0,0 +1,19 @@
+{
+  "version": "1.0",
+  "review_rounds": [],
+  "commits": [
+    {
+      "commit_sha": "",
+      "patch_id": "",
+      "review_round": 1,
+      "reviewed_at": "",
+      "hunk_fingerprints": [],
+      "files": [],
+      "mapping": {
+        "method": "direct|patch-id|hunk-similarity|none",
+        "mapped_from_commit": "",
+        "confidence": 0.0
+      }
+    }
+  ]
+}
diff --git a/.claude/skills/code-review/scripts/build_snapshot.py b/.claude/skills/code-review/scripts/build_snapshot.py
new file mode 100755
index 0000000000..f14ea55962
--- /dev/null
+++ b/.claude/skills/code-review/scripts/build_snapshot.py
@@ -0,0 +1,148 @@
+#!/usr/bin/env python3
+import argparse
+import hashlib
+import json
+import re
+import subprocess
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Dict, List, Set, Tuple
+
+
+def utc_now() -> str:
+    return datetime.now(timezone.utc).replace(microsecond=0).isoformat()
+
+
+def sanitize_branch_name(branch_name: str) -> str:
+    return branch_name.replace("/", "-")
+
+
+def resolve_target(args: argparse.Namespace) -> Tuple[str, str]:
+    if args.target_type and args.target_id:
+        target_type = args.target_type
+        target_id = args.target_id
+    elif args.pr_number is not None:
+        target_type = "pr"
+        target_id = str(args.pr_number)
+    elif args.branch_name:
+        target_type = "branch"
+        target_id = args.branch_name
+    else:
+        raise SystemExit("must provide either --target-type/--target-id or --pr-number or --branch-name")
+    if target_type not in {"pr", "branch"}:
+        raise SystemExit("target type must be pr or branch")
+    return target_type, target_id
+
+
+def run_git(repo_root: Path, args: List[str]) -> str:
+    proc = subprocess.run(["git", *args],
cwd=repo_root, text=True, capture_output=True, check=True) + return proc.stdout + + +def run_git_no_check(repo_root: Path, args: List[str]) -> subprocess.CompletedProcess: + return subprocess.run(["git", *args], cwd=repo_root, text=True, capture_output=True, check=False) + + +def normalize_file_content(text: str) -> str: + lines = [re.sub(r"\s+", " ", line.strip()) for line in text.splitlines()] + return "\n".join(lines) + + +def stable_hash(text: str) -> str: + return hashlib.sha256(text.encode("utf-8")).hexdigest() + + +def get_changed_files(repo_root: Path, base_sha: str, head_sha: str) -> List[str]: + out = run_git(repo_root, ["diff", "--name-only", f"{base_sha}..{head_sha}"]) + return sorted({line.strip() for line in out.splitlines() if line.strip()}) + + +def get_file_content_at_commit(repo_root: Path, commit_sha: str, path: str) -> str: + proc = run_git_no_check(repo_root, ["show", f"{commit_sha}:{path}"]) + if proc.returncode != 0: + return "" + return proc.stdout + + +def main() -> None: + parser = argparse.ArgumentParser(description="Build snapshot baseline for incremental review.") + parser.add_argument("--repo-root", required=True) + parser.add_argument("--target-type", choices=["pr", "branch"]) + parser.add_argument("--target-id") + parser.add_argument("--pr-number", type=int, help="PR number (legacy compatible)") + parser.add_argument("--branch-name", help="Branch name (legacy compatible)") + parser.add_argument("--base", required=True, help="Base commit SHA") + parser.add_argument("--head", required=True, help="Head commit SHA") + parser.add_argument("--review-round", required=True, type=int) + args = parser.parse_args() + + repo_root = Path(args.repo_root).resolve() + target_type, target_id_raw = resolve_target(args) + target_id_dir = sanitize_branch_name(target_id_raw) if target_type == "branch" else target_id_raw + review_dir = repo_root / "code-review" / f"{target_type}-{target_id_dir}" + + snapshot_root = review_dir / "snapshot" / f"round-{args.review_round}" + files_root = snapshot_root / "files" + files_root.mkdir(parents=True, exist_ok=True) + + changed_files = get_changed_files(repo_root, args.base, args.head) + manifest_files: List[Dict[str, object]] = [] + + for rel_path in changed_files: + content = get_file_content_at_commit(repo_root, args.head, rel_path) + if content == "": + # Deleted file at head; keep entry for audit but no content snapshot. 
+ manifest_files.append( + {"path": rel_path, "exists_in_head": False, "raw_hash": "", "normalized_hash": "", "size": 0} + ) + continue + + out_path = files_root / rel_path + out_path.parent.mkdir(parents=True, exist_ok=True) + out_path.write_text(content, encoding="utf-8") + manifest_files.append( + { + "path": rel_path, + "exists_in_head": True, + "raw_hash": stable_hash(content), + "normalized_hash": stable_hash(normalize_file_content(content)), + "size": len(content.encode("utf-8")), + } + ) + + manifest = { + "version": "1.0", + "review_target": {"type": target_type, "id": target_id_raw}, + "review_round": args.review_round, + "base_sha": args.base, + "head_sha": args.head, + "generated_at": utc_now(), + "files": manifest_files, + } + manifest_path = snapshot_root / "manifest.json" + manifest_path.write_text(json.dumps(manifest, ensure_ascii=False, indent=2) + "\n", encoding="utf-8") + + latest = { + "latest_round": args.review_round, + "manifest": str(manifest_path.relative_to(review_dir)), + "updated_at": utc_now(), + } + latest_path = review_dir / "snapshot" / "latest.json" + latest_path.parent.mkdir(parents=True, exist_ok=True) + latest_path.write_text(json.dumps(latest, ensure_ascii=False, indent=2) + "\n", encoding="utf-8") + + print( + json.dumps( + { + "review_target": {"type": target_type, "id": target_id_raw}, + "review_round": args.review_round, + "files": len(manifest_files), + "manifest": str(manifest_path), + }, + ensure_ascii=False, + ) + ) + + +if __name__ == "__main__": + main() diff --git a/.claude/skills/code-review/scripts/fetch_review_comments.py b/.claude/skills/code-review/scripts/fetch_review_comments.py new file mode 100755 index 0000000000..2bca6fca0b --- /dev/null +++ b/.claude/skills/code-review/scripts/fetch_review_comments.py @@ -0,0 +1,204 @@ +#!/usr/bin/env python3 +import argparse +import json +import subprocess +from datetime import datetime, timezone +from pathlib import Path +from typing import Any, Dict, List, Tuple + + +def utc_now() -> str: + return datetime.now(timezone.utc).replace(microsecond=0).isoformat() + + +def run_cmd(args: List[str], cwd: Path) -> str: + proc = subprocess.run(args, cwd=cwd, text=True, capture_output=True, check=False) + if proc.returncode != 0: + raise SystemExit(f"command failed: {' '.join(args)}\n{proc.stderr.strip()}") + return proc.stdout + + +def sanitize_branch_name(branch_name: str) -> str: + return branch_name.replace("/", "-") + + +def resolve_target(args: argparse.Namespace) -> Tuple[str, str]: + if args.target_type and args.target_id: + target_type = args.target_type + target_id = args.target_id + elif args.pr_number is not None: + target_type = "pr" + target_id = str(args.pr_number) + elif args.branch_name: + target_type = "branch" + target_id = args.branch_name + else: + raise SystemExit("must provide either --target-type/--target-id or --pr-number or --branch-name") + if target_type not in {"pr", "branch"}: + raise SystemExit("target type must be pr or branch") + return target_type, target_id + + +def parse_name_with_owner(repo_root: Path) -> Tuple[str, str]: + out = run_cmd(["gh", "repo", "view", "--json", "nameWithOwner", "--jq", ".nameWithOwner"], repo_root).strip() + if "/" not in out: + raise SystemExit(f"invalid repository nameWithOwner: {out}") + owner, name = out.split("/", 1) + return owner, name + + +def get_viewer_login(repo_root: Path) -> str: + out = run_cmd(["gh", "api", "user", "--jq", ".login"], repo_root).strip() + return out + + +def run_graphql(repo_root: Path, owner: str, name: str, 
pr_number: int, cursor: str) -> Dict[str, Any]: + # Query review threads instead of plain review comments so we can + # persist thread-level resolution state deterministically. + query = """ +query($owner:String!, $name:String!, $number:Int!, $endCursor:String) { + repository(owner:$owner, name:$name) { + pullRequest(number:$number) { + reviewThreads(first:100, after:$endCursor) { + pageInfo { hasNextPage endCursor } + nodes { + isResolved + comments(first:100) { + nodes { + databaseId + body + path + line + originalLine + createdAt + updatedAt + author { login } + originalCommit { oid } + replyTo { databaseId } + } + } + } + } + } + } +} +""" + cmd = [ + "gh", + "api", + "graphql", + "-f", + f"query={query}", + "-F", + f"owner={owner}", + "-F", + f"name={name}", + "-F", + f"number={pr_number}", + ] + if cursor: + cmd.extend(["-F", f"endCursor={cursor}"]) + out = run_cmd(cmd, repo_root) + return json.loads(out) + + +def main() -> None: + parser = argparse.ArgumentParser(description="Fetch PR review comments to stable schema file.") + parser.add_argument("--repo-root", required=True) + parser.add_argument("--target-type", choices=["pr", "branch"]) + parser.add_argument("--target-id") + parser.add_argument("--pr-number", type=int, help="PR number (legacy compatible)") + parser.add_argument("--branch-name", help="Branch name (legacy compatible)") + args = parser.parse_args() + + repo_root = Path(args.repo_root).resolve() + target_type, target_id_raw = resolve_target(args) + target_id_dir = sanitize_branch_name(target_id_raw) if target_type == "branch" else target_id_raw + review_dir = repo_root / "code-review" / f"{target_type}-{target_id_dir}" + comments_path = review_dir / "comments" / "review-comments.json" + comments_path.parent.mkdir(parents=True, exist_ok=True) + + if target_type != "pr": + payload = { + "version": "1.0", + "source": "branch_review_comments", + "fetched_at": utc_now(), + "review_target": {"type": target_type, "id": target_id_raw}, + "comments": [], + } + comments_path.write_text(json.dumps(payload, ensure_ascii=False, indent=2) + "\n", encoding="utf-8") + print(json.dumps({"target": f"{target_type}-{target_id_raw}", "threads": 0, "comments": 0, "resolved_threads": 0})) + return + + owner, name = parse_name_with_owner(repo_root) + viewer_login = get_viewer_login(repo_root) + pr_number = int(target_id_raw) + + cursor = "" + has_next = True + comments: List[Dict[str, Any]] = [] + total_threads = 0 + resolved_threads = 0 + + while has_next: + # Paginate until all review threads are collected. + data = run_graphql(repo_root, owner, name, pr_number, cursor) + threads_obj = data["data"]["repository"]["pullRequest"]["reviewThreads"] + page_info = threads_obj["pageInfo"] + threads = threads_obj["nodes"] or [] + total_threads += len(threads) + for thread in threads: + is_resolved = bool(thread.get("isResolved", False)) + if is_resolved: + resolved_threads += 1 + thread_comments = thread.get("comments", {}).get("nodes", []) or [] + for c in thread_comments: + author = (c.get("author") or {}).get("login", "") + original_commit = (c.get("originalCommit") or {}).get("oid", "") + reply_to = (c.get("replyTo") or {}).get("databaseId") + # Use originalLine as a stable anchor because line can be null + # after code evolves on newer commits. 
+ original_line = c.get("originalLine") + line = original_line if isinstance(original_line, int) else 0 + comments.append( + { + "comment_id": c.get("databaseId"), + "author": author, + "created_at": c.get("createdAt", ""), + "updated_at": c.get("updatedAt", ""), + "path": c.get("path", ""), + "line": line, + "side": "RIGHT", + "commit_id": original_commit, + "in_reply_to_id": reply_to, + "body": c.get("body", ""), + "thread_resolved": is_resolved, + } + ) + has_next = bool(page_info.get("hasNextPage")) + cursor = page_info.get("endCursor") if has_next else "" + + payload = { + "version": "1.0", + "source": "github_pr_review_comments", + "fetched_at": utc_now(), + "review_target": {"type": "pr", "id": target_id_raw}, + "viewer_login": viewer_login, + "comments": comments, + } + comments_path.write_text(json.dumps(payload, ensure_ascii=False, indent=2) + "\n", encoding="utf-8") + print( + json.dumps( + { + "target": f"pr-{target_id_raw}", + "threads": total_threads, + "comments": len(comments), + "resolved_threads": resolved_threads, + }, + ensure_ascii=False, + ) + ) + + +if __name__ == "__main__": + main() diff --git a/.claude/skills/code-review/scripts/generate_comment_status_report.py b/.claude/skills/code-review/scripts/generate_comment_status_report.py new file mode 100755 index 0000000000..68e3aca8f5 --- /dev/null +++ b/.claude/skills/code-review/scripts/generate_comment_status_report.py @@ -0,0 +1,110 @@ +#!/usr/bin/env python3 +import argparse +import json +from pathlib import Path +from typing import Dict, List, Tuple + + +def sanitize_branch_name(branch_name: str) -> str: + return branch_name.replace("/", "-") + + +def resolve_target(args: argparse.Namespace) -> Tuple[str, str]: + if args.target_type and args.target_id: + target_type = args.target_type + target_id = args.target_id + elif args.pr_number is not None: + target_type = "pr" + target_id = str(args.pr_number) + elif args.branch_name: + target_type = "branch" + target_id = args.branch_name + else: + raise SystemExit("must provide either --target-type/--target-id or --pr-number or --branch-name") + if target_type not in {"pr", "branch"}: + raise SystemExit("target type must be pr or branch") + return target_type, target_id + + +def read_json(path: Path) -> Dict: + if not path.exists(): + raise SystemExit(f"missing file: {path}") + return json.loads(path.read_text(encoding="utf-8")) + + +def esc_cell(text: str) -> str: + return (text or "").replace("\n", " ").replace("|", "\\|").replace("`", "").strip() + + +def build_comment_meta_map(review_comments_payload: Dict) -> Dict[int, Dict]: + comments = review_comments_payload.get("comments", []) + meta_map: Dict[int, Dict] = {} + if not isinstance(comments, list): + return meta_map + for c in comments: + cid = c.get("comment_id") + if isinstance(cid, int): + meta_map[cid] = c + return meta_map + + +def build_markdown(target_type: str, target_id: str, items: List[Dict], comment_meta: Dict[int, Dict]) -> str: + lines = [] + lines.append(f"# Comment Status Report ({target_type}-{target_id})") + lines.append("") + lines.append(f"- Total: {len(items)}") + lines.append("") + lines.append("| 评论时间 | File | Line | 作者 | Comment | Flow | Tech |") + lines.append("|---|---|---:|---|---|---|---|") + for item in items: + cid = item.get("comment_id", "") + meta = comment_meta.get(cid, {}) + created_at = esc_cell(str(meta.get("created_at", ""))) + author = esc_cell(str(meta.get("author", ""))) + path = esc_cell(str(item.get("path", ""))) + line = item.get("line", 0) + body = 
esc_cell(str(item.get("body", ""))) + if len(body) > 160: + body = body[:157] + "..." + status_flow = esc_cell(str(item.get("status_flow", ""))) + status_tech = esc_cell(str(item.get("status_tech", ""))) + lines.append( + f"| {created_at} | `{path}` | {line} | {author} | {body} | {status_flow} | {status_tech} |" + ) + lines.append("") + return "\n".join(lines) + + +def main() -> None: + parser = argparse.ArgumentParser(description="Generate markdown report from comment-status.json.") + parser.add_argument("--repo-root", required=True) + parser.add_argument("--target-type", choices=["pr", "branch"]) + parser.add_argument("--target-id") + parser.add_argument("--pr-number", type=int, help="PR number (legacy compatible)") + parser.add_argument("--branch-name", help="Branch name (legacy compatible)") + args = parser.parse_args() + + repo_root = Path(args.repo_root).resolve() + target_type, target_id_raw = resolve_target(args) + target_id_dir = sanitize_branch_name(target_id_raw) if target_type == "branch" else target_id_raw + review_dir = repo_root / "code-review" / f"{target_type}-{target_id_dir}" + + status_path = review_dir / "comments" / "comment-status.json" + review_comments_path = review_dir / "comments" / "review-comments.json" + report_path = review_dir / "comments" / "comment-status.md" + + payload = read_json(status_path) + if not isinstance(payload, dict) or not isinstance(payload.get("status"), list): + raise SystemExit("invalid comment-status.json: root must be object and `status` must be list") + review_comments_payload = read_json(review_comments_path) + if not isinstance(review_comments_payload, dict): + raise SystemExit("invalid review-comments.json: root must be object") + + comment_meta = build_comment_meta_map(review_comments_payload) + markdown = build_markdown(target_type, target_id_raw, payload["status"], comment_meta) + report_path.write_text(markdown + "\n", encoding="utf-8") + print(str(report_path)) + + +if __name__ == "__main__": + main() diff --git a/.claude/skills/code-review/scripts/incremental_review_mapper.py b/.claude/skills/code-review/scripts/incremental_review_mapper.py new file mode 100755 index 0000000000..d8d0a558d6 --- /dev/null +++ b/.claude/skills/code-review/scripts/incremental_review_mapper.py @@ -0,0 +1,450 @@ +#!/usr/bin/env python3 +import argparse +import hashlib +import json +import re +import subprocess +from pathlib import Path +from dataclasses import dataclass +from datetime import datetime, timezone +from typing import Dict, List, Optional, Set, Tuple + + +def run_git(repo_root: Path, args: List[str]) -> str: + result = subprocess.run( + ["git", *args], + cwd=repo_root, + text=True, + capture_output=True, + check=True, + ) + return result.stdout + + +def run_git_no_check(repo_root: Path, args: List[str]) -> subprocess.CompletedProcess: + return subprocess.run( + ["git", *args], + cwd=repo_root, + text=True, + capture_output=True, + check=False, + ) + + +def utc_now() -> str: + return datetime.now(timezone.utc).replace(microsecond=0).isoformat() + + +def normalize_code_line(line: str) -> str: + return re.sub(r"\s+", " ", line.strip()) + + +def stable_hash(text: str) -> str: + return hashlib.sha256(text.encode("utf-8")).hexdigest() + + +def normalize_file_content(text: str) -> str: + # Keep line boundaries but normalize whitespace noise for robust matching. 
+ lines = [re.sub(r"\s+", " ", line.strip()) for line in text.splitlines()] + return "\n".join(lines) + + +def compute_patch_id(repo_root: Path, commit_sha: str) -> str: + patch_text = run_git(repo_root, ["show", "--pretty=format:", "--no-color", commit_sha]) + proc = subprocess.run( + ["git", "patch-id", "--stable"], + cwd=repo_root, + text=True, + input=patch_text, + capture_output=True, + check=True, + ) + output = proc.stdout.strip() + return output.split()[0] if output else "" + + +def get_commit_files(repo_root: Path, commit_sha: str) -> List[str]: + out = run_git(repo_root, ["show", "--pretty=format:", "--name-only", "--no-color", commit_sha]) + return sorted({line.strip() for line in out.splitlines() if line.strip()}) + + +def get_file_content_at_commit(repo_root: Path, commit_sha: str, path: str) -> Optional[str]: + proc = run_git_no_check(repo_root, ["show", f"{commit_sha}:{path}"]) + if proc.returncode != 0: + return None + return proc.stdout + + +def load_latest_snapshot_map(review_dir: Path) -> Dict[str, str]: + latest_path = review_dir / "snapshot" / "latest.json" + if not latest_path.exists(): + return {} + try: + latest = json.loads(latest_path.read_text(encoding="utf-8")) + except Exception: + return {} + manifest_rel = latest.get("manifest") + if not isinstance(manifest_rel, str) or not manifest_rel: + return {} + manifest_path = review_dir / manifest_rel + if not manifest_path.exists(): + return {} + try: + manifest = json.loads(manifest_path.read_text(encoding="utf-8")) + except Exception: + return {} + mapping: Dict[str, str] = {} + for item in manifest.get("files", []): + path = item.get("path") + n_hash = item.get("normalized_hash") + if isinstance(path, str) and isinstance(n_hash, str): + mapping[path] = n_hash + return mapping + + +def compute_snapshot_match_rate( + repo_root: Path, head_sha: str, changed_files: Set[str], snapshot_map: Dict[str, str] +) -> Optional[float]: + if not snapshot_map or not changed_files: + return None + overlap = [p for p in changed_files if p in snapshot_map] + if not overlap: + return None + matched = 0 + for path in overlap: + content = get_file_content_at_commit(repo_root, head_sha, path) + if content is None: + continue + current_hash = stable_hash(normalize_file_content(content)) + if current_hash == snapshot_map[path]: + matched += 1 + return matched / len(overlap) + + +def parse_hunk_fingerprints(repo_root: Path, commit_sha: str) -> List[str]: + patch = run_git(repo_root, ["show", "--pretty=format:", "--no-color", "-U3", commit_sha]) + lines = patch.splitlines() + file_path = "" + hunk_header = "" + hunk_lines: List[str] = [] + fps: List[str] = [] + + def flush() -> None: + nonlocal hunk_lines, hunk_header + if not hunk_lines: + return + key = file_path + "\n" + hunk_header + "\n" + "\n".join(hunk_lines) + fps.append(stable_hash(key)) + hunk_lines = [] + hunk_header = "" + + for line in lines: + if line.startswith("diff --git "): + flush() + m = re.search(r" b/(.+)$", line) + file_path = m.group(1) if m else "" + continue + if line.startswith("@@"): + flush() + hunk_header = line + continue + if line.startswith("+") or line.startswith("-"): + if line.startswith("+++") or line.startswith("---"): + continue + hunk_lines.append(normalize_code_line(line[1:])) + + flush() + return sorted(set(fps)) + + +def jaccard(a: Set[str], b: Set[str]) -> float: + if not a and not b: + return 1.0 + if not a or not b: + return 0.0 + return len(a & b) / len(a | b) + + +@dataclass +class CommitRecord: + commit_sha: str + patch_id: str + 
hunk_fingerprints: List[str] + review_round: int + reviewed_at: str + mapping: Dict[str, object] + + +def load_json(path: Path) -> Dict: + if not path.exists(): + return {} + return json.loads(path.read_text(encoding="utf-8")) + + +def save_json(path: Path, payload: Dict) -> None: + path.write_text(json.dumps(payload, ensure_ascii=False, indent=2) + "\n", encoding="utf-8") + + +def sanitize_branch_name(branch_name: str) -> str: + return branch_name.replace("/", "-") + + +def resolve_target(args: argparse.Namespace) -> Tuple[str, str]: + if args.target_type and args.target_id: + target_type = args.target_type + target_id = args.target_id + elif args.pr_number is not None: + target_type = "pr" + target_id = str(args.pr_number) + elif args.branch_name: + target_type = "branch" + target_id = args.branch_name + else: + raise SystemExit("must provide either --target-type/--target-id or --pr-number or --branch-name") + if target_type not in {"pr", "branch"}: + raise SystemExit("target type must be pr or branch") + return target_type, target_id + + +def ensure_commit_exists(repo_root: Path, sha: str, target_type: str, target_id: str) -> bool: + exists = run_git_no_check(repo_root, ["cat-file", "-e", f"{sha}^{{commit}}"]) + if exists.returncode == 0: + return True + + # First generic fetch to cover normal branch updates. + run_git_no_check(repo_root, ["fetch", "--all", "--prune", "--tags"]) + exists = run_git_no_check(repo_root, ["cat-file", "-e", f"{sha}^{{commit}}"]) + if exists.returncode == 0: + return True + + # Then PR-specific fetch for detached PR heads. + if target_type == "pr": + run_git_no_check(repo_root, ["fetch", "origin", f"pull/{target_id}/head"]) + exists = run_git_no_check(repo_root, ["cat-file", "-e", f"{sha}^{{commit}}"]) + if exists.returncode == 0: + return True + return False + + +def main() -> None: + parser = argparse.ArgumentParser(description="Map reviewed commits for incremental PR/branch review.") + parser.add_argument("--repo-root", required=True) + parser.add_argument("--target-type", choices=["pr", "branch"]) + parser.add_argument("--target-id") + parser.add_argument("--pr-number", type=int, help="PR number (legacy compatible)") + parser.add_argument("--branch-name", help="Branch name (legacy compatible)") + parser.add_argument("--base", required=True, help="Base commit SHA for comparison") + parser.add_argument("--head", required=True, help="Head commit SHA for comparison") + parser.add_argument("--review-round", required=True, type=int) + parser.add_argument("--commit-map-threshold", type=float, default=0.9) + parser.add_argument("--hunk-match-threshold", type=float, default=0.8) + parser.add_argument("--snapshot-match-threshold", type=float, default=0.9) + args = parser.parse_args() + + repo_root = Path(args.repo_root).resolve() + target_type, target_id_raw = resolve_target(args) + target_id_dir = sanitize_branch_name(target_id_raw) if target_type == "branch" else target_id_raw + review_dir = repo_root / "code-review" / f"{target_type}-{target_id_dir}" + reviewed_path = review_dir / "reviewed_commits.json" + reviewed = load_json(reviewed_path) or {"version": "1.0", "review_rounds": [], "commits": []} + old_commits = reviewed.get("commits", []) + snapshot_map = load_latest_snapshot_map(review_dir) + + if not ensure_commit_exists(repo_root, args.base, target_type, target_id_raw): + raise SystemExit( + f"missing base commit object: {args.base}. " + "Please fetch the base branch history, then retry." 
+ ) + if not ensure_commit_exists(repo_root, args.head, target_type, target_id_raw): + raise SystemExit( + f"missing head commit object: {args.head}. " + "For PR review, try: git fetch origin pull//head" + ) + + try: + rev_list_output = run_git(repo_root, ["rev-list", "--reverse", f"{args.base}..{args.head}"]) + except subprocess.CalledProcessError as e: + stderr = (e.stderr or "").strip() + raise SystemExit( + f"failed to build commit range {args.base}..{args.head}: {stderr or 'unknown git error'}" + ) + + current_commits = [sha for sha in rev_list_output.splitlines() if sha] + current_set = set(current_commits) + commit_files_map: Dict[str, List[str]] = {} + current_changed_files: Set[str] = set() + for sha in current_commits: + files = get_commit_files(repo_root, sha) + commit_files_map[sha] = files + current_changed_files.update(files) + + old_by_sha = {c.get("commit_sha"): c for c in old_commits if c.get("commit_sha")} + old_by_patch_id: Dict[str, Dict] = {} + for c in old_commits: + pid = c.get("patch_id") + if pid and pid not in old_by_patch_id: + old_by_patch_id[pid] = c + + mapped: Dict[str, CommitRecord] = {} + unchanged_by_sha = 0 + for sha in current_commits: + if sha in old_by_sha: + oc = old_by_sha[sha] + mapped[sha] = CommitRecord( + commit_sha=sha, + patch_id=oc.get("patch_id", ""), + hunk_fingerprints=oc.get("hunk_fingerprints", []), + review_round=oc.get("review_round", args.review_round), + reviewed_at=oc.get("reviewed_at", utc_now()), + mapping={"method": "direct", "mapped_from_commit": sha, "confidence": 1.0}, + ) + unchanged_by_sha += 1 + + for sha in current_commits: + if sha in mapped: + continue + pid = compute_patch_id(repo_root, sha) + if pid and pid in old_by_patch_id: + oc = old_by_patch_id[pid] + mapped[sha] = CommitRecord( + commit_sha=sha, + patch_id=pid, + hunk_fingerprints=oc.get("hunk_fingerprints", []), + review_round=oc.get("review_round", args.review_round), + reviewed_at=oc.get("reviewed_at", utc_now()), + mapping={"method": "patch-id", "mapped_from_commit": oc.get("commit_sha", ""), "confidence": 0.98}, + ) + + old_unmapped = [c for c in old_commits if c.get("commit_sha") not in current_set] + old_hunk_sets = { + c.get("commit_sha", ""): set(c.get("hunk_fingerprints", [])) for c in old_unmapped if c.get("commit_sha") + } + + for sha in current_commits: + if sha in mapped: + continue + new_hunks = set(parse_hunk_fingerprints(repo_root, sha)) + best_score = 0.0 + best_old = "" + for old_sha, old_hunks in old_hunk_sets.items(): + score = jaccard(new_hunks, old_hunks) + if score > best_score: + best_score = score + best_old = old_sha + if best_old and best_score >= args.hunk_match_threshold: + mapped[sha] = CommitRecord( + commit_sha=sha, + patch_id=compute_patch_id(repo_root, sha), + hunk_fingerprints=sorted(new_hunks), + review_round=args.review_round, + reviewed_at=utc_now(), + mapping={"method": "hunk-similarity", "mapped_from_commit": best_old, "confidence": round(best_score, 4)}, + ) + + need_review: List[str] = [sha for sha in current_commits if sha not in mapped] + + # commit_map_rate measures "how many OLD commits are accounted for in the + # new commit set", NOT "what fraction of current commits are mapped". + # Denominator = old commit count (the baseline we reviewed before). + # This way appending new commits doesn't penalise the rate, while rebase + # that loses old commits correctly lowers it. 
+ old_commits_covered: Set[str] = set() + for rec in mapped.values(): + from_sha = rec.mapping.get("mapped_from_commit", "") + if from_sha: + old_commits_covered.add(from_sha) + old_commit_count = len(old_commits) + commit_map_rate = (len(old_commits_covered) / old_commit_count) if old_commit_count > 0 else 1.0 + + if need_review: + hunk_scores: List[float] = [] + for sha in need_review: + new_hunks = set(parse_hunk_fingerprints(repo_root, sha)) + best = 0.0 + for old_hunks in old_hunk_sets.values(): + best = max(best, jaccard(new_hunks, old_hunks)) + hunk_scores.append(best) + hunk_match_rate = (sum(hunk_scores) / len(hunk_scores)) if hunk_scores else 1.0 + else: + hunk_match_rate = 1.0 + + if commit_map_rate >= args.commit_map_threshold: + recommendation = "incremental" + elif hunk_match_rate >= args.hunk_match_threshold: + recommendation = "partial" + else: + recommendation = "full" + + snapshot_match_rate = compute_snapshot_match_rate(repo_root, args.head, current_changed_files, snapshot_map) + if recommendation == "full" and snapshot_match_rate is not None and snapshot_match_rate >= args.snapshot_match_threshold: + # For squash/rebase-conflict scenarios, snapshot evidence can safely + # downgrade from full to partial. + recommendation = "partial" + + round_record = { + "review_round": args.review_round, + "generated_at": utc_now(), + "base": args.base, + "head": args.head, + "stats": { + "total_commits": len(current_commits), + "mapped_commits": len(mapped), + "direct_sha_hits": unchanged_by_sha, + "commit_map_rate": round(commit_map_rate, 4), + "hunk_match_rate": round(hunk_match_rate, 4), + "snapshot_match_rate": round(snapshot_match_rate, 4) if snapshot_match_rate is not None else None, + "recommendation": recommendation, + }, + "need_review_commits": need_review, + } + + merged_commits = [c for c in old_commits if c.get("commit_sha") not in current_set] + for sha in current_commits: + if sha in mapped: + c = mapped[sha] + merged_commits.append( + { + "commit_sha": c.commit_sha, + "patch_id": c.patch_id, + "review_round": c.review_round, + "reviewed_at": c.reviewed_at, + "hunk_fingerprints": c.hunk_fingerprints, + "files": commit_files_map.get(sha, []), + "mapping": c.mapping, + } + ) + else: + merged_commits.append( + { + "commit_sha": sha, + "patch_id": compute_patch_id(repo_root, sha), + "review_round": args.review_round, + "reviewed_at": "", + "hunk_fingerprints": parse_hunk_fingerprints(repo_root, sha), + "files": commit_files_map.get(sha, []), + "mapping": {"method": "none", "mapped_from_commit": "", "confidence": 0.0}, + } + ) + + reviewed["commits"] = merged_commits + reviewed.setdefault("review_rounds", []).append(round_record) + save_json(reviewed_path, reviewed) + + print( + json.dumps( + { + "review_target": {"type": target_type, "id": target_id_raw}, + "total_commits": len(current_commits), + "need_review_commits": need_review, + "commit_map_rate": round(commit_map_rate, 4), + "hunk_match_rate": round(hunk_match_rate, 4), + "snapshot_match_rate": round(snapshot_match_rate, 4) if snapshot_match_rate is not None else None, + "recommendation": recommendation, + }, + ensure_ascii=False, + ) + ) + + +if __name__ == "__main__": + main() diff --git a/.claude/skills/code-review/scripts/init_review_workspace.py b/.claude/skills/code-review/scripts/init_review_workspace.py new file mode 100755 index 0000000000..806c3ea481 --- /dev/null +++ b/.claude/skills/code-review/scripts/init_review_workspace.py @@ -0,0 +1,127 @@ +#!/usr/bin/env python3 +import argparse +import json +from 
datetime import datetime, timezone +from pathlib import Path +from typing import Any, Dict, Tuple + + +SCRIPT_DIR = Path(__file__).resolve().parent +SKILL_DIR = SCRIPT_DIR.parent +REF_DIR = SKILL_DIR / "references" + + +def utc_now() -> str: + return datetime.now(timezone.utc).replace(microsecond=0).isoformat() + + +def read_json(path: Path) -> Dict[str, Any]: + return json.loads(path.read_text(encoding="utf-8")) + + +def write_json(path: Path, data: Dict[str, Any]) -> None: + path.parent.mkdir(parents=True, exist_ok=True) + path.write_text(json.dumps(data, ensure_ascii=False, indent=2) + "\n", encoding="utf-8") + + +def ensure_file_from_template(target: Path, template: Path, mutate=None) -> None: + if target.exists(): + return + payload = read_json(template) + if mutate: + mutate(payload) + write_json(target, payload) + + +def sanitize_branch_name(branch_name: str) -> str: + return branch_name.replace("/", "-") + + +def resolve_target(args: argparse.Namespace) -> Tuple[str, str]: + if args.target_type and args.target_id: + target_type = args.target_type + target_id = args.target_id + elif args.pr_number is not None: + target_type = "pr" + target_id = str(args.pr_number) + elif args.branch_name: + target_type = "branch" + target_id = args.branch_name + else: + raise SystemExit("must provide either --target-type/--target-id or --pr-number or --branch-name") + if target_type not in {"pr", "branch"}: + raise SystemExit("target type must be pr or branch") + return target_type, target_id + + +def main() -> None: + parser = argparse.ArgumentParser(description="Initialize code-review workspace for PR or branch.") + parser.add_argument("--repo-root", required=True, help="Repository root path") + parser.add_argument("--target-type", choices=["pr", "branch"], help="Review target type") + parser.add_argument("--target-id", help="Review target id (PR number or branch name)") + parser.add_argument("--pr-number", type=int, help="PR number (legacy compatible)") + parser.add_argument("--branch-name", help="Branch name (legacy compatible)") + parser.add_argument("--base-ref", default="", help="PR base ref") + parser.add_argument("--head-ref", default="", help="PR head ref") + parser.add_argument("--base-sha", default="", help="PR base sha") + parser.add_argument("--head-sha", default="", help="PR head sha") + args = parser.parse_args() + + repo_root = Path(args.repo_root).resolve() + target_type, target_id_raw = resolve_target(args) + target_id_dir = sanitize_branch_name(target_id_raw) if target_type == "branch" else target_id_raw + review_dir = repo_root / "code-review" / f"{target_type}-{target_id_dir}" + comments_dir = review_dir / "comments" + + comments_dir.mkdir(parents=True, exist_ok=True) + + meta_path = review_dir / "meta.json" + reviewed_commits_path = review_dir / "reviewed_commits.json" + review_comments_path = comments_dir / "review-comments.json" + comment_status_path = comments_dir / "comment-status.json" + + def mutate_meta(payload: Dict[str, Any]) -> None: + payload["repo"] = str(repo_root) + payload["review_target"]["type"] = target_type + payload["review_target"]["id"] = target_id_raw + payload["review_target"]["base_ref"] = args.base_ref + payload["review_target"]["head_ref"] = args.head_ref + payload["review_target"]["base_sha"] = args.base_sha + payload["review_target"]["head_sha"] = args.head_sha + payload["generated_at"] = utc_now() + + def create_review_comments_payload() -> Dict[str, Any]: + return { + "version": "1.0", + "source": "github_pr_review_comments" if target_type == 
"pr" else "branch_review_comments", + "fetched_at": utc_now(), + "review_target": {"type": target_type, "id": target_id_raw}, + "viewer_login": "", + "comments": [], + } + + def mutate_comment_status(payload: Dict[str, Any]) -> None: + payload["review_target"]["type"] = target_type + payload["review_target"]["id"] = target_id_raw + payload["generated_at"] = utc_now() + payload["status"] = [] + + def mutate_reviewed_commits(payload: Dict[str, Any]) -> None: + payload["review_rounds"] = [] + payload["commits"] = [] + + ensure_file_from_template(meta_path, REF_DIR / "meta.template.json", mutate_meta) + ensure_file_from_template( + reviewed_commits_path, REF_DIR / "reviewed_commits.template.json", mutate_reviewed_commits + ) + if not review_comments_path.exists(): + write_json(review_comments_path, create_review_comments_payload()) + ensure_file_from_template( + comment_status_path, REF_DIR / "comment-status.template.json", mutate_comment_status + ) + + print(str(review_dir)) + + +if __name__ == "__main__": + main() diff --git a/.claude/skills/code-review/scripts/update_comment_status.py b/.claude/skills/code-review/scripts/update_comment_status.py new file mode 100755 index 0000000000..007490eb76 --- /dev/null +++ b/.claude/skills/code-review/scripts/update_comment_status.py @@ -0,0 +1,195 @@ +#!/usr/bin/env python3 +import argparse +import hashlib +import json +import re +from datetime import datetime, timezone +from pathlib import Path +from typing import Dict, List, Tuple + + +def utc_now() -> str: + return datetime.now(timezone.utc).replace(microsecond=0).isoformat() + + +def stable_hash(text: str) -> str: + return hashlib.sha256(text.encode("utf-8")).hexdigest() + + +def normalize_text(text: str) -> str: + text = re.sub(r"\s+", " ", text.strip()) + return text + + +def infer_flow_status_from_comment(comment: Dict) -> str: + # Single deterministic rule from upstream schema. + return "resolved" if comment.get("thread_resolved") is True else "open" + + +def read_json(path: Path) -> Dict: + if not path.exists(): + return {} + return json.loads(path.read_text(encoding="utf-8")) + + +def write_json(path: Path, payload: Dict) -> None: + path.parent.mkdir(parents=True, exist_ok=True) + path.write_text(json.dumps(payload, ensure_ascii=False, indent=2) + "\n", encoding="utf-8") + + +def default_status(comment: Dict) -> Dict: + body = comment.get("body", "") + snippet = normalize_text(body)[:300] + fingerprint_seed = "|".join( + [ + str(comment.get("path", "")), + str(comment.get("line", 0)), + str(comment.get("side", "RIGHT")), + snippet, + ] + ) + return { + "comment_id": comment.get("comment_id"), + "path": comment.get("path", ""), + "line": comment.get("line", 0), + "side": comment.get("side", "RIGHT"), + "body": body, + "snippet": snippet, + "snippet_fingerprint": stable_hash(fingerprint_seed), + "status_flow": infer_flow_status_from_comment(comment), + # status_tech is owned by model review in later phase. 
+ "status_tech": "not-fixed", + "mapped_finding_id": "", + "notes": "", + } + + +def sanitize_branch_name(branch_name: str) -> str: + return branch_name.replace("/", "-") + + +def resolve_target(args: argparse.Namespace) -> Tuple[str, str]: + if args.target_type and args.target_id: + target_type = args.target_type + target_id = args.target_id + elif args.pr_number is not None: + target_type = "pr" + target_id = str(args.pr_number) + elif args.branch_name: + target_type = "branch" + target_id = args.branch_name + else: + raise SystemExit("must provide either --target-type/--target-id or --pr-number or --branch-name") + if target_type not in {"pr", "branch"}: + raise SystemExit("target type must be pr or branch") + return target_type, target_id + + +def validate_payload(raw: Dict) -> List[Dict]: + if not isinstance(raw, dict): + raise SystemExit("invalid review-comments.json: root must be object") + comments = raw.get("comments") + if not isinstance(comments, list): + raise SystemExit("invalid review-comments.json: `comments` must be list") + return comments + + +def validate_comment(comment: Dict) -> None: + required = ["comment_id", "path", "line", "side", "body", "thread_resolved"] + missing = [k for k in required if k not in comment] + if missing: + raise SystemExit( + "invalid review comment record: missing required fields " + + ",".join(missing) + ) + + +def infer_manual_tech_override(replies: List[Dict], viewer_login: str) -> str: + if not viewer_login: + return "" + # Prefer the latest explicit override from current reviewer account. + for reply in reversed(replies): + if str(reply.get("author", "")).lower() != viewer_login.lower(): + continue + text = normalize_text(str(reply.get("body", ""))).lower() + if "false-positive" in text or "false positive" in text or "假阳性" in text or "误判" in text: + return "false-positive" + if re.search(r"\bfixed\b", text) or "已修复" in text: + return "fixed" + return "" + + +def main() -> None: + parser = argparse.ArgumentParser(description="Build comment-status.json from review comments.") + parser.add_argument("--repo-root", required=True) + parser.add_argument("--target-type", choices=["pr", "branch"]) + parser.add_argument("--target-id") + parser.add_argument("--pr-number", type=int, help="PR number (legacy compatible)") + parser.add_argument("--branch-name", help="Branch name (legacy compatible)") + args = parser.parse_args() + + repo_root = Path(args.repo_root).resolve() + target_type, target_id_raw = resolve_target(args) + target_id_dir = sanitize_branch_name(target_id_raw) if target_type == "branch" else target_id_raw + review_dir = repo_root / "code-review" / f"{target_type}-{target_id_dir}" + comments_path = review_dir / "comments" / "review-comments.json" + status_path = review_dir / "comments" / "comment-status.json" + + comments_payload = read_json(comments_path) + if not comments_payload: + raise SystemExit(f"missing comments file: {comments_path}") + comments = validate_payload(comments_payload) + viewer_login = str(comments_payload.get("viewer_login", "")).strip() + + replies_by_parent: Dict[int, List[Dict]] = {} + root_comments: List[Dict] = [] + for c in comments: + parent = c.get("in_reply_to_id") + if parent is None: + root_comments.append(c) + else: + replies_by_parent.setdefault(parent, []).append(c) + + previous = read_json(status_path) + previous_map = {item.get("comment_id"): item for item in previous.get("status", [])} + + status: List[Dict] = [] + seen_fp = set() + for comment in root_comments: + validate_comment(comment) + cid 
= comment.get("comment_id")
+        if cid in previous_map:
+            item = previous_map[cid]
+            # Preserve manual/model edits on status_tech/notes, always sync flow status from source.
+            item["status_flow"] = infer_flow_status_from_comment(comment)
+        else:
+            item = default_status(comment)
+
+        fp = item.get("snippet_fingerprint", "")
+        # Keep notes idempotent across review rounds: append each marker at most once.
+        if fp and fp in seen_fp and "duplicate-fingerprint" not in item.get("notes", ""):
+            item["notes"] = (item.get("notes", "") + " duplicate-fingerprint").strip()
+        manual_override = infer_manual_tech_override(replies_by_parent.get(cid, []), viewer_login)
+        if manual_override:
+            item["status_tech"] = manual_override
+            override_note = f"manual-tech-override:{manual_override}"
+            if override_note not in item.get("notes", ""):
+                item["notes"] = (item.get("notes", "") + " " + override_note).strip()
+        seen_fp.add(fp)
+        status.append(item)
+
+    payload = {
+        "version": "1.0",
+        "generated_at": utc_now(),
+        "review_target": {"type": target_type, "id": target_id_raw},
+        "status": status,
+    }
+    write_json(status_path, payload)
+
+    print(
+        json.dumps(
+            {"review_target": {"type": target_type, "id": target_id_raw}, "status_count": len(status)},
+            ensure_ascii=False,
+        )
+    )
+
+
+if __name__ == "__main__":
+    main()
diff --git a/.claude/skills/commit/SKILL.md b/.claude/skills/commit/SKILL.md
new file mode 100644
index 0000000000..046d2af45b
--- /dev/null
+++ b/.claude/skills/commit/SKILL.md
@@ -0,0 +1,40 @@
+---
+name: commit
+description: Write commit messages following the Conventional Commits standard.
+---
+# Commit Skill
+
+Generate commit messages that follow the Conventional Commits specification.
+
+## Format
+
+```
+type(scope): verb + object
+
+{why is this change needed, what user/system impact it brings}
+
+Fixes #{ISSUE_ID}
+```
+
+## Fields
+
+- **type**: `feat | fix | docs | style | refactor | perf | test | chore | revert`
+- **scope**: Optional. File/module/subsystem, e.g. `api`, `ui`, `auth`, `deps`
+- **subject**: <= 50 characters, imperative mood, lowercase first letter, no period
+- **body**: Each line <= 72 characters. Explain "what" and "why"
+- **footer**: Optional. Link Issue / PR / Breaking Change
+
+## Steps
+
+1. Collect information by reading `git diff`. Skip if user already provided context.
+2. Determine the commit type based on changes.
+3. If changes span multiple scopes, use the core module as scope.
+4. Extract added/modified/deleted functions, classes, interfaces for the subject.
+5. If breaking change, add `BREAKING CHANGE:` to footer.
+6. Present the complete commit message for user confirmation before executing `git commit`.
+
+## Prohibited
+
+- No meaningless descriptions like "update code", "fix bug", "wip"
+- No subject or body lines exceeding 72 characters
+- No issue links in the subject line
diff --git a/.claude/skills/compile/SKILL.md b/.claude/skills/compile/SKILL.md
new file mode 100644
index 0000000000..26d6bc8405
--- /dev/null
+++ b/.claude/skills/compile/SKILL.md
@@ -0,0 +1,83 @@
+---
+name: compile
+description: Building LoongCollector C++ and Go components. Use when compiling any part of the project.
+---
+# Compile Skill
+
+## How to Compile This Project
+
+This project has both C++ and Go components. Use the appropriate build method based on what you modified.
+
+### C++ Build
+
+**IMPORTANT: All CMake and make commands must run from inside the `build/` directory.** Running from repo root will reconfigure incorrectly.
+
+**Prerequisites** — Git submodules must be populated before first build:
+```bash
+git submodule update --init --recursive
+```
+Two submodules live under `core/_thirdparty/`:
+- `DCGM` — NVIDIA DCGM headers (`dcgm_agent.h` etc.)
+- `coolbpf` — eBPF framework + +If either is empty, compilation fails with `No such file or directory`. + +#### Build Steps + +```bash +mkdir -p build && cd build +cmake -DCMAKE_BUILD_TYPE=Debug -DLOGTAIL_VERSION=0.0.1 \ + -DCMAKE_EXPORT_COMPILE_COMMANDS=1 \ + -DCMAKE_CXX_FLAGS="-I/opt/rh/devtoolset-9/root/usr/lib/gcc/x86_64-redhat-linux/9/include -I/opt/logtail -I/opt/logtail_spl" \ + -DBUILD_LOGTAIL=ON -DBUILD_LOGTAIL_UT=ON -DWITHOUTGDB=ON -DENABLE_STATIC_LINK_CRT=ON -DWITHSPL=OFF ../core +make -sj$(nproc) +``` + +**Key CMake flags:** +| Flag | Purpose | +|------|---------| +| `BUILD_LOGTAIL` | Build LoongCollector binary. Required. | +| `BUILD_LOGTAIL_UT` | Build unit tests. Enable when modifying tests. | +| `WITHSPL` | SPL support. Set `OFF` unless working on SPL files. | + +#### C++ Unit Tests + +Each test directory under `core/unittest/*/` produces its own executable. + +**Build tests** (from inside `build/`): +```bash +make yaml_util_unittest app_config_unittest safe_queue_unittest -j$(nproc) +``` + +**Run tests** (from inside `build/`): +```bash +./unittest/common/yaml_util_unittest +./unittest/app_config/app_config_unittest +``` + +Tests must run from `build/` because some rely on relative paths for config files and temporary output. + +### Go Plugin Build + +```bash +make plugin_local +``` + +### Docker Build + +```bash +make image +``` + +### Cross-Compilation + +For ARM64: +```bash +make image ARCH=arm64 +``` + +### Common Issues + +- If CMake complains about missing dependencies, install them via `apt` or `yum` +- If linking fails, try `make clean` then rebuild +- For SPL-related builds, change `WITHSPL=OFF` to `WITHSPL=ON` in the cmake command diff --git a/.claude/skills/design-document/SKILL.md b/.claude/skills/design-document/SKILL.md new file mode 100644 index 0000000000..d62cebdfa9 --- /dev/null +++ b/.claude/skills/design-document/SKILL.md @@ -0,0 +1,98 @@ +--- +name: design-document +description: Design document writing conventions. Use when writing or reviewing technical design documents. +--- +# Design Document Conventions + +## 1. Background / Problem Statement + +### 1.1 Background and Pain Points +- Describe current system/module limitations and deficiencies +- List specific scenarios, metrics, or incident cases that triggered this design + +### 1.2 Impact Scope +- Affected modules, microservices, APIs, data stores, third-party dependencies +- Potential impact on performance, reliability, cost, maintainability +- Forward/backward compatibility analysis + +### 1.3 Constraints +- Compliance/security/performance/resource restrictions +- External system or infrastructure dependencies + +--- + +## 2. Design Goals + +### 2.1 Functional Goals +- List Must/Should/Could core capabilities by priority + +### 2.2 Non-Functional Goals +- Performance (throughput, latency, concurrency, resource usage) +- Scalability, maintainability, testability, observability +- Reliability (fault tolerance, HA, degradation, rollback strategies) + +### 2.3 Constraint Goals +- Backward compatibility, API stability +- Security and compliance requirements + +--- + +## 3. 
Technical Design + +### 3.1 Architecture Diagram +- Use Mermaid for high-level component diagrams with data/control flow + +### 3.2 Detailed Flowcharts +- Key business flows, exception flows, retry/compensation with timing and triggers + +### 3.3 Thread/Concurrency Model +- Thread lifecycle, inter-thread communication (locks, condition variables, queues, Actor patterns) +- Sequence diagrams for concurrency interactions + +### 3.4 Core Classes and Data Structures +- Class diagrams showing main classes, interfaces, inheritance/composition relationships +- Key data structure fields, lifecycle, thread-safety strategy + +### 3.5 Key Algorithms or Protocols +- Pseudocode or flow for pub/sub, load balancing, retry backoff, etc. +- State machine / protocol state transition diagrams + +### 3.6 Error Handling and Recovery +- Error classification, exception stack, retry strategies, degradation plans +- Monitoring metrics, alert trigger conditions and levels + +### 3.7 Deployment and Operations +- Configuration items, hot-update mechanisms, canary and rollback strategies +- CI/CD, container, Service Mesh, Kubernetes resource considerations + +--- + +## 4. Unit Testing + +### 4.1 Test Scope and Goals +- Cover core logic, boundary conditions, concurrency scenarios, exception paths + +### 4.2 Test Environment and Tools +- Google Test/Mock version, necessary third-party stubs/fakes + +### 4.3 Test Scenarios and Cases +| Case ID | Scenario | Input | Expected Output/Behavior | Mock Dependencies | +|---------|----------|-------|--------------------------|-------------------| +| TC-01 | Normal single log push | Single valid LogRecord | Returns SUCCESS, buffer size +1 | None | +| TC-02 | Buffer full | capacity=N filled | Throws BufferOverflowException | None | +| TC-03 | Concurrent push | Multi-thread simultaneous push | No data loss, order/final consistency matches design | MutexMock | +| TC-04 | flush clears | M items exist, then flush | Returns M items, buffer size=0 | TimeProviderMock | + +### 4.4 Boundary and Exception Testing +- Empty input, invalid input, extreme capacity, network/disk fault injection + +### 4.5 Performance Benchmarking (optional) +- Throughput, latency, CPU/Memory profile; comparison with baseline + +--- + +## Notes + +- **Do not** include project management info (estimates, schedules, milestones, Gantt charts) +- Code examples must follow team C++ coding standards (see `.claude/skills/project-knowledge/`) +- Test case naming: `__` for CI coverage tracking diff --git a/.claude/skills/e2e/SKILL.md b/.claude/skills/e2e/SKILL.md new file mode 100644 index 0000000000..92a8c4df6b --- /dev/null +++ b/.claude/skills/e2e/SKILL.md @@ -0,0 +1,209 @@ +--- +name: e2e +description: LoongCollector E2E 测试全流程指南:设计、编写、运行和调试。当需要编写新 E2E 测试、运行现有测试、或排查 E2E 测试失败时使用此 skill。 +--- +# LoongCollector E2E 测试指南 + +> 详细步骤模板见 [reference.md](reference.md) | 可复用脚本见 [scripts/](scripts/) + +## 目录 + +1. [概览](#1-概览) +2. [设计测试用例](#2-设计测试用例) +3. [编写测试用例](#3-编写测试用例) +4. [本地运行(docker-compose)](#4-本地运行) +5. [调试](#5-调试) +6. 
[已知陷阱](#6-已知陷阱)
+
+---
+
+## 1 概览
+
+基于 **BDD Godog** 框架,通过 `.feature` 文件描述场景,引擎正则匹配步骤函数并传参。
+
+```
+test/e2e/
+  test_cases/<用例名>/
+    case.feature          # 场景描述
+    docker-compose.yaml   # 可选,外部依赖服务
+  engine/
+    steps.go              # 所有可用步骤(权威来源)
+    setup/ control/ trigger/ verify/ cleanup/
+```
+
+**环境 tag**:`@host`、`@k8s`、`@docker-compose`(三选一,加 `@e2e`)
+
+---
+
+## 2 设计测试用例
+
+编写 feature 文件前,先确定测试矩阵。按以下维度逐项评估是否需要覆盖:
+
+### 2.1 场景维度清单
+
+| 维度 | 典型场景 | 何时需要 |
+|------|----------|----------|
+| **基础功能** | 单配置、单数据类型端到端 | 必须 |
+| **多数据类型** | logs / metrics / traces 分别验证 | 插件支持多类型时 |
+| **多配置共存** | 同时加载多个 pipeline 配置 | 涉及端口/资源竞争时 |
+| **配置热加载** | 运行中增/删/改配置 | 持续运行的 input 插件 |
+| **配置类型变更** | 从 A 类型切换到 B 类型 | 插件支持多协议/格式时 |
+| **反压与恢复** | 下游不可达 → 恢复后数据不丢 | flusher 插件 |
+| **外部依赖失效** | 依赖服务重启/不可达 | 有外部依赖时 |
+| **大数据量** | 高吞吐压力下不 OOM/不丢数据 | 性能敏感路径 |
+
+### 2.2 设计产出
+
+确定要覆盖的场景后,明确每个 Scenario 的:
+- **输入**:什么数据、什么格式、多少条
+- **流经路径**:input → processor → flusher 的具体插件
+- **预期输出**:在哪里验证、验证什么
+- **外部依赖**:需要什么辅助服务(OTel Collector、Kafka 等)
+
+---
+
+## 3 编写测试用例
+
+### 3.1 目录结构
+
+```
+test/e2e/test_cases/my_feature/
+├── case.feature
+├── docker-compose.yaml           # 外部依赖
+└── otel-collector-config.yaml    # 如果用 OTel Collector
+```
+
+### 3.2 Feature 文件模板
+
+```gherkin
+@flusher
+Feature: my feature name
+  Brief description
+
+  @e2e @docker-compose
+  Scenario: TestMyFeatureLogs
+    Given {docker-compose} environment
+    Given {my-config} local config as below
+    """
+    enable: true
+    inputs:
+      - Type: input_forward
+        Protocol: OTLP
+        Endpoint: "0.0.0.0:4320"
+    flushers:
+      - Type: flusher_otlp_native
+        Endpoint: "otel-collector:4317"
+    """
+    When start docker-compose {my_feature}
+    Then wait {10} seconds
+    When generate {1} OTLP {logs} via otelgen to endpoint {loongcollectorC:4320}, protocol {grpc}
+    Then wait {5} seconds
+    Then otlp collector received at least {1} logs from file {/tmp/otel-export/logs.json}
+```
+
+### 3.3 强制规则
+
+- 配置中必须含 `enable: true`
+- **只使用** `test/engine/steps.go` 中已注册的步骤
+- `wait {N} seconds` 是 **Then** 类型,不是 When
+- 命名格式:`Test${功能名}${场景描述}`
+- **不要**在持续运行插件的配置中使用 `global.ExcutionTimeout`(见 §6.1)
+
+### 3.4 扩展步骤
+
+如需新步骤,参考 [reference.md §扩展步骤](reference.md) 中的开发和注册流程。
+
+---
+
+## 4 本地运行
+
+### 4.1 前置条件
+
+```bash
+docker --version && docker compose version
+```
+
+如修改了 C++ 代码,需重新编译并更新镜像。两种方式:
+
+**方式一:完整构建**(慢,但保证一致)
+```bash
+make e2e_image   # 从源码构建完整 Docker 镜像 aliyun/loongcollector:0.0.1
+```
+
+**方式二:增量更新**(快,适合迭代调试)
+```bash
+cd build && make -sj$(nproc) && cd ..
+# 替换镜像中的二进制
+docker create --name tmp-lc aliyun/loongcollector:0.0.1
+docker cp build/loongcollector tmp-lc:/usr/local/loongcollector/loongcollector
+docker commit tmp-lc aliyun/loongcollector:0.0.1
+docker rm tmp-lc
+```
+
+### 4.2 运行
+
+```bash
+cd test/e2e
+
+# 运行整个测试用例(所有 Scenario)
+TEST_CASE=flusher_otlp_native go test -v -run "TestE2EOnDockerCompose$" \
+  -timeout 600s -count=1 ./...
+
+# 只运行指定 Scenario
+TEST_CASE=flusher_otlp_native go test -v \
+  -run "TestE2EOnDockerCompose/TestFlusherOTLPNativeLogs$" \
+  -timeout 600s -count=1 ./...
+```
+
+### 4.3 清理(测试失败后必做)
+
+可以直接运行脚本 `bash .claude/skills/e2e/scripts/e2e-cleanup.sh`,或手动执行:
+
+```bash
+docker rm -f $(docker ps -aq) 2>/dev/null
+docker network prune -f
+rm -rf test/e2e/config test/e2e/onetime_pipeline_config
+sudo rm -rf test/e2e/report
+rm -f test/e2e/test_cases/<用例名>/testcase-compose.yaml
+```
+
+---
+
+## 5 调试
+
+```bash
+# 1. 查看容器日志
+docker ps | grep loongcollectorC
+docker exec <容器名> cat /usr/local/loongcollector/log/loongcollector.LOG
+
+# 2. 检查配置是否加载
+docker exec <容器名> ls /usr/local/loongcollector/conf/continuous_pipeline_config/local/
+
+# 3. 检查端口是否监听
+docker exec <容器名> ss -tlnp | grep <端口>
+
+# 4. 手动复现 compose 环境
+cd test/e2e/test_cases/<用例名>
+docker compose -f testcase-compose.yaml up -d
+docker compose -f testcase-compose.yaml logs -f loongcollectorC
+```
+
+---
+
+## 6 已知陷阱
+
+### 6.1 ExcutionTimeout 使配置变为一次性
+
+**绝对不要**在 `input_forward`、`input_file` 等持续插件的配置中使用 `global.ExcutionTimeout`。
+
+它会使 `IsOnetime()` 返回 true,导致 `IsValidNativeInputPlugin(name, true)` 在 onetime 注册表中查找,而大部分 input 只注册了 continuous,结果报 `unsupported input plugin`。
+
+详见 `.claude/rules/project-knowledge/config-pitfalls.mdc`。
+
+### 6.2 FlusherFile 必须是文件
+
+e2e 模板将 `report/default_flusher.json` bind-mount 到容器。若宿主机路径不存在,Docker 会创建为**目录**。已在 `BootController.Start()` 中自动处理。
+
+### 6.3 测试间残留
+
+多 Scenario 共享进程,`Clean()` 会删除 config/report。异常退出后手动清理(§4.3)。
diff --git a/.claude/skills/e2e/reference.md b/.claude/skills/e2e/reference.md
new file mode 100644
index 0000000000..d1c662ae35
--- /dev/null
+++ b/.claude/skills/e2e/reference.md
@@ -0,0 +1,134 @@
+# E2E 测试详细参考
+
+## 可用步骤速查
+
+> 权威来源:`test/engine/steps.go`
+
+### Given(环境准备)
+
+| 步骤模板 | 说明 |
+|----------|------|
+| `{docker-compose} environment` | 初始化 docker-compose 环境 |
+| `{host} environment` | 初始化主机环境 |
+| `{daemonset} environment` | 初始化 K8s 环境 |
+| `{name} local config as below` | 写入持续采集配置 |
+| `{name} onetime pipeline local config as below` | 写入一次性采集配置 |
+| `subcribe data from {sls} with config` | 订阅 SLS 数据源 |
+| `loongcollector depends on containers {name}` | 设置容器依赖 |
+| `loongcollector container mount {src} to {dst}` | 挂载卷 |
+| `loongcollector expose port {host} to {container}` | 暴露端口 |
+| `docker-compose boot type {type}` | 设置 boot 类型 |
+| `mkdir {path}` | 创建目录 |
+
+### When(触发动作)
+
+| 步骤模板 | 说明 |
+|----------|------|
+| `start docker-compose {case_name}` | 启动 docker-compose 环境 |
+| `begin trigger` | 标记触发开始时间(生成日志前必须调用) |
+| `generate {N} regex logs to file {path}, with interval {M}ms` | 生成正则日志 |
+| `generate {N} json logs to file {path}, with interval {M}ms` | 生成 JSON 日志 |
+| `generate {N} apsara logs to file {path}, with interval {M}ms` | 生成 Apsara 日志 |
+| `generate {N} OTLP {logs\|metrics\|traces} via otelgen to endpoint {ep}, protocol {grpc\|http}` | 生成 OTLP 数据 |
+| `generate {N} http logs, with interval {M}ms, url: {url}, method: {method}, body:` | 生成 HTTP 日志 |
+| `execute {N} commands {cmd} in sequence` | 顺序执行命令 |
+| `execute {N} commands {cmd} in parallel` | 并行执行命令 |
+| `create the shell script file {name} with the following content` | 创建 shell 脚本 |
+| `execute {N} the shell script file {name} in parallel` | 并行执行 shell 脚本 |
+| `restart agent` | 重启 Agent |
+| `force restart agent` | 强制重启 Agent |
+
+### Then(结果验证)
+
+| 步骤模板 | 说明 |
+|----------|------|
+| `there is {N} logs` | 精确验证日志数(上限 100) |
+| `there is at least {N} logs` | 最少日志数验证 |
+| `there is less than {N} logs` | 最多日志数验证 |
+| `the log fields match kv` | KV 字段匹配(匹配内容跟在 `"""..."""` 文档块中) |
+| `the log fields match as below` | 日志字段模式匹配 |
+| `the log tags match kv` | Tag KV 匹配 |
+| `the log is in order` | 日志顺序验证 |
+| `wait {N} seconds` | 等待 N 秒 |
+| `otlp collector received at least {N} (logs\|metrics\|traces) from file {path}` | OTel Collector 数据验证 |
+
+> 注意:日志数量验证上限 100。超过 100 用 `When query through` + `Then the log fields match kv` 方式。
+
+---
+
+## 扩展步骤
+
+### 1.
编写函数 + +在 `test/engine/` 对应子目录下: + +```go +func MyVerification(ctx context.Context, expected int) (context.Context, error) { + // 实现逻辑 + return ctx, nil +} +``` + +签名要求:第一个参数 `context.Context`,返回 `(context.Context, error)`。 + +### 2. 注册 + +在 `test/engine/steps.go` 中: + +```go +ctx.Then(`^my verification expects \{(\d+)\}$`, verify.MyVerification) +``` + +### 3. 使用 + +```gherkin +Then my verification expects {42} +``` + +--- + +## docker-compose.yaml 示例 + +### OTel Collector(OTLP 测试用) + +```yaml +services: + otel-collector: + image: otel/opentelemetry-collector-contrib:latest + hostname: otel-collector + user: "0:0" + ports: + - "4317" + volumes: + - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml + - ./otel-export:/tmp/otel-export + healthcheck: + test: ["CMD", "wget", "--spider", "-q", "http://localhost:13133/"] + interval: 5s + timeout: 3s + retries: 5 + start_period: 10s +``` + +--- + +## eBPF 进程安全测试示例 + +```gherkin +@e2e @host @ebpf_input +Scenario: TestEBPFProcessSecurityByNormalStart + Given {host} environment + Given subcribe data from {sls} with config + """ + """ + Given {ebpf_process_security_default} local config as below + """ + enable: true + inputs: + - Type: input_process_security + """ + When begin trigger + When execute {1} commands {/bin/echo 1} in sequence + When query through {* | select * from e2e where call_name = 'execve' and binary = '/bin/echo' and arguments = '1'} + Then there is {1} logs +``` diff --git a/.claude/skills/e2e/scripts/e2e-cleanup.sh b/.claude/skills/e2e/scripts/e2e-cleanup.sh new file mode 100755 index 0000000000..ceba2870be --- /dev/null +++ b/.claude/skills/e2e/scripts/e2e-cleanup.sh @@ -0,0 +1,32 @@ +#!/usr/bin/env bash +# E2E 测试环境清理脚本 +# 用法: bash .claude/skills/e2e/scripts/e2e-cleanup.sh [case_name] +set -euo pipefail + +REPO_ROOT="$(git rev-parse --show-toplevel)" +E2E_DIR="$REPO_ROOT/test/e2e" +CASE_NAME="${1:-}" + +echo "==> 停止并删除所有 Docker 容器..." +docker rm -f $(docker ps -aq) 2>/dev/null || true + +echo "==> 清理 Docker 网络..." +docker network prune -f 2>/dev/null || true + +echo "==> 清理运行时目录..." +rm -rf "$E2E_DIR/config" "$E2E_DIR/onetime_pipeline_config" +sudo rm -rf "$E2E_DIR/report" 2>/dev/null || rm -rf "$E2E_DIR/report" 2>/dev/null || true + +if [[ -n "$CASE_NAME" ]]; then + CASE_DIR="$E2E_DIR/test_cases/$CASE_NAME" + if [[ -d "$CASE_DIR" ]]; then + echo "==> 清理测试用例 $CASE_NAME..." + rm -f "$CASE_DIR/testcase-compose.yaml" + rm -f "$CASE_DIR/otel-export/"*.json 2>/dev/null || true + fi +else + echo "==> 清理所有测试用例的 testcase-compose.yaml..." + find "$E2E_DIR/test_cases" -name "testcase-compose.yaml" -delete 2>/dev/null || true +fi + +echo "==> 清理完成" diff --git a/.claude/skills/mermaid/SKILL.md b/.claude/skills/mermaid/SKILL.md new file mode 100644 index 0000000000..3cccf610d8 --- /dev/null +++ b/.claude/skills/mermaid/SKILL.md @@ -0,0 +1,42 @@ +--- +name: mermaid +description: Mermaid diagram conventions. Use whenever diagrams are needed in documentation or code review. +--- +# Mermaid Diagram Conventions + +## Rules for Creating Mermaid Diagrams + +1. **Use Correct Fenced Code Block**: Always use ````mermaid ... ```` + +2. **Stick to Well-Supported Diagram Types**: + - `graph` (flowcharts, `TD` preferred for readability) + - `sequenceDiagram` + - `classDiagram` + - `stateDiagram-v2` (prefer v2) + - `erDiagram` + - `pie`, `gantt`, `mindmap` (basic only) + - Avoid very new or uncommon types + +3. **Simple Standard Syntax**: + - **Node IDs**: Use simple alphanumeric IDs (`node1`, `processA`). No spaces or special chars. 
+ - **Labels**: **Use quotes** for labels with spaces/punctuation/keywords. + - Good: `A["User Input"] --> B["Validate Data"];` + - Bad: `A[User Input] --> B[Validate Data];` + - Use standard arrows (`-->`, `---`, `==>`) + - Comments: `%%` + +4. **Mindmap (GitHub compatible)**: + - Use basic indentation structure only + - NO `::icon()` syntax (causes rendering errors) + - Each node on its own line with correct indentation + +5. **Prefer Vertical Layouts**: `graph TD` or `graph TB` for flowcharts (easier to read in Markdown) + +6. **Let GitHub Handle Styling**: + - DO NOT set themes (`%%{init: ...}`) + - DO NOT use `classDef` or `style` + - GitHub auto-adapts to light/dark mode + +7. **Keep Diagrams Focused**: Break complex diagrams into multiple simpler ones + +8. **Always Review Automated Edits**: Tools may break Mermaid syntax, especially with indentation-heavy formats like mindmap diff --git a/.claude/skills/omc-reference/SKILL.md b/.claude/skills/omc-reference/SKILL.md new file mode 100644 index 0000000000..cc02915c07 --- /dev/null +++ b/.claude/skills/omc-reference/SKILL.md @@ -0,0 +1,141 @@ +--- +name: omc-reference +description: OMC agent catalog, available tools, team pipeline routing, commit protocol, and skills registry. Auto-loads when delegating to agents, using OMC tools, orchestrating teams, making commits, or invoking skills. +user-invocable: false +--- + +# OMC Reference + +Use this built-in reference when you need detailed OMC catalog information that does not need to live in every `CLAUDE.md` session. + +## Agent Catalog + +Prefix: `oh-my-claudecode:`. See `agents/*.md` for full prompts. + +- `explore` (haiku) — fast codebase search and mapping +- `analyst` (opus) — requirements clarity and hidden constraints +- `planner` (opus) — sequencing and execution plans +- `architect` (opus) — system design, boundaries, and long-horizon tradeoffs +- `debugger` (sonnet) — root-cause analysis and failure diagnosis +- `executor` (sonnet) — implementation and refactoring +- `verifier` (sonnet) — completion evidence and validation +- `tracer` (sonnet) — trace gathering and evidence capture +- `security-reviewer` (sonnet) — trust boundaries and vulnerabilities +- `code-reviewer` (opus) — comprehensive code review +- `test-engineer` (sonnet) — testing strategy and regression coverage +- `designer` (sonnet) — UX and interaction design +- `writer` (haiku) — documentation and concise content work +- `qa-tester` (sonnet) — runtime/manual validation +- `scientist` (sonnet) — data analysis and statistical reasoning +- `document-specialist` (sonnet) — SDK/API/framework documentation lookup +- `git-master` (sonnet) — commit strategy and history hygiene +- `code-simplifier` (opus) — behavior-preserving simplification +- `critic` (opus) — plan/design challenge and review + +## Model Routing + +- `haiku` — quick lookups, lightweight inspection, narrow docs work +- `sonnet` — standard implementation, debugging, and review +- `opus` — architecture, deep analysis, consensus planning, and high-risk review + +## Tools Reference + +### External AI / orchestration +- `/team N:executor "task"` +- `omc team N:codex|gemini "..."` +- `omc ask ` +- `/ccg` + +### OMC state +- `state_read`, `state_write`, `state_clear`, `state_list_active`, `state_get_status` + +### Team runtime +- `TeamCreate`, `TeamDelete`, `SendMessage`, `TaskCreate`, `TaskList`, `TaskGet`, `TaskUpdate` + +### Notepad +- `notepad_read`, `notepad_write_priority`, `notepad_write_working`, `notepad_write_manual` + +### Project memory +- 
`project_memory_read`, `project_memory_write`, `project_memory_add_note`, `project_memory_add_directive` + +### Code intelligence +- LSP: `lsp_hover`, `lsp_goto_definition`, `lsp_find_references`, `lsp_diagnostics`, and related helpers +- AST: `ast_grep_search`, `ast_grep_replace` +- Utility: `python_repl` + +## Skills Registry + +Invoke built-in workflows via `/oh-my-claudecode:`. + +### Workflow skills +- `autopilot` — full autonomous execution from idea to working code +- `ralph` — persistence loop until completion with verification +- `ultrawork` — high-throughput parallel execution +- `visual-verdict` — structured visual QA verdicts +- `team` — coordinated team orchestration +- `ccg` — Codex + Gemini + Claude synthesis lane +- `ultraqa` — QA cycle: test, verify, fix, repeat +- `omc-plan` — planning workflow and `/plan`-safe alias +- `ralplan` — consensus planning workflow +- `sciomc` — science/research workflow +- `external-context` — external docs/research workflow +- `deepinit` — hierarchical AGENTS.md generation +- `deep-interview` — Socratic ambiguity-gated requirements workflow +- `ai-slop-cleaner` — regression-safe cleanup workflow + +### Utility skills +- `ask`, `cancel`, `note`, `learner`, `omc-setup`, `mcp-setup`, `hud`, `omc-doctor`, `trace`, `release`, `project-session-manager`, `skill`, `writer-memory`, `configure-notifications` + +### Keyword triggers kept compact in CLAUDE.md +- `"autopilot"→autopilot` +- `"ralph"→ralph` +- `"ulw"→ultrawork` +- `"ccg"→ccg` +- `"ralplan"→ralplan` +- `"deep interview"→deep-interview` +- `"deslop" / "anti-slop"→ai-slop-cleaner` +- `"deep-analyze"→analysis mode` +- `"tdd"→TDD mode` +- `"deepsearch"→codebase search` +- `"ultrathink"→deep reasoning` +- `"cancelomc"→cancel` +- Team orchestration is explicit via `/team`. + +## Team Pipeline + +Stages: `team-plan` → `team-prd` → `team-exec` → `team-verify` → `team-fix` (loop). + +- Use `team-fix` for bounded remediation loops. +- `team ralph` links the team pipeline with Ralph-style sequential verification. +- Prefer team mode when independent parallel lanes justify the coordination overhead. + +## Commit Protocol + +Use git trailers to preserve decision context in every commit message. + +### Format +- Intent line first: why the change was made +- Optional body with context and rationale +- Structured trailers when applicable + +### Common trailers +- `Constraint:` active constraint shaping the decision +- `Rejected:` alternative considered | reason for rejection +- `Directive:` forward-looking warning or instruction +- `Confidence:` `high` | `medium` | `low` +- `Scope-risk:` `narrow` | `moderate` | `broad` +- `Not-tested:` known verification gap + +### Example +```text +feat(docs): reduce always-loaded OMC instruction footprint + +Move reference-only orchestration content into a native Claude skill so +session-start guidance stays small while detailed OMC reference remains available. 
+ +Constraint: Preserve CLAUDE.md marker-based installation flow +Rejected: Sync all built-in skills in legacy install | broader behavior change than issue requires +Confidence: high +Scope-risk: narrow +Not-tested: End-to-end plugin marketplace install in a fresh Claude profile +``` diff --git a/.claude/skills/project-knowledge/SKILL.md b/.claude/skills/project-knowledge/SKILL.md new file mode 100644 index 0000000000..44127422e6 --- /dev/null +++ b/.claude/skills/project-knowledge/SKILL.md @@ -0,0 +1,220 @@ +--- +name: project-knowledge +description: LoongCollector project knowledge: architecture, terminology, codebase map, and coding standards (C++/Go). +--- +# LoongCollector Project Knowledge + +## Architecture Overview + +The LoongCollector architecture is based on a plugin system with the following key components: + +1. **Core Application**: Main entry point in `core/logtail.cpp`, initializes `Application` class in `core/application/Application.cpp`. Follows singleton pattern, manages overall lifecycle. + +2. **Plugin System**: Supports plugins for data collection, processing, and flushing: + - **Inputs**: Collect data from various sources (files, network, system metrics, etc.) + - **Processors**: Transform and process collected data + - **Flushers**: Send processed data to various backends + +3. **Pipeline Management**: Collection pipelines managed by `CollectionPipelineManager` handle data flow from inputs through processors to flushers. + +4. **Configuration**: Supports both local and remote configuration management with watchers that monitor for configuration changes. + +5. **Queuing System**: Implements various queue types including bounded queues, circular queues, and exactly-once delivery queues for reliable data transmission. + +6. **Monitoring**: Built-in monitoring and metrics collection for tracking the collector's own performance and health. + +## Project Structure + +``` +core/ # Core C++ code + plugin/ # Plugin system + input/ # Data collection input plugins + processor/ # Data processing plugins + flusher/ # Data output plugins (SLS, file, etc.) + collection_pipeline/ # Main pipeline flow (queue, batch, serialization) + config/ # Configuration (loading, providers, feedback) + provider/ # Config providers (Enterprise, Legacy) + common/ # Common utilities, data structures, network, string, crypto + monitor/ # Monitoring, metrics collection, alerting + logger/ # Logging system + checkpoint/ # Checkpoint, state management + app_config/ # Global configuration + models/ # Core data structures (events, logs, metrics) + parser/ # Log parsers + task_pipeline/ # Task scheduling + go_pipeline/ # Go plugin integration + ebpf/ # eBPF collection and plugins + host_monitor/ # Host-level monitoring + shennong/ # Shennong metrics + prometheus/ # Prometheus collection + file_server/ # File collection and management + container_manager/ # Container environment management + application/ # Main application entry + protobuf/ # Protobuf protocol definitions + metadata/ # K8s and other metadata collection + constants/ # Constants + tools/ # Internal utility scripts + unittest/ # Unit tests + legacy_test/ # Historical test cases + +pkg/ # Go packages + helper/ # Go helper functions + containercenter/ # Go container-related functions + +plugin_main/ # Plugin main entry +pluginmanager/ # Go plugin manager (lifecycle, registration) +plugins/ # Go plugin packages + input/ # Go input plugins (docker, etc.) 
+ processor/ # Go processor plugins + flusher/ # Go flusher plugins + aggregator/ # Go aggregator plugins + extension/ # Go extension plugins + all/ # Plugin registration and init + test/ # Go plugin tests + +test/ # Integration tests +e2e/ # E2E test cases (open source Go plugins) +e2e_enterprise/ # E2E enterprise test cases (host + K8s) +docs/ # Project documentation +scripts/ # Build, deploy, test scripts +docker/ # Docker-related files +rpm/ # RPM packaging +external/ # External dependencies +``` + +## Key Dependencies + +### Header-Only Libraries +- `spdlog` - Logging +- `rapidjson` - JSON parsing + +### Compiled Libraries +- **Testing**: `gtest`, `gmock` +- **Serialization**: `protobuf` +- **Regex**: `re2` +- **Hash**: `cityhash` +- **Config**: `jsoncpp`, `yamlcpp` +- **Compression**: `lz4`, `zlib`, `zstd` +- **Network**: `curl`, `ssl`, `crypto` +- **System**: `boost`, `gflags`, `leveldb`, `uuid` +- **Memory**: `tcmalloc` (optional) + +### Tech Stack +- C++ (main implementation, C++17/20) +- Protobuf (data serialization) +- eBPF (kernel-level data collection) +- Prometheus (metrics collection) +- Go (plugin adaptation) +- Shell/Python (build and test scripts) + +## Terminology Glossary + +| Term | Description | +|------|-------------| +| LoongCollector | The observability data collection agent (formerly iLogtail) | +| Pipeline | A data processing chain: Input -> Processor(s) -> Flusher | +| Plugin | A modular component that performs specific data operations | +| Input Plugin | Collects data from a source (file, network, metric, etc.) | +| Processor Plugin | Transforms data (parse, filter, enrich, etc.) | +| Flusher Plugin | Sends data to a destination (SLS, stdout, Prometheus, etc.) | +| Config | Collection configuration defining pipeline behavior | +| Checkpoint | Persistent state tracking for exactly-once delivery | +| Runner | Execution wrapper for a specific plugin instance | +| Queue | Data buffer between pipeline stages | +| Batch | Group of events processed/sent together | +| SLS | Alibaba Cloud Simple Log Service | +| eBPF | Extended Berkeley Packet Filter (kernel tracing) | +| SPL | Structured Processing Language | + +## Codebase Map + +### Key Entry Points and Core Flows + +| Path | Purpose | +|------|---------| +| `core/logtail.cpp` | Main entry point | +| `core/application/Application.cpp` | Application singleton, lifecycle management | +| `core/collection_pipeline/CollectionPipelineManager.cpp` | Pipeline lifecycle | +| `core/collection_pipeline/CollectionPipeline.cpp` | Pipeline execution | +| `core/runner/ProcessorRunner.cpp` | Processor execution | +| `core/runner/FlusherRunner.cpp` | Flusher execution | +| `core/config/watcher/PipelineConfigWatcher.cpp` | Config change detection | +| `core/file_server/FileServer.cpp` | File collection management | +| `core/file_server/checkpoint/CheckpointManagerV2.cpp` | Exactly-once checkpoint | + +### Invariant Rules + +- **Lifecycle**: All plugins follow Init -> Start -> Stop -> Close lifecycle +- **Resource Release**: Every thread/future/queue must be properly cleaned up on stop +- **Config**: Environment variables are case-insensitive with default fallbacks +- **Queue**: Bounded queue with backpressure; pop on disabled queue should not hang +- **Hot Reload**: After config change, system must return to consistent "collect+process+send" state + +### Common Patterns + +- RAII for resource management +- Smart pointers over raw pointers +- Singleton pattern for managers (Application, AlarmManager, WriteMetrics) +- Thread-safe queues 
with condition variables
+- Plugin registration via static initialization
+
+## C++ Coding Standards
+
+### Naming
+- **PascalCase** for class names, global functions, public methods
+- **camelCase** for variable names and private methods
+- **SCREAMING_SNAKE_CASE** for macros and constants
+- **m** prefix for member variables (e.g., `mUserId`)
+- **k** prefix for constants (e.g., `kMaxSendBufferSize`)
+
+### Modern C++
+- Prefer C++17/20 features (auto, range-based loops, smart pointers)
+- Use `std::unique_ptr` / `std::shared_ptr` for memory management
+- Prefer `std::optional`, `std::variant`, `std::any` for type-safe alternatives
+- Use `constexpr` and `const` for compile-time computations
+- Use `std::string_view` for read-only string operations
+
+### Error Handling
+- Use exceptions for error handling (`std::runtime_error`, `std::invalid_argument`)
+- RAII for resource management to avoid memory leaks
+- Validate inputs at function boundaries
+- Log errors using spdlog
+
+### Performance
+- Avoid unnecessary heap allocations; prefer stack-based objects
+- Use `std::move` for move semantics
+- Optimize loops with `<algorithm>` (e.g., `std::sort`, `std::for_each`)
+- Use `std::array` or `std::vector` over raw arrays
+
+### Security
+- Avoid C-style casts; use `static_cast`, `dynamic_cast`, `reinterpret_cast`
+- Enforce const-correctness
+- Avoid global variables; use singletons sparingly
+- Use `enum class` for strongly typed enumerations
+
+### Testing
+- Unit tests using Google Test (GTest) / Google Mock
+- Integration tests for system components
+
+## Go Coding Standards
+
+### Naming
+- **PascalCase** for exported types, functions, and constants
+- **camelCase** for unexported types, functions, variables, and constants (Go does not use snake_case)
+- Package names use lowercase
+
+### Error Handling
+- Return errors explicitly, do not panic
+- Use `fmt.Errorf` with `%w` for error wrapping
+- Check errors at every call site
+
+### Concurrency
+- Use goroutines for concurrent operations
+- Use channels or sync primitives for communication
+- Avoid goroutine leaks; always provide exit paths
+
+### Testing
+- Use standard `testing` package
+- Table-driven tests for function coverage
+- Integration tests via E2E framework
diff --git a/.claude/skills/review-standards/SKILL.md b/.claude/skills/review-standards/SKILL.md
new file mode 100644
index 0000000000..3fafef5dd8
--- /dev/null
+++ b/.claude/skills/review-standards/SKILL.md
@@ -0,0 +1,255 @@
+---
+name: review-standards
+description: Code review behavioral standards. Reference during code review to ensure consistent quality checks from a QA perspective.
+---
+# Code Review Rule
+
+你是一个高级代码审查助手,审查代码时要从QA角度仔细检查问题,以批判的眼光看待代码,以发现潜在问题为目的。
+
+为了避免得到假阳性的检查结果,请注意:
+
+* 分析具体代码片段时要包含足够上下文,不要仅基于局部信息做出判断。
+
+* 避免基于记忆进行代码分析,必须基于实际查看的代码。
+
+* 在指出问题前,先理解业务逻辑的完整流程,考虑代码设计的合理性和必要性。
+
+
+请按下面的步骤进行Code Review
+
+## 1. 获取评审内容,无需输出
+
+用户会提供分支或PR信息,请根据以下指示获取评审文件列表和内容
+
+1. 如果提供两个分支名称(例如 "fork/feature" 和 "main"),获取评审文件列表和内容的方法是:
+
+    * 运行 `git branch` 和 `git remote` 了解分支是origin分支还是其他远程分支。
+
+    * 使用 `git fetch` 更新相关分支(如 `fork/feature`、`origin/main`),确保获取最新代码。
+
+    * 运行 `git checkout fork/feature && git pull` 将内容拉取到本地,以便review时查询完整上下文。
+
+    * 执行 `git diff --name-only --diff-filter=M origin/main...fork/feature` 来列出被修改的文件。
+
+    * 对于上述列表中的每个文件,运行 `git diff origin/main...fork/feature -- <文件路径>` 获取变更内容。
+
+2. 如果仅提供一个分支名称(例如 "fork/feature"),那么另一分支名称就是"main",然后和提供两个分支名称一样处理。
+
+3. 如果提供的是一个PR号,那么两个分支分别为 "origin/pull/{PR号}/head" 和 "main",然后和提供两个分支名称一样处理。
+
+
+## 2.
高层次摘要,需要输出 + +对评审内容用 2–3 句话概括描述: + +* **产品影响**:这项变更对用户或客户带来了什么价值? + +* **工程实现方式**:使用了哪些关键数据结构、算法、模式、框架或最佳实践? + +## 3. PR代码理解 + +请以代码作者视角向Reviewer解释当前PR想干什么。必要时使用mermaid画出关键逻辑、数据结构和交互时序图。 +首先,从全局视角梳理这个PR涉及到数据采集、处理、发送的整体流程(不涉及的部分无需说明),关键组件数据流怎么串联的,用的什么数据结构。 +然后,说明这个PR想扩展什么,应该怎么扩展。 +最后,详解PR实际怎么做的,包括解析、错误处理、重试等关键逻辑。 + +## 4. 牢记评估标准,无需输出 + +针对每个有变更的文件及其差异块,评估这些行是否符合以下方面的要求: + +1. **业务逻辑深度理解** + + * 分析组件的实际作用和预期行为 + + * 识别可能导致功能失效的边缘情况 + + * 质疑现有的设计是否满足业务目标 + + * 考虑故障模式和容错机制 + +2. **设计与架构** + + * 模块职责:确保单一职责原则,检查设计是否符合 SOLID 原则,将可测试性作为重要标准 + + * 依赖管理:识别组件间的调用链和依赖关系,检查是否存在循环依赖或隐含依赖 + + * 分析故障传播路径,确保故障的上下文信息正确 + + * Input和Flusher采用总线Runner模式,配置通过注册应用,线程数不随配置数量增加 + + * 自监控涉及重启的功能应该由LogtailMonitor统一管理 + +3. **正确性与安全** + + * 边界检查:数组/容器访问前验证索引,如`if (index < container.size())` + + * 空指针防护:公共方法必须检查指针参数,如`if (!ptr) return false;` + + * 类型安全:JSON解析先验证类型,如`if (json.isString()) value = json.asString();` + + * 资源管理:使用RAII和智能指针,避免内存泄漏,如`std::unique_ptr`、`std::shared_ptr`。优先使用现成的RAII封装,如需自定义清理逻辑可使用unique\_ptr + lambda构建。 + + * 错误处理:外部输入防御式编程,包括读配置(如`std::ios_base::failure`、`std::filesystem::filesystem_error`、`boost::regex_error`)、文件、数据库、网络,必须有异常处理和完备日志 + + * 错误传播:检查错误是否正确传播到上层,避免静默失败 + + * 外部接口调用容错:对外部API调用、网络请求等失败场景,必须实现指数退避重试机制,避免因瞬时故障导致外部接口过载。 + + * 类型转换:检查类型转换的安全性,特别是缩窄转换(narrowing conversion) + +4. **性能与效率** + + * 内存优化: + + * 容器预分配大小,如`vector.reserve(expected_size)` + + * 避免不必要拷贝,优先移动语义和引用传递,如`map.emplace(args)`,`auto& val = map[key]`。 + + * 字符串操作优先使用 `StringView`数据结构避免复制,优先使用core/common/StringTools.h已有的工具函数如,字符串切分`StringViewSplitter`,字符串修剪`Trim`,字符串解析`StringTo`。 + + * 限制容器最大大小防止内存爆炸,如`if (queue.size() > MAX_QUEUE_SIZE)` + + * 计算效率: + + * 缓存重复计算结果,避免热点路径中的重复工作,例如通过sysconf获取的值仅需在初始化时获取一次。 + + * 确保已使用业界最优的数据结构和算法,尽量避免非线性性能退化 + + * 批处理操作减少系统、网络调用开销,如批量发送 + + * 热路径性能审查: + + * 特别关注循环内部、事件处理循环中的性能变化 + + * 对比新旧实现的时间复杂度差异 + + * 质疑任何在高频路径中引入额外数据结构查找的变更 + + * 主机监控指标:添加指标应该在SystemInterface中同时添加缓存,确保同一时间点获取的指标一致。 + +5. **并发与线程安全** + + * 锁策略:最小化锁范围,优先无锁数据结构如`boost::concurrent_flat_map` + + * 死锁预防:多锁时统一加锁顺序,避免嵌套锁 + + * 线程复用:使用线程池而非频繁创建线程 + + * 事件驱动:IO操作优先考虑事件驱动而非多线程 + + * 数据竞争:共享数据必须同步保护,原子操作优于锁 + + * 异步数据高效传递,例如优先使用epoll的`event.data.ptr`,curl的`CURLOPT_PRIVATE`直接携带上下文数据。 + + * 新增线程:应使用`std::future`、`std::mutex`、`std::condition_variable`配套模式,以便快速停止,参考core/common/timer/Timer.h。 + +6. **动态链接库** + + * 使用core/common/DynamicLibHelper.cpp中定义的工具加载动态链接库,避免直接依赖导致的兼容性问题。 + + * 动态链接库中的代码中不允许自己分配线程资源,必须由主程序控制。 + + * 动态链接库中的内存申请和释放方法必须配对,不允许跨主程序和动态链接库进行内存申请和释放。 + +7. **可读性与规范** + + * 标准: + + * 复用C++17标准库,避免重复轮子 + + * 尽可能使用`constexpr`、`auto`、范围for循环(`for (auto& elem : container) {}`) + + * 使用`std::optional`安全地表示可能为空的返回值,使用`std::variant`处理几种固定不同类型的值。 + + * 调用linter工具,发现违反规范的新增代码 + + * 命名约定: + + * 类名PascalCase:`InputContainerStdio` + + * 成员变量m前缀:`mProject`, `mLogstore` + + * 常量变量k前缀:`kMaxSendLogGroupSize` + + * 代码组织: + + * 保持控制流简洁,降低圈复杂度,抽象重复逻辑(DRY原则),将密集逻辑重构为可测试的辅助方法 + + * 彻底移除无用或不可达代码,包括注释掉的废弃代码。 + + * 魔法数字抽成常量或gflag。 + + * 优先使用结构体数组,而不是平行的多个数组。 + + * 变量和方法应该声明在header文件中,实现在cpp文件,除非是模版类或者有强烈的inline需要。 + + * 避免全局变量,应该使用类、命名空间进行范围限定。 + + * 注释质量: + + * 解释"为什么"而非"什么",复杂算法必须注释 + + * 对代码修改附近的注释检查注释是否需要同步修改 + + * 禁止使用不安全的C函数,例如`strcpy`, `strcat`, `strcmp`, `strlen`, `strchr`, `strrchr`, `strstr`, `sprintf`, `strtok`, `sscanf`, `strspn`, `strcspn`, `strpbrk`, `strncat`, `strncmp`, `strncpy`, `strcoll`, `strxfrm`, `strdup`, `strndup` + +8. 
**稳定性与监控** + + * 容量控制:所有缓冲区/队列设置上限,如`INT32_FLAG(max_send_log_group_size)` + + * 可观测性:缓存大小、延时、丢弃数等关键指标记录,异常情况使用日志记录,导致延时、丢数据的关键异常使用SendAlarm上报远程服务器。同时检查`LOG_INFO`/`LOG_WARNING`/`LOG_ERROR`日志是否有高频调用刷屏的风险。 + + * 自监控指标、告警:参考`../selfmonitor/SKILL.md`中的内容和规范进行检查。 + +9. **兼容性与部署** + + * 平台兼容:路径分隔符、字节序、系统调用差异处理 + + * 向后兼容:配置格式变更需要兼容旧版配置,新增参数应避免改变原有默认行为 + + * 配置默认值:新增配置项必须有合理的默认值,并在文档中说明 + + * 本地状态兼容:禁止使用Protobuf的`TextFormat`,避免新增参数dump后无法读取。旧版本dump的状态文件,新版本应该正常读取恢复。 + +10. **测试与质量** + + * 覆盖策略:单元测试应涵盖成功和失败路径,核心逻辑100%覆盖,边界条件必测 + + * 测试命名准确描述行为。 + + * 性能测试:对性能敏感的代码路径,应提供基准测试(benchmark) + +11. **安全与合规性**: + + * 检查配置和输入验证与清理以防注入攻击。 + + * 检查新增依赖库是否必要,新增时必须将License添加到licenses目录。 + + * 新文件包含Copyright和Apache License声明。 + + * 代码中严禁出现密钥泄露。 + +12. **文档:** + + * 对于新增的input、processor、flusher插件,检查是否新建了对应的使用文档。 + + * 对于改写的input、processor、flusher插件,如果GetXxxParam的参数有改动,需要对应修改使用文档。 + + +## 5. 按评估标准报告问题,需要输出 + +对发现的每个问题请按如下格式输出一个嵌套项: + +```markdown +- 文件: [<路径>:<起始行号>](file://./<路径>#L<起始行号>) + - 问题: <问题本质的一句话总结> + - 建议: <简明的修改建议或代码示例> +``` + +注意在输出行号前再次检索被review代码,确保使用精确的行号,以便 IDE 可以直接跳转。 + +## 6. 亮点总结,需要输出 + +在报告之后,用简短的列表形式总结你在差异中观察到的正面实践或良好实现。 + +整体过程中,请保持礼貌、专业的语气;保持评论尽可能简洁,同时不失清晰;并且确保仅分析真正发生变更的文件。 \ No newline at end of file diff --git a/.claude/skills/riper5-protocol/SKILL.md b/.claude/skills/riper5-protocol/SKILL.md new file mode 100644 index 0000000000..aea0984866 --- /dev/null +++ b/.claude/skills/riper5-protocol/SKILL.md @@ -0,0 +1,71 @@ +--- +name: riper5-protocol +description: RIPER-5 workflow protocol for complex software engineering tasks: Research, Innovate, Plan, Execute, Review. +--- +# RIPER-5 Protocol + +RIPER-5 is a 5-phase workflow designed for complex software engineering tasks: system design, architectural refactoring, bug diagnosis, performance optimization, multi-component integration. + +## Core Principle + +Start every new conversation in RESEARCH mode. Do not jump to solutions. Progress through phases only with explicit signals. + +## Modes + +### Mode 1: RESEARCH `[MODE: RESEARCH]` +**Purpose**: Information collection and deep understanding +**Allowed**: Read files, ask clarifying questions, analyze architecture, identify constraints, create task files +**Forbidden**: Suggestions, implementation, planning, any solution hints +**Output**: Start with `[MODE: RESEARCH]`, then only observations and questions. + +### Mode 2: INNOVATE `[MODE: INNOVATE]` +**Purpose**: Brainstorm potential approaches +**Allowed**: Discuss solution ideas, evaluate pros/cons, explore alternatives, document findings +**Forbidden**: Specific planning, implementation details, writing code, committing to solutions +**Output**: Start with `[MODE: INNOVATE]`, then only possibilities and considerations. + +### Mode 3: PLAN `[MODE: PLAN]` +**Purpose**: Create exhaustive technical specification +**Allowed**: Detailed plans with file paths, function signatures, data structure changes, error handling, dependency management, test approach +**Forbidden**: Any implementation or code writing, even "example code" that could be executed +**Required**: Convert entire plan into a numbered sequential checklist +**Output**: Start with `[MODE: PLAN]`, then only specifications and implementation details. 
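+
+For shape only, a minimal sketch of what a PLAN-mode hand-off might look like — the checklist items and file paths below are hypothetical:
+
+```text
+[MODE: PLAN]
+Implementation checklist:
+1. Declare Foo::Reset() in core/foo/Foo.h
+2. Implement Foo::Reset() in core/foo/Foo.cpp (clear mQueue, reset mState)
+3. Call Foo::Reset() from Application::Stop()
+4. Add FooUnittest::TestResetClearsQueue under core/unittest/foo/
+```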
+ +### Mode 4: EXECUTE `[MODE: EXECUTE]` +**Purpose**: Implement exactly what was planned in Mode 3 +**Allowed**: Only implement what the approved plan explicitly details, follow checklist exactly, mark completed items, update task progress +**Forbidden**: Any deviation from plan, un-planned improvements, creative additions +**Quality**: Always show full code context, specify language and path, proper error handling +**Deviation**: If any deviation needed, immediately return to PLAN mode +**Entry**: Only enter on explicit "ENTER EXECUTE MODE" command + +### Mode 5: REVIEW `[MODE: REVIEW]` +**Purpose**: Ruthlessly verify implementation matches plan +**Required**: Line-by-line comparison, technical verification, check for bugs/unexpected behavior, verify against original requirements +**Report**: Must state if implementation matches plan exactly or deviates +**Format**: `Detected deviation: [exact description]` or `Implementation matches plan exactly` +**Output**: Start with `[MODE: REVIEW]`, then systematic comparison and clear judgment. + +## Critical Rules + +- Cannot transition between modes without explicit permission +- Must declare current mode at start of every response +- In EXECUTE: must follow plan 100% faithfully +- In REVIEW: must mark even the smallest deviation +- No independent decision authority outside declared mode +- Disable emoji output unless specifically requested +- If no explicit mode transition signal, stay in current mode +- Default: Start in RESEARCH mode + +## Mode Transition Signals + +Only transition on exact signals: +- "ENTER RESEARCH MODE" +- "ENTER INNOVATE MODE" +- "ENTER PLAN MODE" +- "ENTER EXECUTE MODE" +- "ENTER REVIEW MODE" + +**Auto-transitions**: +- If EXECUTE needs plan deviation -> return to PLAN mode +- After all implementation confirmed by user -> move to REVIEW mode diff --git a/.claude/skills/security-check/SKILL.md b/.claude/skills/security-check/SKILL.md new file mode 100644 index 0000000000..e9115105bb --- /dev/null +++ b/.claude/skills/security-check/SKILL.md @@ -0,0 +1,44 @@ +--- +name: security-check +description: Security scanning before commit/push. Checks for sensitive information like API keys and tokens. +--- +# Security Check Rules + +Before committing or pushing code, must check for sensitive information, especially API Keys and access tokens. + +## What to Check + +### API Keys and Access Tokens +- API Keys starting with `sk-` (OpenAI, Anthropic, Alibaba Cloud, etc.) +- Google API Keys starting with `AIzaSy` +- Public keys starting with `pk_` +- Other common API token formats + +## Before Commit + +### Run Check First +Run `bash .claude/skills/security-check/scripts/security_check.sh commit` to check the staging area for sensitive information. If it does NOT output `staging area is clear`, sensitive information was found. + +### If Sensitive Information Found +1. **Immediately delete or replace**: Replace real API Keys with placeholders +2. **Use environment variables**: Move sensitive info to environment variables +3. **Add to .gitignore**: Ensure files with sensitive info are not committed +4. **Must refuse the commit/push action** + +## Before Push + +### Run Check First +Run `bash .claude/skills/security-check/scripts/security_check.sh push` to check each commit for sensitive information. If it does NOT output `all commits are clear`, sensitive information was found. The commit hashes are written to `task/sensitive_commits.txt`. + +### If Sensitive Information Found +1. 
**Immediately delete or replace**: Replace real API Keys with placeholders +2. **Use environment variables**: Move sensitive info to environment variables +3. **Add to .gitignore**: Ensure files with sensitive info are not committed +4. **Must use the script below to clean history** + +```bash +# Reset based on results in task/sensitive_commits.txt to avoid leaking commits +bash .claude/skills/security-check/scripts/security_reset.sh +``` + +5. **Must refuse the commit/push action** diff --git a/.claude/skills/security-check/scripts/security_check.sh b/.claude/skills/security-check/scripts/security_check.sh new file mode 100755 index 0000000000..03848884fc --- /dev/null +++ b/.claude/skills/security-check/scripts/security_check.sh @@ -0,0 +1,40 @@ +#!/bin/bash +set -euo pipefail + +SENSITIVE_PATTERNS="(sk-[a-zA-Z0-9]{20,}|AIzaSy[a-zA-Z0-9_-]{30,}|pk_[a-zA-Z0-9]{10,}|ghp_[a-zA-Z0-9]{36,}|gho_[a-zA-Z0-9]{36,}|ghu_[a-zA-Z0-9]{36,}|ghs_[a-zA-Z0-9]{36,}|ghr_[a-zA-Z0-9]{36,})" +MODE="${1:-}" + +if [ "$MODE" != "commit" ] && [ "$MODE" != "push" ]; then + echo "Usage: $0 [commit|push]" + exit 2 +fi + +if [ "$MODE" == "commit" ]; then + # 检查暂存区中的 API Keys + echo "checking staging area" + if git diff --cached --no-prefix | grep '^+' | grep -E "$SENSITIVE_PATTERNS"; then + echo "⚠️ staging area contains SENSITIVE information" + else + echo "✅ staging area is clear" + fi +elif [ "$MODE" == "push" ]; then + # 检查所有要推送的 commit + is_clear=true + upstream=$(git rev-parse --abbrev-ref --symbolic-full-name @{u} 2>/dev/null) || upstream="origin/main" + mkdir -p task + > task/sensitive_commits.txt # 清空文件 + + while read -r commit; do + commit_hash=$(echo "$commit" | cut -d' ' -f1) + echo "checking commit: $commit" + if git show "$commit_hash" --no-commit-id --unified=0 | grep '^+' | grep -E "$SENSITIVE_PATTERNS"; then + echo "⚠️ commit $commit contains SENSITIVE information" + echo "$commit_hash" >> task/sensitive_commits.txt + is_clear=false + fi + echo "---" + done < <(git log "${upstream}"..HEAD --oneline) + if [ "$is_clear" = true ]; then + echo "✅ all commits are clear" + fi +fi \ No newline at end of file diff --git a/.claude/skills/security-check/scripts/security_reset.sh b/.claude/skills/security-check/scripts/security_reset.sh new file mode 100755 index 0000000000..7b93706899 --- /dev/null +++ b/.claude/skills/security-check/scripts/security_reset.sh @@ -0,0 +1,108 @@ +#!/bin/bash +# 智能squash脚本 - 自动检测并清理包含敏感信息的commits +echo "🔍 开始清理包含敏感信息的commits..." + +# 1. 检查task/sensitive_commits.txt文件是否存在且非空 +if [ ! -f "task/sensitive_commits.txt" ] || [ ! -s "task/sensitive_commits.txt" ]; then + echo "❌ 未找到敏感commits列表,请先运行push前检查" + exit 1 +fi + +# 读取敏感commits列表 +readarray -t sensitive_commits < task/sensitive_commits.txt + +# 2. 如果发现敏感信息,进行智能squash +if [ ${#sensitive_commits[@]} -gt 0 ]; then + echo "🚨 发现 ${#sensitive_commits[@]} 个包含敏感信息的commits,开始清理..." + + # 检查工作区是否干净 + git status --porcelain | read -r _ && { + echo "⚠️ 工作区或暂存区有未提交的更改,先进行stash..." 
git stash push -u -m "security-cleanup-backup-$(date +%Y%m%d-%H%M%S)"
+        stashed=true
+    } || stashed=false
+
+    # 获取要reset的目标commit
+    # 找到最早的敏感commit(数组最后一个),并获取其父commit
+    earliest_sensitive="${sensitive_commits[${#sensitive_commits[@]}-1]}"
+    # --verify 确保解析失败时(root commit 无父提交)返回空值,而不是原样回显表达式
+    parent_commit=$(git rev-parse --verify --quiet "${earliest_sensitive}^" 2>/dev/null || true)
+
+    # 获取所有需要被squash的commits(从最早的敏感commit的parent到HEAD)
+    if [ -n "$parent_commit" ]; then
+        commits_to_squash=($(git rev-list --reverse "${parent_commit}..HEAD"))
+    else
+        # 如果没有parent,说明最早的敏感commit是root commit
+        echo "⚠️ 最早的敏感commit是仓库的第一个commit"
+        commits_to_squash=($(git rev-list --reverse HEAD))
+    fi
+
+    if [ ${#commits_to_squash[@]} -eq 0 ]; then
+        echo "❌ 无法确定要squash的commit范围"
+        if [ "$stashed" = true ]; then
+            git stash pop
+        fi
+        exit 1
+    fi
+
+    # 获取所有要重新提交的commits的信息
+    echo "📝 提取所有commit messages..."
+    all_commit_details=""
+    main_subject=""
+
+    for commit_hash in "${commits_to_squash[@]}"; do
+        # 获取commit信息
+        subject=$(git log --format=%s -n 1 "$commit_hash")
+        body=$(git log --format=%b -n 1 "$commit_hash")
+
+        # 主题行用第一个commit的主题
+        if [ -z "$main_subject" ]; then
+            main_subject="$subject"
+        fi
+
+        subject_marker="$subject"
+
+        # 按GitHub squash格式添加commit详情
+        if [ -n "$body" ]; then
+            all_commit_details="${all_commit_details}* ${subject_marker}\n\n${body}\n\n"
+        else
+            all_commit_details="${all_commit_details}* ${subject_marker}\n\n"
+        fi
+    done
+
+    # 创建GitHub风格的squash commit message
+    new_message="${main_subject}\n\n${all_commit_details}"
+
+    # 执行squash
+    echo "🔄 执行squash操作..."
+    if [ -n "$parent_commit" ]; then
+        git reset --soft "$parent_commit"
+    else
+        echo "❌ 检测到最早敏感 commit 为 root commit,自动清理会涉及高风险历史重写,已中止。"
+        echo "请手动执行更安全流程(例如 orphan 分支重建)后再提交。"
+        if [ "$stashed" = true ]; then
+            echo "⚠️ 已为你恢复之前的工作区更改。"
+            git stash pop
+        fi
+        exit 1
+    fi
+
+    # 显示需要手动清理的文件
+    echo "📋 需要手动清理的文件:"
+    git status --porcelain | grep '^[AM]' | cut -c4-
+
+    echo ""
+    echo "✅ Squash完成!请执行以下步骤:"
+    echo "1. 手动清理上述文件中的敏感信息"
+    echo "2. 运行: git add ."
+    echo "3. 运行: git commit"
+    if [ "$stashed" = true ]; then
+        echo "4. 如需恢复之前的工作区更改: git stash pop"
+    fi
+    echo ""
+    echo "📝 新的commit message预览:"
+    echo "────────────────────────────────────────"
+    echo -e "$new_message"
+    echo "────────────────────────────────────────"
+else
+    echo "✅ 未发现包含敏感信息的commits"
+fi
diff --git a/.claude/skills/selfmonitor/SKILL.md b/.claude/skills/selfmonitor/SKILL.md
new file mode 100644
index 0000000000..b146df7bdc
--- /dev/null
+++ b/.claude/skills/selfmonitor/SKILL.md
@@ -0,0 +1,138 @@
+---
+name: selfmonitor
+description: Self-monitoring metrics, alarm code standards for LoongCollector. Read when changes involve metrics, alarms, or observability.
+---
+# Self-Monitoring Code Standards
+
+You are a self-monitoring code quality expert, responsible for ensuring LoongCollector code correctly uses self-monitoring features including metrics, alarms, code style, and implementation logic.
+
+ +## Metric Naming Conventions + +### Format + +**Variable name**: `{MODULE}_{METRIC_CONTENT_DESCRIPTION}_{UNIT}` (ALL CAPS) +**Variable content**: `{metric_content_description}_{unit}` (all lowercase) + +Example: +```cpp +const string METRIC_RUNNER_FLUSHER_IN_RAW_SIZE_BYTES = "in_raw_size_bytes"; +``` + +### Module Prefix Categories + +- **`agent_`**: Process-level metrics, describing entire Agent state +- **`pipeline_`**: Pipeline-level metrics, describing data pipeline state +- **`plugin_`**: Plugin-level metrics, describing specific plugin state +- **`component_`**: Component-level metrics, describing internal component state +- **`runner_`**: Runner-level metrics, describing runner state + +### Unit Categories + +#### Counter metrics +- **`_total`**: Cumulative count (default), e.g. `input_records_total`, `send_success_total` + +#### Size metrics +- **`_bytes`**: Bytes, e.g. `input_size_bytes`, `memory_used_bytes` +- **`_mb`**: Megabytes (memory), e.g. `agent_memory_used_mb` + +#### Time metrics +- **`_ms`**: Milliseconds (processing time, latency), e.g. `process_time_ms` +- **`_s`**: Seconds (long intervals), e.g. `uptime_s` + +#### Ratio metrics +- **`_percent`**: Percentage, e.g. `cpu_usage_percent` +- **`_ps`**: Per second (rate), e.g. `send_bytes_ps` + +#### State metrics +- **`_flag`**: Flag (0 or 1), e.g. `enabled_flag` +- **`_state`**: State value, e.g. `register_state` + +### Label Naming Conventions + +**Label Key format**: `METRIC_LABEL_KEY_{description}` + +Common keys: `METRIC_LABEL_KEY_PROJECT`, `METRIC_LABEL_KEY_LOGSTORE`, `METRIC_LABEL_KEY_PIPELINE_NAME`, `METRIC_LABEL_KEY_PLUGIN_TYPE`, `METRIC_LABEL_KEY_PLUGIN_ID`, `METRIC_LABEL_KEY_FILE_NAME`, `METRIC_LABEL_KEY_FILE_DEV`, `METRIC_LABEL_KEY_FILE_INODE`, `METRIC_LABEL_KEY_REGION`, `METRIC_LABEL_KEY_RUNNER_NAME` + +## Alarm Level Conventions + +Based on PR #2319 design, alarm levels: + +| Level | Severity | Description | Typical Scenario | +|-------|----------|-------------|------------------| +| 1 | warning | Single point error, doesn't affect overall flow | Data parse failure; single collection/send failure | +| 2 | error | Affects main flow, risk if not optimized | Queue busy; monitor exceeded; unsuccessful init | +| 3 | critical | Severe impact: config/module unusable; affects agent stability; causes customer loss | Config load failure; unsuccessful module init; data drop; crash | + +### C++ Alarm Usage + +**Correct**: +```cpp +AlarmManager::GetInstance()->SendAlarmWarning(LOGTAIL_CONFIG_ALARM, "配置解析失败"); +AlarmManager::GetInstance()->SendAlarmError(PROCESS_QUEUE_BUSY_ALARM, "处理队列繁忙"); +AlarmManager::GetInstance()->SendAlarmCritical(CATEGORY_CONFIG_ALARM, "配置加载失败"); +``` + +**Wrong**: Don't use old `SendAlarm` interface. + +### Go Alarm Usage + +**Correct**: +```go +logger.Warning(ctx, selfmonitor.CategoryConfigAlarm, "配置解析失败") +logger.Error(ctx, selfmonitor.ProcessQueueBusyAlarm, "处理队列繁忙") +logger.Critical(ctx, selfmonitor.CategoryConfigAlarm, "配置加载失败") +``` + +## Adding New Metrics + +### C++ Steps + +1. **Define metric constants**: Add to `core/monitor/metric_constants/MetricConstants.h` +2. **Create MetricsRecordRef** with labels in Init() +3. **Create metric objects** (CounterPtr, IntGaugePtr) BEFORE commit +4. **Update values** using macros: `ADD_COUNTER()`, `SET_GAUGE()`, `ADD_GAUGE()` + +**Critical**: MetricsRecordRef must create all metric objects BEFORE commit. After commit, no new metrics can be created. Use `IsCommitted()` to check state. If a Gauge default is non-zero, set it once during Init. 
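+
+A minimal sketch of the C++ steps above. The helper and constant names in it (`PrepareMetricsRecordRef`, `CreateCounter`, `CreateIntGauge`, `CommitMetricsRecordRef`, `MetricCategory`, and the two `METRIC_RUNNER_*` constants) are illustrative assumptions; copy the real declarations from `core/monitor/` rather than this sketch:
+
+```cpp
+// Sketch only: helper signatures and metric constants are assumed for illustration.
+class MyRunner {
+public:
+    void InitMetrics() {
+        WriteMetrics::GetInstance()->PrepareMetricsRecordRef(
+            mMetricsRecordRef,
+            MetricCategory::METRIC_CATEGORY_RUNNER,
+            {{METRIC_LABEL_KEY_RUNNER_NAME, "my_runner"}});
+        // Create every metric object BEFORE the record is committed.
+        mInEventsTotal = mMetricsRecordRef.CreateCounter(METRIC_RUNNER_IN_EVENTS_TOTAL);
+        mQueueSizeBytes = mMetricsRecordRef.CreateIntGauge(METRIC_RUNNER_QUEUE_SIZE_BYTES);
+        mMetricsRecordRef.CommitMetricsRecordRef(); // no new metrics after this point
+    }
+
+    void OnEvents(uint64_t count, uint64_t bytes) {
+        ADD_COUNTER(mInEventsTotal, count); // null-safe update macros
+        SET_GAUGE(mQueueSizeBytes, bytes);
+    }
+
+private:
+    MetricsRecordRef mMetricsRecordRef;
+    CounterPtr mInEventsTotal;
+    IntGaugePtr mQueueSizeBytes;
+};
+```
+
+`IsCommitted()` can be used to guard against late registration when refactoring existing components.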
+
+### Go Steps
+
+1. **Define constants**: Add to `pkg/selfmonitor/metrics_constants_*.go`
+2. **Register metrics** in `InitMetricRecord()`:
+   ```go
+   p.MetricRecord = p.Config.Context.RegisterMetricRecord(labels)
+   p.metricCounter = selfmonitor.NewCounterMetricAndRegister(p.MetricRecord, selfmonitor.MetricPluginInEventsTotal)
+   ```
+3. **Update values**: Check nil before updating.
+
+## Adding New Alarm Types
+
+### C++ Steps
+
+1. Add to `core/monitor/AlarmManager.h` enum `AlarmType`
+2. Add to `mMessageType` vector in `AlarmManager.cpp` constructor
+3. Use leveled interfaces: `SendAlarmWarning`, `SendAlarmError`, `SendAlarmCritical`
+
+### Go Steps
+
+1. Add to `pkg/selfmonitor/alarm_constants.go`
+2. Use leveled interfaces: `logger.Warning`, `logger.Error`, `logger.Critical`
+
+## Best Practices
+
+1. Create metric objects once during initialization, not per-call
+2. Use safe update macros that check for null
+3. Choose alarm level matching severity
+4. Provide meaningful alarm messages with context
+5. Avoid alarm storms - limit frequency of same alarm
+6. Metrics should not impact main flow performance
+
+## Checklist
+
+Before submitting self-monitoring code:
+- [ ] Metric names follow naming convention with correct module prefix and unit
+- [ ] Labels follow naming convention
+- [ ] Correct alarm level interface used, matching severity
+- [ ] No deprecated interfaces used
+- [ ] Metrics created once, updated safely
+- [ ] Alarm storms avoided
+- [ ] Error handling complete
diff --git a/.claude/skills/testing-standards/SKILL.md b/.claude/skills/testing-standards/SKILL.md
new file mode 100644
index 0000000000..f1c3618e89
--- /dev/null
+++ b/.claude/skills/testing-standards/SKILL.md
@@ -0,0 +1,99 @@
+---
+name: testing-standards
+description: "Testing standards for LoongCollector: unit tests, e2e tests, benchmarks. Reference when writing or reviewing tests."
+---
+# LoongCollector Testing Standards
+
+## Test Categories
+
+### 1. Unit Tests (C++)
+- Use Google Test (GTest) / Google Mock
+- Place in `core/unittest/`
+- Cover success and failure paths
+- Core logic must have 100% coverage
+- Test boundary conditions explicitly
+- Test naming: accurately describe behavior being tested
+- Each `core/unittest/*/` directory produces one executable
+- Build and run tests from inside `build/` to ensure relative paths and temp files work correctly
+- See `.claude/skills/compile/SKILL.md` for build & run instructions
+
+### 2. Unit Tests (Go)
+- Use standard `testing` package
+- Table-driven tests for function coverage
+- Integration tests via E2E framework
+
+### 3. E2E Tests
+- BDD Godog framework
+- Configuration-driven via `.feature` files
+- See `.claude/skills/e2e/SKILL.md` for complete guide (design → write → run → debug)
+
+### 4.
Benchmarks +- Required for performance-sensitive code paths +- Compare against baseline versions +- Measure throughput, latency, CPU/Memory usage + +## E2E Test Quick Reference + +### Feature File Structure +``` +@input +Feature: input file + Test input file + + @e2e @host + Scenario: TestInputFileWithRegexSingle + Given {host} environment + Given subcribe data from {sls} with config + """ + enable: true + inputs: + - Type: input_file + """ + When generate {100} regex logs to file {/tmp/loongcollector/regex_single.log}, with interval {100}ms + Then there is {100} logs +``` + +### Behavior Types +| Type | Purpose | +|------|---------| +| `Given` | Setup/prepare test conditions | +| `When` | Trigger test actions (e.g., log generation) | +| `Then` | Verify test results | + +### Environment Tags +- `@host` - Host environment +- `@k8s` - Kubernetes environment +- `@docker-compose` - Docker Compose environment +- `@e2e` - E2E test marker +- `@regression` - Regression test marker + +### Adding New Test Behaviors +1. Write the Go function in the appropriate directory: + - `cleanup/` - Post-test cleanup (auto-executed) + - `control/` - Control operations (init, config) + - `setup/` - Environment setup + - `trigger/` - Data generation + - `verify/` - Result verification +2. Function signature: `func Name(ctx context.Context, params...) (context.Context, error)` +3. Register in `test/e2e_enterprise/main_test.go` via `scenarioInitializer` +4. Use in feature files with `{param}` syntax + +### Strict Rules +- Do NOT change behavior of the method being tested +- Do NOT modify existing test behaviors in engine +- Always start trigger `When begin trigger` BEFORE generating logs +- Only use registered behaviors from `test/engine/steps.go` +- Verify behavior type matches (Given/When/Then) + +### Test Naming +- Format: `Test${FunctionName}${CaseBriefDescription}` +- Examples: `TestInputFileWithBlackListDir`, `TestInputFileWithRegexSingle` +- Must include `@e2e` and environment tags + +## Benchmark Testing + +For performance-sensitive code: +1. Provide baseline comparison +2. Measure: throughput, latency, CPU profile, memory profile +3. Run under realistic load conditions +4. 
Document methodology and results diff --git a/.cursor/rules/project-knowledge/config-pitfalls.mdc b/.cursor/rules/project-knowledge/config-pitfalls.mdc new file mode 100644 index 0000000000..2aa3c16b0c --- /dev/null +++ b/.cursor/rules/project-knowledge/config-pitfalls.mdc @@ -0,0 +1,41 @@ +--- +description: LoongCollector 采集配置常见陷阱。编写或审查 pipeline config YAML 时参考。 +globs: + - "**/*.feature" + - "**/case.feature" + - "core/config/**" + - "test/e2e/**" +alwaysApply: false +--- +# LoongCollector 采集配置陷阱 + +## ExcutionTimeout 使配置变为一次性(onetime) + +`global.ExcutionTimeout` 存在于配置中时,**整个配置**被标记为 onetime 类型。 +只有注册了 `RegisterOnetimeInputCreator` 的插件才能在 onetime 配置中使用。 + +大部分输入插件(`input_forward`, `input_file`, `input_container_stdio`, `input_prometheus` 等)只注册了 `RegisterContinuousInputCreator`,在 onetime 配置中会报错: + +``` +failed to parse config:unsupported input plugin module:input_forward +``` + +### 判断逻辑 + +``` +global.ExcutionTimeout 存在 + → PipelineConfig::GetExpireTimeIfOneTime → mOnetimeExpireTime 被设置 + → CollectionConfig::IsOnetime() == true + → IsValidNativeInputPlugin(name, true) 在 ONETIME 注册表查找 + → 找不到 → "unsupported input plugin" +``` + +### 支持 onetime 的输入插件 + +查看 `PluginRegistry::LoadStaticPlugins()` 中调用 `RegisterOnetimeInputCreator` 的插件,如 `InputStaticFile`。 + +### 规则 + +- **持续运行的输入插件配置中不要使用 `ExcutionTimeout`** +- E2E 测试不需要 `ExcutionTimeout` 来控制超时,Go test 的 `-timeout` 参数已经提供了保护 +- 如果确实需要一次性采集,使用 `onetime_pipeline_config` 目录 + 支持 onetime 的输入插件 diff --git a/.cursor/skills/compile/SKILL.md b/.cursor/skills/compile/SKILL.md index 1e7c938e62..a9b9c7af02 100644 --- a/.cursor/skills/compile/SKILL.md +++ b/.cursor/skills/compile/SKILL.md @@ -8,44 +8,89 @@ description: Building ### C++ 部分编译方法 -1. 判断是否进行增量编译。如果已有 `build` 目录,并且其中有内容,并且你的修改没有涉及到 CMake 相关文件,那么跳转到第5步进行增量编译。 +**重要:所有 CMake 和 make 命令必须在 `build/` 目录内执行。** -2. 创建编译目录 - -``` bash -mkdir -p build +**前置条件** — 首次编译前需初始化 Git 子模块: +```bash +git submodule update --init --recursive ``` +两个子模块位于 `core/_thirdparty/`: +- `DCGM` — NVIDIA DCGM 头文件 +- `coolbpf` — eBPF 框架 + +如果子模块目录为空,编译会报 `No such file or directory` 错误。 + +#### 编译步骤 -3. 进入编译目录 +1. 判断是否进行增量编译。如果已有 `build` 目录,并且其中有内容,并且你的修改没有涉及到 CMake 相关文件,那么跳转到第 4 步进行增量编译。 -``` bash -cd build +2. 创建并进入编译目录 + +```bash +mkdir -p build && cd build ``` -4. 构建 CMake 命令 +3. 构建 CMake 命令 -``` bash +```bash cmake -DCMAKE_BUILD_TYPE=Debug -DLOGTAIL_VERSION=0.0.1 \ -DCMAKE_EXPORT_COMPILE_COMMANDS=1 \ -DCMAKE_CXX_FLAGS="-I/opt/rh/devtoolset-9/root/usr/lib/gcc/x86_64-redhat-linux/9/include -I/opt/logtail -I/opt/logtail_spl" \ - -DBUILD_LOGTAIL=ON -DBUILD_LOGTAIL_UT=ON -DWITHOUTGDB=ON -DENABLE_STATIC_LINK_CRT=ON -DWITHSPL=ON ../core + -DBUILD_LOGTAIL=ON -DBUILD_LOGTAIL_UT=ON -DWITHOUTGDB=ON -DENABLE_STATIC_LINK_CRT=ON -DWITHSPL=OFF ../core ``` -注意其中的几个开关: - - BUILD_LOGTAIL:表示编译 LoongCollector 二进制。必选 - - BUILD_LOGTAIL_UT:表示编译 LoongCollector 单测。仅当你修改了 LoongCollector 单测时才打开。 - - WITHSPL:表示编译 LoongCollector SPL 相关内容。仅当你修改了 LoongCollector SPL 相关文件时才打开。 +关键 CMake 开关: + +| 开关 | 用途 | +|------|------| +| `BUILD_LOGTAIL` | 编译 LoongCollector 二进制。必选。 | +| `BUILD_LOGTAIL_UT` | 编译单元测试。修改了测试代码时打开。 | +| `WITHSPL` | SPL 支持。除非修改了 SPL 相关文件,否则设为 `OFF`。 | -5. 编译 +4. 
编译 -``` bash +```bash make -sj$(nproc) ``` -### Go 部分编译方法 +#### C++ 单元测试 + +每个 `core/unittest/*/` 下的测试目录会生成独立的可执行文件。 + +**编译指定测试**(在 `build/` 目录内): +```bash +make yaml_util_unittest app_config_unittest safe_queue_unittest -j$(nproc) +``` + +**运行测试**(在 `build/` 目录内): +```bash +./unittest/common/yaml_util_unittest +./unittest/app_config/app_config_unittest +``` -执行 +测试必须在 `build/` 目录内运行,因为部分测试依赖相对路径加载配置文件。 -``` bash +### Go 部分编译方法 + +```bash make plugin_local ``` + +### Docker 构建 + +```bash +make image +``` + +### 交叉编译 + +ARM64 架构: +```bash +make image ARCH=arm64 +``` + +### 常见问题 + +- 如果 CMake 报缺少依赖,通过 `apt` 或 `yum` 安装 +- 如果链接失败,尝试 `make clean` 后重新构建 +- 需要 SPL 相关功能时,将 `WITHSPL=OFF` 改为 `WITHSPL=ON` diff --git a/.cursor/skills/e2e/SKILL.md b/.cursor/skills/e2e/SKILL.md new file mode 100644 index 0000000000..92a8c4df6b --- /dev/null +++ b/.cursor/skills/e2e/SKILL.md @@ -0,0 +1,209 @@ +--- +name: e2e +description: LoongCollector E2E 测试全流程指南:设计、编写、运行和调试。当需要编写新 E2E 测试、运行现有测试、或排查 E2E 测试失败时使用此 skill。 +--- +# LoongCollector E2E 测试指南 + +> 详细步骤模板见 [reference.md](reference.md) | 可复用脚本见 [scripts/](scripts/) + +## 目录 + +1. [概览](#1-概览) +2. [设计测试用例](#2-设计测试用例) +3. [编写测试用例](#3-编写测试用例) +4. [本地运行(docker-compose)](#4-本地运行) +5. [调试](#5-调试) +6. [已知陷阱](#6-已知陷阱) + +--- + +## 1 概览 + +基于 **BDD Godog** 框架,通过 `.feature` 文件描述场景,引擎正则匹配步骤函数并传参。 + +``` +test/e2e/ + test_cases// + case.feature # 场景描述 + docker-compose.yaml # 可选,外部依赖服务 + engine/ + steps.go # 所有可用步骤(权威来源) + setup/ control/ trigger/ verify/ cleanup/ +``` + +**环境 tag**:`@host`、`@k8s`、`@docker-compose`(三选一,加 `@e2e`) + +--- + +## 2 设计测试用例 + +编写 feature 文件前,先确定测试矩阵。按以下维度逐项评估是否需要覆盖: + +### 2.1 场景维度清单 + +| 维度 | 典型场景 | 何时需要 | +|------|----------|----------| +| **基础功能** | 单配置、单数据类型端到端 | 必须 | +| **多数据类型** | logs / metrics / traces 分别验证 | 插件支持多类型时 | +| **多配置共存** | 同时加载多个 pipeline 配置 | 涉及端口/资源竞争时 | +| **配置热加载** | 运行中增/删/改配置 | 持续运行的 input 插件 | +| **配置类型变更** | 从 A 类型切换到 B 类型 | 插件支持多协议/格式时 | +| **反压与恢复** | 下游不可达 → 恢复后数据不丢 | flusher 插件 | +| **外部依赖失效** | 依赖服务重启/不可达 | 有外部依赖时 | +| **大数据量** | 高吞吐压力下不 OOM/不丢数据 | 性能敏感路径 | + +### 2.2 设计产出 + +确定要覆盖的场景后,明确每个 Scenario 的: +- **输入**:什么数据、什么格式、多少条 +- **流经路径**:input → processor → flusher 的具体插件 +- **预期输出**:在哪里验证、验证什么 +- **外部依赖**:需要什么辅助服务(OTel Collector、Kafka 等) + +--- + +## 3 编写测试用例 + +### 3.1 目录结构 + +``` +test/e2e/test_cases/my_feature/ +├── case.feature +├── docker-compose.yaml # 外部依赖 +└── otel-collector-config.yaml # 如果用 OTel Collector +``` + +### 3.2 Feature 文件模板 + +```gherkin +@flusher +Feature: my feature name + Brief description + + @e2e @docker-compose + Scenario: TestMyFeatureLogs + Given {docker-compose} environment + Given {my-config} local config as below + """ + enable: true + inputs: + - Type: input_forward + Protocol: OTLP + Endpoint: "0.0.0.0:4320" + flushers: + - Type: flusher_otlp_native + Endpoint: "otel-collector:4317" + """ + When start docker-compose {my_feature} + Then wait {10} seconds + When generate {1} OTLP {logs} via otelgen to endpoint {loongcollectorC:4320}, protocol {grpc} + Then wait {5} seconds + Then otlp collector received at least {1} logs from file {/tmp/otel-export/logs.json} +``` + +### 3.3 强制规则 + +- 配置中必须含 `enable: true` +- **只使用** `test/engine/steps.go` 中已注册的步骤 +- `wait {N} seconds` 是 **Then** 类型,不是 When +- 命名格式:`Test${功能名}${场景描述}` +- **不要**在持续运行插件的配置中使用 `global.ExcutionTimeout`(见 §6.1) + +### 3.4 扩展步骤 + +如需新步骤,参考 [reference.md §扩展步骤](reference.md) 中的开发和注册流程。 + +--- + +## 4 本地运行 + +### 4.1 前置条件 + +```bash +docker --version && docker compose version +``` + +如修改了 C++ 代码,需重新编译并更新镜像。两种方式: + 
+**方式一:完整构建**(慢,但保证一致) +```bash +make e2e_image # 从源码构建完整 Docker 镜像 aliyun/loongcollector:0.0.1 +``` + +**方式二:增量更新**(快,适合迭代调试) +```bash +cd build && make -sj$(nproc) && cd .. +# 替换镜像中的二进制 +docker create --name tmp-lc aliyun/loongcollector:0.0.1 +docker cp build/loongcollector tmp-lc:/usr/local/loongcollector/loongcollector +docker commit tmp-lc aliyun/loongcollector:0.0.1 +docker rm tmp-lc +``` + +### 4.2 运行 + +```bash +cd test/e2e + +# 运行整个测试用例(所有 Scenario) +TEST_CASE=flusher_otlp_native go test -v -run "TestE2EOnDockerCompose$" \ + -timeout 600s -count=1 ./... + +# 只运行指定 Scenario +TEST_CASE=flusher_otlp_native go test -v \ + -run "TestE2EOnDockerCompose/TestFlusherOTLPNativeLogs$" \ + -timeout 600s -count=1 ./... +``` + +### 4.3 清理(测试失败后必做) + +可以直接运行脚本 `bash .cursor/skills/e2e/scripts/e2e-cleanup.sh`,或手动执行: + +```bash +docker rm -f $(docker ps -aq) 2>/dev/null +docker network prune -f +rm -rf test/e2e/config test/e2e/onetime_pipeline_config +sudo rm -rf test/e2e/report +rm -f test/e2e/test_cases//testcase-compose.yaml +``` + +--- + +## 5 调试 + +```bash +# 1. 查看容器日志 +docker ps | grep loongcollectorC +docker exec cat /usr/local/loongcollector/log/loongcollector.LOG + +# 2. 检查配置是否加载 +docker exec ls /usr/local/loongcollector/conf/continuous_pipeline_config/local/ + +# 3. 检查端口是否监听 +docker exec ss -tlnp | grep + +# 4. 手动复现 compose 环境 +cd test/e2e/test_cases/ +docker compose -f testcase-compose.yaml up -d +docker compose -f testcase-compose.yaml logs -f loongcollectorC +``` + +--- + +## 6 已知陷阱 + +### 6.1 ExcutionTimeout 使配置变为一次性 + +**绝对不要**在 `input_forward`、`input_file` 等持续插件的配置中使用 `global.ExcutionTimeout`。 + +它会使 `IsOnetime()` 返回 true,导致 `IsValidNativeInputPlugin(name, true)` 在 onetime 注册表中查找,而大部分 input 只注册了 continuous,结果报 `unsupported input plugin`。 + +详见 `.cursor/rules/project-knowledge/config-pitfalls.mdc`。 + +### 6.2 FlusherFile 必须是文件 + +e2e 模板将 `report/default_flusher.json` bind-mount 到容器。若宿主机路径不存在,Docker 会创建为**目录**。已在 `BootController.Start()` 中自动处理。 + +### 6.3 测试间残留 + +多 Scenario 共享进程,`Clean()` 会删除 config/report。异常退出后手动清理(§4.3)。 diff --git a/.cursor/skills/e2e/reference.md b/.cursor/skills/e2e/reference.md new file mode 100644 index 0000000000..d1c662ae35 --- /dev/null +++ b/.cursor/skills/e2e/reference.md @@ -0,0 +1,134 @@ +# E2E 测试详细参考 + +## 可用步骤速查 + +> 权威来源:`test/engine/steps.go` + +### Given(环境准备) + +| 步骤模板 | 说明 | +|----------|------| +| `{docker-compose} environment` | 初始化 docker-compose 环境 | +| `{host} environment` | 初始化主机环境 | +| `{daemonset} environment` | 初始化 K8s 环境 | +| `{name} local config as below` | 写入持续采集配置 | +| `{name} onetime pipeline local config as below` | 写入一次性采集配置 | +| `subcribe data from {sls} with config` | 订阅 SLS 数据源 | +| `loongcollector depends on containers {name}` | 设置容器依赖 | +| `loongcollector container mount {src} to {dst}` | 挂载卷 | +| `loongcollector expose port {host} to {container}` | 暴露端口 | +| `docker-compose boot type {type}` | 设置 boot 类型 | +| `mkdir {path}` | 创建目录 | + +### When(触发动作) + +| 步骤模板 | 说明 | +|----------|------| +| `start docker-compose {case_name}` | 启动 docker-compose 环境 | +| `begin trigger` | 标记触发开始时间(生成日志前必须调用) | +| `generate {N} regex logs to file {path}, with interval {M}ms` | 生成正则日志 | +| `generate {N} json logs to file {path}, with interval {M}ms` | 生成 JSON 日志 | +| `generate {N} apsara logs to file {path}, with interval {M}ms` | 生成 Apsara 日志 | +| `generate {N} OTLP {logs\|metrics\|traces} via otelgen to endpoint {ep}, protocol {grpc\|http}` | 生成 OTLP 数据 | +| `generate {N} http logs, with interval {M}ms, url: {url}, method: {method}, body:` | 
生成 HTTP 日志 | +| `execute {N} commands {cmd} in sequence` | 顺序执行命令 | +| `execute {N} commands {cmd} in parallel` | 并行执行命令 | +| `create the shell script file {name} with the following content` | 创建 shell 脚本 | +| `execute {N} the shell script file {name} in parallel` | 并行执行 shell 脚本 | +| `restart agent` | 重启 Agent | +| `force restart agent` | 强制重启 Agent | + +### Then(结果验证) + +| 步骤模板 | 说明 | +|----------|------| +| `there is {N} logs` | 精确验证日志数(上限 100) | +| `there is at least {N} logs` | 最少日志数验证 | +| `there is less than {N} logs` | 最多日志数验证 | +| `the log fields match kv` | KV 字段匹配(文档内容跟 `"""..."""`) | +| `the log fields match as below` | 日志字段模式匹配 | +| `the log tags match kv` | Tag KV 匹配 | +| `the log is in order` | 日志顺序验证 | +| `wait {N} seconds` | 等待 N 秒 | +| `otlp collector received at least {N} (logs\|metrics\|traces) from file {path}` | OTel Collector 数据验证 | + +> 注意:日志数量验证上限 100。超过 100 用 `When query through` + `Then the log fields match kv` 方式。 + +--- + +## 扩展步骤 + +### 1. 编写函数 + +在 `test/engine/` 对应子目录下: + +```go +func MyVerification(ctx context.Context, expected int) (context.Context, error) { + // 实现逻辑 + return ctx, nil +} +``` + +签名要求:第一个参数 `context.Context`,返回 `(context.Context, error)`。 + +### 2. 注册 + +在 `test/engine/steps.go` 中: + +```go +ctx.Then(`^my verification expects \{(\d+)\}$`, verify.MyVerification) +``` + +### 3. 使用 + +```gherkin +Then my verification expects {42} +``` + +--- + +## docker-compose.yaml 示例 + +### OTel Collector(OTLP 测试用) + +```yaml +services: + otel-collector: + image: otel/opentelemetry-collector-contrib:latest + hostname: otel-collector + user: "0:0" + ports: + - "4317" + volumes: + - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml + - ./otel-export:/tmp/otel-export + healthcheck: + test: ["CMD", "wget", "--spider", "-q", "http://localhost:13133/"] + interval: 5s + timeout: 3s + retries: 5 + start_period: 10s +``` + +--- + +## eBPF 进程安全测试示例 + +```gherkin +@e2e @host @ebpf_input +Scenario: TestEBPFProcessSecurityByNormalStart + Given {host} environment + Given subcribe data from {sls} with config + """ + """ + Given {ebpf_process_security_default} local config as below + """ + enable: true + inputs: + - Type: input_process_security + """ + When begin trigger + When execute {1} commands {/bin/echo 1} in sequence + When query through {* | select * from e2e where call_name = 'execve' and binary = '/bin/echo' and arguments = '1'} + Then there is {1} logs +``` diff --git a/.cursor/skills/e2e/scripts/e2e-cleanup.sh b/.cursor/skills/e2e/scripts/e2e-cleanup.sh new file mode 100755 index 0000000000..ff61e91eb1 --- /dev/null +++ b/.cursor/skills/e2e/scripts/e2e-cleanup.sh @@ -0,0 +1,32 @@ +#!/usr/bin/env bash +# E2E 测试环境清理脚本 +# 用法: bash .cursor/skills/e2e/scripts/e2e-cleanup.sh [case_name] +set -euo pipefail + +REPO_ROOT="$(git rev-parse --show-toplevel)" +E2E_DIR="$REPO_ROOT/test/e2e" +CASE_NAME="${1:-}" + +echo "==> 停止并删除所有 Docker 容器..." +docker rm -f $(docker ps -aq) 2>/dev/null || true + +echo "==> 清理 Docker 网络..." +docker network prune -f 2>/dev/null || true + +echo "==> 清理运行时目录..." +rm -rf "$E2E_DIR/config" "$E2E_DIR/onetime_pipeline_config" +sudo rm -rf "$E2E_DIR/report" 2>/dev/null || rm -rf "$E2E_DIR/report" 2>/dev/null || true + +if [[ -n "$CASE_NAME" ]]; then + CASE_DIR="$E2E_DIR/test_cases/$CASE_NAME" + if [[ -d "$CASE_DIR" ]]; then + echo "==> 清理测试用例 $CASE_NAME..." + rm -f "$CASE_DIR/testcase-compose.yaml" + rm -f "$CASE_DIR/otel-export/"*.json 2>/dev/null || true + fi +else + echo "==> 清理所有测试用例的 testcase-compose.yaml..." 
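+    # only the generated compose files are swept here; per-case otel-export output is kept for inspection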
+ find "$E2E_DIR/test_cases" -name "testcase-compose.yaml" -delete 2>/dev/null || true +fi + +echo "==> 清理完成" diff --git a/.cursor/skills/security-check/scripts/security_check.sh b/.cursor/skills/security-check/scripts/security_check.sh index 455023c35f..03848884fc 100644 --- a/.cursor/skills/security-check/scripts/security_check.sh +++ b/.cursor/skills/security-check/scripts/security_check.sh @@ -1,7 +1,7 @@ #!/bin/bash set -euo pipefail -SENSITIVE_PATTERNS="(sk-[a-zA-Z0-9]|AIzaSy[a-zA-Z0-9]|pk_[a-zA-Z0-9]|ghp_[a-zA-Z0-9]|gho_[a-zA-Z0-9]|ghu_[a-zA-Z0-9]|ghs_[a-zA-Z0-9]|ghr_[a-zA-Z0-9])" +SENSITIVE_PATTERNS="(sk-[a-zA-Z0-9]{20,}|AIzaSy[a-zA-Z0-9_-]{30,}|pk_[a-zA-Z0-9]{10,}|ghp_[a-zA-Z0-9]{36,}|gho_[a-zA-Z0-9]{36,}|ghu_[a-zA-Z0-9]{36,}|ghs_[a-zA-Z0-9]{36,}|ghr_[a-zA-Z0-9]{36,})" MODE="${1:-}" if [ "$MODE" != "commit" ] && [ "$MODE" != "push" ]; then diff --git a/.devcontainer/Dockerfile b/.devcontainer/Dockerfile index 7e98b005ca..95f461da61 100644 --- a/.devcontainer/Dockerfile +++ b/.devcontainer/Dockerfile @@ -18,9 +18,18 @@ ARG USERNAME=admin ARG USER_PASSWORD USER root +RUN sed -i '/mirrors.aliyuncs.com\|mirrors.cloud.aliyuncs.com/d' /etc/yum.repos.d/CentOS-Base.repo RUN yum -y install openssh-server && \ ssh-keygen -A +# Feature Docker-in-Docker +COPY dind-install.sh /tmp/dind-install.sh +RUN chmod +x /tmp/dind-install.sh && \ + MOBY=false \ + DOCKERDASHCOMPOSEVERSION=none \ + INSTALLDOCKERBUILDX=false \ + /tmp/dind-install.sh + # Create the user COPY .env /tmp/.env COPY authorized_keys /tmp/authorized_keys @@ -51,7 +60,9 @@ RUN cp /opt/logtail/deps/lib/libssl.so.1.0.0 /usr/lib64; \ echo "export PATH=/usr/local/go/bin:/opt/logtail/deps/bin:$PATH" >> /home/$USERNAME/.bashrc; \ su - $USERNAME -c "\ go env -w GO111MODULE=on && \ - go env -w GOPROXY=https://goproxy.cn,direct" + go env -w GOPROXY=https://goproxy.cn,direct" && \ + usermod -aG docker $USERNAME USER $USERNAME +# ENTRYPOINT [ "/usr/local/share/docker-init.sh" ] \ No newline at end of file diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json index 76caba5e57..7709d22788 100644 --- a/.devcontainer/devcontainer.json +++ b/.devcontainer/devcontainer.json @@ -9,13 +9,15 @@ "privileged": true, "mounts": [ { "source": "/sys", "target": "/sys", "type": "bind" }, - { "source": "/", "target": "/logtail_host", "type": "bind" } + { "source": "/", "target": "/logtail_host", "type": "bind" }, + { "source": "loongcollector-dind-data", "target": "/var/lib/docker", "type": "volume" } ], "runArgs": [ "--cap-add=SYS_PTRACE", "--security-opt", "seccomp=unconfined" ], "onCreateCommand": "sudo chown -R $(id -un):$(id -gn) /root", + "postStartCommand": "sudo bash ${containerWorkspaceFolder}/.devcontainer/start-dind.sh", "postCreateCommand": "sudo /usr/sbin/sshd", "customizations": { "vscode": { diff --git a/.devcontainer/dind-install.sh b/.devcontainer/dind-install.sh new file mode 100644 index 0000000000..d364880676 --- /dev/null +++ b/.devcontainer/dind-install.sh @@ -0,0 +1,1022 @@ +#!/usr/bin/env bash +#------------------------------------------------------------------------------------------------------------- +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. See https://go.microsoft.com/fwlink/?linkid=2090316 for license information. 
+#------------------------------------------------------------------------------------------------------------- +# +# Docs: https://github.com/microsoft/vscode-dev-containers/blob/main/script-library/docs/docker-in-docker.md +# Maintainer: The Dev Container spec maintainers + + +DOCKER_VERSION="${VERSION:-"latest"}" # The Docker/Moby Engine + CLI should match in version +USE_MOBY="${MOBY:-"true"}" +MOBY_BUILDX_VERSION="${MOBYBUILDXVERSION:-"latest"}" +DOCKER_DASH_COMPOSE_VERSION="${DOCKERDASHCOMPOSEVERSION:-"v2"}" #v1, v2 or none +AZURE_DNS_AUTO_DETECTION="${AZUREDNSAUTODETECTION:-"true"}" +DOCKER_DEFAULT_ADDRESS_POOL="${DOCKERDEFAULTADDRESSPOOL:-""}" +USERNAME="${USERNAME:-"${_REMOTE_USER:-"automatic"}"}" +INSTALL_DOCKER_BUILDX="${INSTALLDOCKERBUILDX:-"true"}" +INSTALL_DOCKER_COMPOSE_SWITCH="${INSTALLDOCKERCOMPOSESWITCH:-"false"}" +MICROSOFT_GPG_KEYS_URI="https://packages.microsoft.com/keys/microsoft.asc" +MICROSOFT_GPG_KEYS_ROLLING_URI="https://packages.microsoft.com/keys/microsoft-rolling.asc" +DOCKER_MOBY_ARCHIVE_VERSION_CODENAMES="trixie bookworm buster bullseye bionic focal jammy noble" +DOCKER_LICENSED_ARCHIVE_VERSION_CODENAMES="trixie bookworm buster bullseye bionic focal hirsute impish jammy noble" +DISABLE_IP6_TABLES="${DISABLEIP6TABLES:-false}" + +# Default: Exit on any failure. +set -e + +# Clean up +rm -rf /var/lib/apt/lists/* + +# Setup STDERR. +err() { + echo "(!) $*" >&2 +} + +if [ "$(id -u)" -ne 0 ]; then + err 'Script must be run as root. Use sudo, su, or add "USER root" to your Dockerfile before running this script.' + exit 1 +fi + +################### +# Helper Functions +# See: https://github.com/microsoft/vscode-dev-containers/blob/main/script-library/shared/utils.sh +################### + +# Determine the appropriate non-root user +if [ "${USERNAME}" = "auto" ] || [ "${USERNAME}" = "automatic" ]; then + USERNAME="" + POSSIBLE_USERS=("vscode" "node" "codespace" "$(awk -v val=1000 -F ":" '$3==val{print $1}' /etc/passwd)") + for CURRENT_USER in "${POSSIBLE_USERS[@]}"; do + if id -u ${CURRENT_USER} > /dev/null 2>&1; then + USERNAME=${CURRENT_USER} + break + fi + done + if [ "${USERNAME}" = "" ]; then + USERNAME=root + fi +elif [ "${USERNAME}" = "none" ] || ! id -u ${USERNAME} > /dev/null 2>&1; then + USERNAME=root +fi + +# Package manager update function +pkg_mgr_update() { + case ${ADJUSTED_ID} in + debian) + if [ "$(find /var/lib/apt/lists/* | wc -l)" = "0" ]; then + echo "Running apt-get update..." + apt-get update -y + fi + ;; + rhel) + if [ ${PKG_MGR_CMD} = "microdnf" ]; then + cache_check_dir="/var/cache/yum" + else + cache_check_dir="/var/cache/${PKG_MGR_CMD}" + fi + if [ "$(ls ${cache_check_dir}/* 2>/dev/null | wc -l)" = 0 ]; then + echo "Running ${PKG_MGR_CMD} makecache ..." + ${PKG_MGR_CMD} makecache + fi + ;; + esac +} + +# Checks if packages are installed and installs them if not +check_packages() { + case ${ADJUSTED_ID} in + debian) + if ! dpkg -s "$@" > /dev/null 2>&1; then + pkg_mgr_update + apt-get -y install --no-install-recommends "$@" + fi + ;; + rhel) + if ! 
rpm -q "$@" > /dev/null 2>&1; then + pkg_mgr_update + ${PKG_MGR_CMD} -y install "$@" + fi + ;; + esac +} + +# Figure out correct version of a three part version number is not passed +find_version_from_git_tags() { + local variable_name=$1 + local requested_version=${!variable_name} + if [ "${requested_version}" = "none" ]; then return; fi + local repository=$2 + local prefix=${3:-"tags/v"} + local separator=${4:-"."} + local last_part_optional=${5:-"false"} + if [ "$(echo "${requested_version}" | grep -o "." | wc -l)" != "2" ]; then + local escaped_separator=${separator//./\\.} + local last_part + if [ "${last_part_optional}" = "true" ]; then + last_part="(${escaped_separator}[0-9]+)?" + else + last_part="${escaped_separator}[0-9]+" + fi + local regex="${prefix}\\K[0-9]+${escaped_separator}[0-9]+${last_part}$" + local version_list="$(git ls-remote --tags ${repository} | grep -oP "${regex}" | tr -d ' ' | tr "${separator}" "." | sort -rV)" + if [ "${requested_version}" = "latest" ] || [ "${requested_version}" = "current" ] || [ "${requested_version}" = "lts" ]; then + declare -g ${variable_name}="$(echo "${version_list}" | head -n 1)" + else + set +e + declare -g ${variable_name}="$(echo "${version_list}" | grep -E -m 1 "^${requested_version//./\\.}([\\.\\s]|$)")" + set -e + fi + fi + if [ -z "${!variable_name}" ] || ! echo "${version_list}" | grep "^${!variable_name//./\\.}$" > /dev/null 2>&1; then + err "Invalid ${variable_name} value: ${requested_version}\nValid values:\n${version_list}" >&2 + exit 1 + fi + echo "${variable_name}=${!variable_name}" +} + +# Use semver logic to decrement a version number then look for the closest match +find_prev_version_from_git_tags() { + local variable_name=$1 + local current_version=${!variable_name} + local repository=$2 + # Normally a "v" is used before the version number, but support alternate cases + local prefix=${3:-"tags/v"} + # Some repositories use "_" instead of "." for version number part separation, support that + local separator=${4:-"."} + # Some tools release versions that omit the last digit (e.g. go) + local last_part_optional=${5:-"false"} + # Some repositories may have tags that include a suffix (e.g. actions/node-versions) + local version_suffix_regex=$6 + # Try one break fix version number less if we get a failure. Use "set +e" since "set -e" can cause failures in valid scenarios. 
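+    # e.g. 27.0.0 -> newest 26.x.y tag, 27.1.0 -> newest 27.0.x tag, 27.1.2 -> 27.1.1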
+ set +e + major="$(echo "${current_version}" | grep -oE '^[0-9]+' || echo '')" + minor="$(echo "${current_version}" | grep -oP '^[0-9]+\.\K[0-9]+' || echo '')" + breakfix="$(echo "${current_version}" | grep -oP '^[0-9]+\.[0-9]+\.\K[0-9]+' 2>/dev/null || echo '')" + + if [ "${minor}" = "0" ] && [ "${breakfix}" = "0" ]; then + ((major=major-1)) + declare -g ${variable_name}="${major}" + # Look for latest version from previous major release + find_version_from_git_tags "${variable_name}" "${repository}" "${prefix}" "${separator}" "${last_part_optional}" + # Handle situations like Go's odd version pattern where "0" releases omit the last part + elif [ "${breakfix}" = "" ] || [ "${breakfix}" = "0" ]; then + ((minor=minor-1)) + declare -g ${variable_name}="${major}.${minor}" + # Look for latest version from previous minor release + find_version_from_git_tags "${variable_name}" "${repository}" "${prefix}" "${separator}" "${last_part_optional}" + else + ((breakfix=breakfix-1)) + if [ "${breakfix}" = "0" ] && [ "${last_part_optional}" = "true" ]; then + declare -g ${variable_name}="${major}.${minor}" + else + declare -g ${variable_name}="${major}.${minor}.${breakfix}" + fi + fi + set -e +} + +# Function to fetch the version released prior to the latest version +get_previous_version() { + local url=$1 + local repo_url=$2 + local variable_name=$3 + prev_version=${!variable_name} + + output=$(curl -s "$repo_url"); + if echo "$output" | jq -e 'type == "object"' > /dev/null; then + message=$(echo "$output" | jq -r '.message') + + if [[ $message == "API rate limit exceeded"* ]]; then + echo -e "\nAn attempt to find latest version using GitHub Api Failed... \nReason: ${message}" + echo -e "\nAttempting to find latest version using GitHub tags." + find_prev_version_from_git_tags prev_version "$url" "tags/v" + declare -g ${variable_name}="${prev_version}" + fi + elif echo "$output" | jq -e 'type == "array"' > /dev/null; then + echo -e "\nAttempting to find latest version using GitHub Api." + version=$(echo "$output" | jq -r '.[1].tag_name') + declare -g ${variable_name}="${version#v}" + fi + echo "${variable_name}=${!variable_name}" +} + +get_github_api_repo_url() { + local url=$1 + echo "${url/https:\/\/github.com/https:\/\/api.github.com\/repos}/releases" +} + +########################################### +# Start docker-in-docker installation +########################################### + +# Ensure apt is in non-interactive to avoid prompts +export DEBIAN_FRONTEND=noninteractive + +# Source /etc/os-release to get OS info +. /etc/os-release + +# Determine adjusted ID and package manager +if [ "${ID}" = "debian" ] || [ "${ID_LIKE}" = "debian" ]; then + ADJUSTED_ID="debian" + PKG_MGR_CMD="apt-get" + # Use dpkg for Debian-based systems + architecture="$(dpkg --print-architecture 2>/dev/null || uname -m)" +elif [[ "${ID}" = "rhel" || "${ID}" = "fedora" || "${ID}" = "azurelinux" || "${ID}" = "mariner" || "${ID_LIKE}" = *"rhel"* || "${ID_LIKE}" = *"fedora"* || "${ID_LIKE}" = *"azurelinux"* || "${ID_LIKE}" = *"mariner"* ]]; then + ADJUSTED_ID="rhel" + # Determine the appropriate package manager for RHEL-based systems + for pkg_mgr in tdnf dnf microdnf yum; do + if command -v "$pkg_mgr" >/dev/null 2>&1; then + PKG_MGR_CMD="$pkg_mgr" + break + fi + done + + if [ -z "${PKG_MGR_CMD}" ]; then + err "Unable to find a supported package manager (tdnf, dnf, microdnf, yum)" + exit 1 + fi + + architecture="$(rpm --eval '%{_arch}' 2>/dev/null || uname -m)" +else + err "Linux distro ${ID} not supported." 
+ exit 1 +fi + +# Azure Linux specific setup +if [ "${ID}" = "azurelinux" ]; then + VERSION_CODENAME="azurelinux${VERSION_ID}" +fi + +# Prevent attempting to install Moby on Debian trixie (packages removed) +if [ "${USE_MOBY}" = "true" ] && [ "${ID}" = "debian" ] && [ "${VERSION_CODENAME}" = "trixie" ]; then + err "The 'moby' option is not supported on Debian 'trixie' because 'moby-cli' and related system packages have been removed from that distribution." + err "To continue, either set the feature option '\"moby\": false' or use a different base image (for example: 'debian:bookworm' or 'ubuntu-24.04')." + exit 1 +fi + +# Check if distro is supported +if [ "${USE_MOBY}" = "true" ]; then + if [ "${ADJUSTED_ID}" = "debian" ]; then + if [[ "${DOCKER_MOBY_ARCHIVE_VERSION_CODENAMES}" != *"${VERSION_CODENAME}"* ]]; then + err "Unsupported distribution version '${VERSION_CODENAME}'. To resolve, either: (1) set feature option '\"moby\": false' , or (2) choose a compatible OS distribution" + err "Supported distributions include: ${DOCKER_MOBY_ARCHIVE_VERSION_CODENAMES}" + exit 1 + fi + echo "(*) ${VERSION_CODENAME} is supported for Moby installation - setting up Microsoft repository" + elif [ "${ADJUSTED_ID}" = "rhel" ]; then + if [ "${ID}" = "azurelinux" ] || [ "${ID}" = "mariner" ]; then + echo " (*) ${ID} ${VERSION_ID} detected - using Microsoft repositories for Moby packages" + else + echo "RHEL-based system (${ID}) detected - Moby packages may require additional configuration" + fi + fi +else + if [ "${ADJUSTED_ID}" = "debian" ]; then + if [[ "${DOCKER_LICENSED_ARCHIVE_VERSION_CODENAMES}" != *"${VERSION_CODENAME}"* ]]; then + err "Unsupported distribution version '${VERSION_CODENAME}'. To resolve, please choose a compatible OS distribution" + err "Supported distributions include: ${DOCKER_LICENSED_ARCHIVE_VERSION_CODENAMES}" + exit 1 + fi + echo "(*) ${VERSION_CODENAME} is supported for Docker CE installation (supported: ${DOCKER_LICENSED_ARCHIVE_VERSION_CODENAMES}) - setting up Docker repository" + elif [ "${ADJUSTED_ID}" = "rhel" ]; then + + echo "RHEL-based system (${ID}) detected - using Docker CE packages" + fi +fi + +# Install base dependencies +base_packages="curl ca-certificates pigz iptables gnupg2 wget jq" +case ${ADJUSTED_ID} in + debian) + check_packages apt-transport-https $base_packages dirmngr + ;; + rhel) + check_packages $base_packages tar gawk shadow-utils policycoreutils procps-ng systemd-libs systemd-devel + + ;; +esac + +# Install git if not already present +if ! 
command -v git >/dev/null 2>&1; then + check_packages git +fi + +# Update CA certificates to ensure HTTPS connections work properly +# This is especially important for Ubuntu 24.04 (Noble) and Debian Trixie +# Only run for Debian-based systems (RHEL uses update-ca-trust instead) +if [ "${ADJUSTED_ID}" = "debian" ] && command -v update-ca-certificates > /dev/null 2>&1; then + update-ca-certificates +fi + +# Swap to legacy iptables for compatibility (Debian only) +if [ "${ADJUSTED_ID}" = "debian" ] && type iptables-legacy > /dev/null 2>&1; then + update-alternatives --set iptables /usr/sbin/iptables-legacy + update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy +fi + +# Set up the necessary repositories +if [ "${USE_MOBY}" = "true" ]; then + # Name of open source engine/cli + engine_package_name="moby-engine" + cli_package_name="moby-cli" + + case ${ADJUSTED_ID} in + debian) + # Import key safely and import Microsoft apt repo + { + curl -sSL ${MICROSOFT_GPG_KEYS_URI} + curl -sSL ${MICROSOFT_GPG_KEYS_ROLLING_URI} + } | gpg --dearmor > /usr/share/keyrings/microsoft-archive-keyring.gpg + echo "deb [arch=${architecture} signed-by=/usr/share/keyrings/microsoft-archive-keyring.gpg] https://packages.microsoft.com/repos/microsoft-${ID}-${VERSION_CODENAME}-prod ${VERSION_CODENAME} main" > /etc/apt/sources.list.d/microsoft.list + ;; + rhel) + echo "(*) ${ID} detected - checking for Moby packages..." + + # Check if moby packages are available in default repos + if ${PKG_MGR_CMD} list available moby-engine >/dev/null 2>&1; then + echo "(*) Using built-in ${ID} Moby packages" + else + case "${ID}" in + azurelinux) + echo "(*) Moby packages not found in Azure Linux repositories" + echo "(*) For Azure Linux, Docker CE ('moby': false) is recommended" + err "Moby packages are not available for Azure Linux ${VERSION_ID}." + err "Recommendation: Use '\"moby\": false' to install Docker CE instead." + exit 1 + ;; + mariner) + echo "(*) Adding Microsoft repository for CBL-Mariner..." + # Add Microsoft repository if packages aren't available locally + curl -sSL ${MICROSOFT_GPG_KEYS_URI} | gpg --dearmor > /etc/pki/rpm-gpg/microsoft.gpg + cat > /etc/yum.repos.d/microsoft.repo << EOF +[microsoft] +name=Microsoft Repository +baseurl=https://packages.microsoft.com/repos/microsoft-cbl-mariner-2.0-prod-base/ +enabled=1 +gpgcheck=1 +gpgkey=file:///etc/pki/rpm-gpg/microsoft.gpg +EOF + # Verify packages are available after adding repo + pkg_mgr_update + if ! ${PKG_MGR_CMD} list available moby-engine >/dev/null 2>&1; then + echo "(*) Moby packages not found in Microsoft repository either" + err "Moby packages are not available for CBL-Mariner ${VERSION_ID}." + err "Recommendation: Use '\"moby\": false' to install Docker CE instead." + exit 1 + fi + ;; + *) + err "Moby packages are not available for ${ID}. Please use 'moby': false option." 
+ exit 1 + ;; + esac + fi + ;; + esac +else + # Name of licensed engine/cli + engine_package_name="docker-ce" + cli_package_name="docker-ce-cli" + case ${ADJUSTED_ID} in + debian) + curl -fsSL https://download.docker.com/linux/${ID}/gpg | gpg --dearmor > /usr/share/keyrings/docker-archive-keyring.gpg + echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/${ID} ${VERSION_CODENAME} stable" > /etc/apt/sources.list.d/docker.list + ;; + rhel) + # Docker CE repository setup for RHEL-based systems + setup_docker_ce_repo() { + curl -fsSL https://download.docker.com/linux/centos/gpg > /etc/pki/rpm-gpg/docker-ce.gpg + cat > /etc/yum.repos.d/docker-ce.repo << EOF +[docker-ce-stable] +name=Docker CE Stable +baseurl=https://download.docker.com/linux/centos/9/\$basearch/stable +enabled=1 +gpgcheck=1 +gpgkey=file:///etc/pki/rpm-gpg/docker-ce.gpg +skip_if_unavailable=1 +module_hotfixes=1 +EOF + } + install_azure_linux_deps() { + echo "(*) Installing device-mapper libraries for Docker CE..." + [ "${ID}" != "mariner" ] && ${PKG_MGR_CMD} -y install device-mapper-libs 2>/dev/null || echo "(*) Device-mapper install failed, proceeding" + echo "(*) Installing additional Docker CE dependencies..." + ${PKG_MGR_CMD} -y install libseccomp libtool-ltdl systemd-libs libcgroup tar xz || { + echo "(*) Some optional dependencies could not be installed, continuing..." + } + } + setup_selinux_context() { + if command -v getenforce >/dev/null 2>&1 && [ "$(getenforce 2>/dev/null)" != "Disabled" ]; then + echo "(*) Creating minimal SELinux context for Docker compatibility..." + mkdir -p /etc/selinux/targeted/contexts/files/ 2>/dev/null || true + echo "/var/lib/docker(/.*)? system_u:object_r:container_file_t:s0" >> /etc/selinux/targeted/contexts/files/file_contexts.local 2>/dev/null || true + fi + } + + # Special handling for RHEL Docker CE installation + case "${ID}" in + azurelinux|mariner) + echo "(*) ${ID} detected" + echo "(*) Note: Moby packages work better on Azure Linux. Consider using 'moby': true" + echo "(*) Setting up Docker CE repository..." + + setup_docker_ce_repo + install_azure_linux_deps + + if [ "${USE_MOBY}" != "true" ]; then + echo "(*) Docker CE installation for Azure Linux - skipping container-selinux" + echo "(*) Note: SELinux policies will be minimal but Docker will function normally" + setup_selinux_context + else + echo "(*) Using Moby - container-selinux not required" + fi + ;; + *) + # Standard RHEL/CentOS/Fedora approach + if command -v dnf >/dev/null 2>&1; then + dnf config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo + elif command -v yum-config-manager >/dev/null 2>&1; then + yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo + else + # Manual fallback + setup_docker_ce_repo + fi + ;; + esac + ;; + esac +fi + +# Refresh package database +case ${ADJUSTED_ID} in + debian) + apt-get update + ;; + rhel) + pkg_mgr_update + ;; +esac + +# Soft version matching +if [ "${DOCKER_VERSION}" = "latest" ] || [ "${DOCKER_VERSION}" = "lts" ] || [ "${DOCKER_VERSION}" = "stable" ]; then + # Empty, meaning grab whatever "latest" is in apt repo + engine_version_suffix="" + cli_version_suffix="" +else + case ${ADJUSTED_ID} in + debian) + # Fetch a valid version from the apt-cache (eg: the Microsoft repo appends +azure, breakfix, etc...) 
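+            # e.g. requesting "24.0" will match epoch-prefixed entries such as "5:24.0.7-1~<codename>" once escaped into the regex below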
+ docker_version_dot_escaped="${DOCKER_VERSION//./\\.}" + docker_version_dot_plus_escaped="${docker_version_dot_escaped//+/\\+}" + # Regex needs to handle debian package version number format: https://www.systutorials.com/docs/linux/man/5-deb-version/ + docker_version_regex="^(.+:)?${docker_version_dot_plus_escaped}([\\.\\+ ~:-]|$)" + set +e # Don't exit if finding version fails - will handle gracefully + cli_version_suffix="=$(apt-cache madison ${cli_package_name} | awk -F"|" '{print $2}' | sed -e 's/^[ \t]*//' | grep -E -m 1 "${docker_version_regex}")" + engine_version_suffix="=$(apt-cache madison ${engine_package_name} | awk -F"|" '{print $2}' | sed -e 's/^[ \t]*//' | grep -E -m 1 "${docker_version_regex}")" + set -e + if [ -z "${engine_version_suffix}" ] || [ "${engine_version_suffix}" = "=" ] || [ -z "${cli_version_suffix}" ] || [ "${cli_version_suffix}" = "=" ] ; then + err "No full or partial Docker / Moby version match found for \"${DOCKER_VERSION}\" on OS ${ID} ${VERSION_CODENAME} (${architecture}). Available versions:" + apt-cache madison ${cli_package_name} | awk -F"|" '{print $2}' | grep -oP '^(.+:)?\K.+' + exit 1 + fi + ;; +rhel) + # For RHEL-based systems, use dnf/yum to find versions + docker_version_escaped="${DOCKER_VERSION//./\\.}" + set +e # Don't exit if finding version fails - will handle gracefully + if [ "${USE_MOBY}" = "true" ]; then + available_versions=$(${PKG_MGR_CMD} list --available moby-engine 2>/dev/null | grep -v "Available Packages" | awk '{print $2}' | grep -E "^${docker_version_escaped}" | head -1) + else + available_versions=$(${PKG_MGR_CMD} list --available docker-ce 2>/dev/null | grep -v "Available Packages" | awk '{print $2}' | grep -E "^${docker_version_escaped}" | head -1) + fi + set -e + if [ -n "${available_versions}" ]; then + engine_version_suffix="-${available_versions}" + cli_version_suffix="-${available_versions}" + else + echo "(*) Exact version ${DOCKER_VERSION} not found, using latest available" + engine_version_suffix="" + cli_version_suffix="" + fi + ;; + esac +fi + +# Version matching for moby-buildx +if [ "${USE_MOBY}" = "true" ]; then + if [ "${MOBY_BUILDX_VERSION}" = "latest" ]; then + # Empty, meaning grab whatever "latest" is in apt repo + buildx_version_suffix="" + else + case ${ADJUSTED_ID} in + debian) + buildx_version_dot_escaped="${MOBY_BUILDX_VERSION//./\\.}" + buildx_version_dot_plus_escaped="${buildx_version_dot_escaped//+/\\+}" + buildx_version_regex="^(.+:)?${buildx_version_dot_plus_escaped}([\\.\\+ ~:-]|$)" + set +e + buildx_version_suffix="=$(apt-cache madison moby-buildx | awk -F"|" '{print $2}' | sed -e 's/^[ \t]*//' | grep -E -m 1 "${buildx_version_regex}")" + set -e + if [ -z "${buildx_version_suffix}" ] || [ "${buildx_version_suffix}" = "=" ]; then + err "No full or partial moby-buildx version match found for \"${MOBY_BUILDX_VERSION}\" on OS ${ID} ${VERSION_CODENAME} (${architecture}). 
Available versions:" + apt-cache madison moby-buildx | awk -F"|" '{print $2}' | grep -oP '^(.+:)?\K.+' + exit 1 + fi + ;; + rhel) + # For RHEL-based systems, try to find buildx version or use latest + buildx_version_escaped="${MOBY_BUILDX_VERSION//./\\.}" + set +e + available_buildx=$(${PKG_MGR_CMD} list --available moby-buildx 2>/dev/null | grep -v "Available Packages" | awk '{print $2}' | grep -E "^${buildx_version_escaped}" | head -1) + set -e + if [ -n "${available_buildx}" ]; then + buildx_version_suffix="-${available_buildx}" + else + echo "(*) Exact buildx version ${MOBY_BUILDX_VERSION} not found, using latest available" + buildx_version_suffix="" + fi + ;; + esac + echo "buildx_version_suffix ${buildx_version_suffix}" + fi +fi + +# Install Docker / Moby CLI if not already installed +if type docker > /dev/null 2>&1 && type dockerd > /dev/null 2>&1; then + echo "Docker / Moby CLI and Engine already installed." +else + case ${ADJUSTED_ID} in + debian) + if [ "${USE_MOBY}" = "true" ]; then + # Install engine + set +e # Handle error gracefully + apt-get -y install --no-install-recommends moby-cli${cli_version_suffix} moby-buildx${buildx_version_suffix} moby-engine${engine_version_suffix} + exit_code=$? + set -e + + if [ ${exit_code} -ne 0 ]; then + err "Packages for moby not available in OS ${ID} ${VERSION_CODENAME} (${architecture}). To resolve, either: (1) set feature option '\"moby\": false' , or (2) choose a compatible OS version (eg: 'ubuntu-24.04')." + exit 1 + fi + + # Install compose + apt-get -y install --no-install-recommends moby-compose || err "Package moby-compose (Docker Compose v2) not available for OS ${ID} ${VERSION_CODENAME} (${architecture}). Skipping." + else + apt-get -y install --no-install-recommends docker-ce-cli${cli_version_suffix} docker-ce${engine_version_suffix} + # Install compose + apt-mark hold docker-ce docker-ce-cli + apt-get -y install --no-install-recommends docker-compose-plugin || echo "(*) Package docker-compose-plugin (Docker Compose v2) not available for OS ${ID} ${VERSION_CODENAME} (${architecture}). Skipping." + fi + ;; + rhel) + if [ "${USE_MOBY}" = "true" ]; then + set +e # Handle error gracefully + ${PKG_MGR_CMD} -y install moby-cli${cli_version_suffix} moby-engine${engine_version_suffix} + exit_code=$? + set -e + + if [ ${exit_code} -ne 0 ]; then + err "Packages for moby not available in OS ${ID} ${VERSION_CODENAME} (${architecture}). To resolve, either: (1) set feature option '\"moby\": false' , or (2) choose a compatible OS version." + exit 1 + fi + + # Install compose + if [ "${DOCKER_DASH_COMPOSE_VERSION}" != "none" ]; then + ${PKG_MGR_CMD} -y install moby-compose || echo "(*) Package moby-compose not available for ${ID} ${VERSION_CODENAME} (${architecture}). Skipping." + fi + else + # Special handling for Azure Linux Docker CE installation + if [ "${ID}" = "azurelinux" ] || [ "${ID}" = "mariner" ]; then + echo "(*) Installing Docker CE on Azure Linux (bypassing container-selinux dependency)..." + + # Use rpm with --force and --nodeps for Azure Linux + set +e # Don't exit on error for this section + ${PKG_MGR_CMD} -y install docker-ce${cli_version_suffix} docker-ce-cli${engine_version_suffix} containerd.io + install_result=$? + set -e + + if [ $install_result -ne 0 ]; then + echo "(*) Standard installation failed, trying manual installation..." + + echo "(*) Standard installation failed, trying manual installation..." 
+ + # Create directory for downloading packages + mkdir -p /tmp/docker-ce-install + + # Download packages manually using curl since tdnf doesn't support download + echo "(*) Downloading Docker CE packages manually..." + + # Get the repository baseurl + repo_baseurl="https://download.docker.com/linux/centos/9/x86_64/stable" + + # Download packages directly + cd /tmp/docker-ce-install + + # Get package names with versions + if [ -n "${cli_version_suffix}" ]; then + docker_ce_version="${cli_version_suffix#-}" + docker_cli_version="${engine_version_suffix#-}" + else + # Get latest version from repository + docker_ce_version="latest" + fi + + echo "(*) Attempting to download Docker CE packages from repository..." + + # Try to download latest packages if specific version fails + if ! curl -fsSL "${repo_baseurl}/Packages/docker-ce-${docker_ce_version}.el9.x86_64.rpm" -o docker-ce.rpm 2>/dev/null; then + # Fallback: try to get latest available version + echo "(*) Specific version not found, trying latest..." + latest_docker=$(curl -s "${repo_baseurl}/Packages/" | grep -o 'docker-ce-[0-9][^"]*\.el9\.x86_64\.rpm' | head -1) + latest_cli=$(curl -s "${repo_baseurl}/Packages/" | grep -o 'docker-ce-cli-[0-9][^"]*\.el9\.x86_64\.rpm' | head -1) + latest_containerd=$(curl -s "${repo_baseurl}/Packages/" | grep -o 'containerd\.io-[0-9][^"]*\.el9\.x86_64\.rpm' | head -1) + + if [ -n "${latest_docker}" ]; then + curl -fsSL "${repo_baseurl}/Packages/${latest_docker}" -o docker-ce.rpm + curl -fsSL "${repo_baseurl}/Packages/${latest_cli}" -o docker-ce-cli.rpm + curl -fsSL "${repo_baseurl}/Packages/${latest_containerd}" -o containerd.io.rpm + else + echo "(*) ERROR: Could not find Docker CE packages in repository" + echo "(*) Please check repository configuration or use 'moby': true" + exit 1 + fi + fi + # Install systemd libraries required by Docker CE + echo "(*) Installing systemd libraries required by Docker CE..." + ${PKG_MGR_CMD} -y install systemd-libs || ${PKG_MGR_CMD} -y install systemd-devel || { + echo "(*) WARNING: Could not install systemd libraries" + echo "(*) Docker may fail to start without these" + } + + # Install with rpm --force --nodeps + echo "(*) Installing Docker CE packages with dependency override..." + rpm -Uvh --force --nodeps *.rpm + + # Cleanup + cd / + rm -rf /tmp/docker-ce-install + + echo "(*) Docker CE installation completed with dependency bypass" + echo "(*) Note: Some SELinux functionality may be limited without container-selinux" + fi + else + # Standard installation for other RHEL-based systems + ${PKG_MGR_CMD} -y install docker-ce${cli_version_suffix} docker-ce-cli${engine_version_suffix} containerd.io + fi + # Install compose + if [ "${DOCKER_DASH_COMPOSE_VERSION}" != "none" ]; then + ${PKG_MGR_CMD} -y install docker-compose-plugin || echo "(*) Package docker-compose-plugin not available for ${ID} ${VERSION_CODENAME} (${architecture}). Skipping." + fi + fi + ;; + esac +fi + +echo "Finished installing docker / moby!" + +docker_home="/usr/libexec/docker" +cli_plugins_dir="${docker_home}/cli-plugins" + +# fallback for docker-compose +fallback_compose(){ + local url=$1 + local repo_url=$(get_github_api_repo_url "$url") + echo -e "\n(!) Failed to fetch the latest artifacts for docker-compose v${compose_version}..." 
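+    # resolve the release published just before the requested one, then retry the download with it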
+ get_previous_version "${url}" "${repo_url}" compose_version + echo -e "\nAttempting to install v${compose_version}" + curl -fsSL "https://github.com/docker/compose/releases/download/v${compose_version}/docker-compose-linux-${target_compose_arch}" -o ${docker_compose_path} +} + +# If 'docker-compose' command is to be included +if [ "${DOCKER_DASH_COMPOSE_VERSION}" != "none" ]; then + case "${architecture}" in + amd64|x86_64) target_compose_arch=x86_64 ;; + arm64|aarch64) target_compose_arch=aarch64 ;; + *) + echo "(!) Docker in docker does not support machine architecture '$architecture'. Please use an x86-64 or ARM64 machine." + exit 1 + esac + + docker_compose_path="/usr/local/bin/docker-compose" + if [ "${DOCKER_DASH_COMPOSE_VERSION}" = "v1" ]; then + err "The final Compose V1 release, version 1.29.2, was May 10, 2021. These packages haven't received any security updates since then. Use at your own risk." + INSTALL_DOCKER_COMPOSE_SWITCH="false" + + if [ "${target_compose_arch}" = "x86_64" ]; then + echo "(*) Installing docker compose v1..." + curl -fsSL "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-Linux-x86_64" -o ${docker_compose_path} + chmod +x ${docker_compose_path} + + # Download the SHA256 checksum + DOCKER_COMPOSE_SHA256="$(curl -sSL "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-Linux-x86_64.sha256" | awk '{print $1}')" + echo "${DOCKER_COMPOSE_SHA256} ${docker_compose_path}" > docker-compose.sha256sum + sha256sum -c docker-compose.sha256sum --ignore-missing + elif [ "${VERSION_CODENAME}" = "bookworm" ]; then + err "Docker compose v1 is unavailable for 'bookworm' on Arm64. Kindly switch to use v2" + exit 1 + else + # Use pip to get a version that runs on this architecture + check_packages python3-minimal python3-pip libffi-dev python3-venv + echo "(*) Installing docker compose v1 via pip..." + export PYTHONUSERBASE=/usr/local + pip3 install --disable-pip-version-check --no-cache-dir --user "Cython<3.0" pyyaml wheel docker-compose --no-build-isolation + fi + else + compose_version=${DOCKER_DASH_COMPOSE_VERSION#v} + docker_compose_url="https://github.com/docker/compose" + find_version_from_git_tags compose_version "$docker_compose_url" "tags/v" + echo "(*) Installing docker-compose ${compose_version}..." + curl -fsSL "https://github.com/docker/compose/releases/download/v${compose_version}/docker-compose-linux-${target_compose_arch}" -o ${docker_compose_path} || { + echo -e "\n(!) Failed to fetch the latest artifacts for docker-compose v${compose_version}..." + fallback_compose "$docker_compose_url" + } + + chmod +x ${docker_compose_path} + + # Download the SHA256 checksum + DOCKER_COMPOSE_SHA256="$(curl -sSL "https://github.com/docker/compose/releases/download/v${compose_version}/docker-compose-linux-${target_compose_arch}.sha256" | awk '{print $1}')" + echo "${DOCKER_COMPOSE_SHA256} ${docker_compose_path}" > docker-compose.sha256sum + sha256sum -c docker-compose.sha256sum --ignore-missing + + mkdir -p ${cli_plugins_dir} + cp ${docker_compose_path} ${cli_plugins_dir} + fi +fi + +# fallback method for compose-switch +fallback_compose-switch() { + local url=$1 + local repo_url=$(get_github_api_repo_url "$url") + echo -e "\n(!) Failed to fetch the latest artifacts for compose-switch v${compose_switch_version}..." 
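+    # same strategy as fallback_compose: step back one release and retry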
+ get_previous_version "$url" "$repo_url" compose_switch_version + echo -e "\nAttempting to install v${compose_switch_version}" + curl -fsSL "https://github.com/docker/compose-switch/releases/download/v${compose_switch_version}/docker-compose-linux-${target_switch_arch}" -o /usr/local/bin/compose-switch +} +# Install docker-compose switch if not already installed - https://github.com/docker/compose-switch#manual-installation +if [ "${INSTALL_DOCKER_COMPOSE_SWITCH}" = "true" ] && ! type compose-switch > /dev/null 2>&1; then + if type docker-compose > /dev/null 2>&1; then + echo "(*) Installing compose-switch..." + current_compose_path="$(command -v docker-compose)" + target_compose_path="$(dirname "${current_compose_path}")/docker-compose-v1" + compose_switch_version="latest" + compose_switch_url="https://github.com/docker/compose-switch" + # Try to get latest version, fallback to known stable version if GitHub API fails + set +e + find_version_from_git_tags compose_switch_version "$compose_switch_url" + if [ $? -ne 0 ] || [ -z "${compose_switch_version}" ] || [ "${compose_switch_version}" = "latest" ]; then + echo "(*) GitHub API rate limited or failed, using fallback method" + fallback_compose-switch "$compose_switch_url" + fi + set -e + + # Map architecture for compose-switch downloads + case "${architecture}" in + amd64|x86_64) target_switch_arch=amd64 ;; + arm64|aarch64) target_switch_arch=arm64 ;; + *) target_switch_arch=${architecture} ;; + esac + curl -fsSL "https://github.com/docker/compose-switch/releases/download/v${compose_switch_version}/docker-compose-linux-${target_switch_arch}" -o /usr/local/bin/compose-switch || fallback_compose-switch "$compose_switch_url" + chmod +x /usr/local/bin/compose-switch + # TODO: Verify checksum once available: https://github.com/docker/compose-switch/issues/11 + # Setup v1 CLI as alternative in addition to compose-switch (which maps to v2) + mv "${current_compose_path}" "${target_compose_path}" + update-alternatives --install ${docker_compose_path} docker-compose /usr/local/bin/compose-switch 99 + update-alternatives --install ${docker_compose_path} docker-compose "${target_compose_path}" 1 + else + err "Skipping installation of compose-switch as docker compose is unavailable..." + fi +fi + +# If init file already exists, exit +if [ -f "/usr/local/share/docker-init.sh" ]; then + echo "/usr/local/share/docker-init.sh already exists, so exiting." + # Clean up + rm -rf /var/lib/apt/lists/* + exit 0 +fi +echo "docker-init doesn't exist, adding..." + +if ! cat /etc/group | grep -e "^docker:" > /dev/null 2>&1; then + groupadd -r docker +fi + +usermod -aG docker ${USERNAME} + +# fallback for docker/buildx +fallback_buildx() { + local url=$1 + local repo_url=$(get_github_api_repo_url "$url") + echo -e "\n(!) Failed to fetch the latest artifacts for docker buildx v${buildx_version}..." + get_previous_version "$url" "$repo_url" buildx_version + buildx_file_name="buildx-v${buildx_version}.linux-${target_buildx_arch}" + echo -e "\nAttempting to install v${buildx_version}" + wget https://github.com/docker/buildx/releases/download/v${buildx_version}/${buildx_file_name} +} + +if [ "${INSTALL_DOCKER_BUILDX}" = "true" ]; then + buildx_version="latest" + docker_buildx_url="https://github.com/docker/buildx" + find_version_from_git_tags buildx_version "$docker_buildx_url" "refs/tags/v" + echo "(*) Installing buildx ${buildx_version}..." 
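+    # buildx publishes one binary per architecture, so dpkg/rpm arch names are mapped to release asset names below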
+
+    # Map architecture for buildx downloads
+    case "${architecture}" in
+        amd64|x86_64) target_buildx_arch=amd64 ;;
+        arm64|aarch64) target_buildx_arch=arm64 ;;
+        *) target_buildx_arch=${architecture} ;;
+    esac
+
+    buildx_file_name="buildx-v${buildx_version}.linux-${target_buildx_arch}"
+
+    cd /tmp
+    wget https://github.com/docker/buildx/releases/download/v${buildx_version}/${buildx_file_name} || fallback_buildx "$docker_buildx_url"
+
+    docker_home="/usr/libexec/docker"
+    cli_plugins_dir="${docker_home}/cli-plugins"
+
+    mkdir -p ${cli_plugins_dir}
+    mv ${buildx_file_name} ${cli_plugins_dir}/docker-buildx
+    chmod +x ${cli_plugins_dir}/docker-buildx
+
+    chown -R "${USERNAME}:docker" "${docker_home}"
+    chmod -R g+r+w "${docker_home}"
+    find "${docker_home}" -type d -print0 | xargs -n 1 -0 chmod g+s
+fi
+
+DOCKER_DEFAULT_IP6_TABLES=""
+if [ "$DISABLE_IP6_TABLES" == true ]; then
+    requested_version=""
+    # Check whether the requested version is in semver format or is a bare major
+    # version number, and extract the major version in either case,
+    # e.g. DOCKER_VERSION=27.3.1 -> requested_version=27; DOCKER_VERSION=27 -> requested_version=27
+    semver_regex="^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(-([0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*))?(\+([0-9A-Za-z-]+(\.[0-9A-Za-z-]+)*))?$"
+    if echo "$DOCKER_VERSION" | grep -Eq "$semver_regex"; then
+        requested_version=$(echo "$DOCKER_VERSION" | cut -d. -f1)
+    elif echo "$DOCKER_VERSION" | grep -Eq "^[1-9][0-9]*$"; then
+        requested_version=$DOCKER_VERSION
+    fi
+    if [ "$DOCKER_VERSION" = "latest" ] || [[ -n "$requested_version" && "$requested_version" -ge 27 ]] ; then
+        DOCKER_DEFAULT_IP6_TABLES="--ip6tables=false"
+        echo "(!) As requested, passing '${DOCKER_DEFAULT_IP6_TABLES}'"
+    fi
+fi
+
+if [ ! -d /usr/local/share ]; then
+    mkdir -p /usr/local/share
+fi
+
+tee /usr/local/share/docker-init.sh > /dev/null \
+<< EOF
+#!/bin/sh
+#-------------------------------------------------------------------------------------------------------------
+# Copyright (c) Microsoft Corporation. All rights reserved.
+# Licensed under the MIT License. See https://go.microsoft.com/fwlink/?linkid=2090316 for license information.
+#-------------------------------------------------------------------------------------------------------------
+
+set -e
+
+AZURE_DNS_AUTO_DETECTION=${AZURE_DNS_AUTO_DETECTION}
+DOCKER_DEFAULT_ADDRESS_POOL=${DOCKER_DEFAULT_ADDRESS_POOL}
+DOCKER_DEFAULT_IP6_TABLES=${DOCKER_DEFAULT_IP6_TABLES}
+EOF
+
+tee -a /usr/local/share/docker-init.sh > /dev/null \
+<< 'EOF'
+dockerd_start="AZURE_DNS_AUTO_DETECTION=${AZURE_DNS_AUTO_DETECTION} DOCKER_DEFAULT_ADDRESS_POOL=${DOCKER_DEFAULT_ADDRESS_POOL} DOCKER_DEFAULT_IP6_TABLES=${DOCKER_DEFAULT_IP6_TABLES} $(cat << 'INNEREOF'
+    # Explicitly remove dockerd and containerd PID files so they can start
+    # properly if they were stopped uncleanly
+    find /run /var/run -iname 'docker*.pid' -delete || :
+    find /run /var/run -iname 'container*.pid' -delete || :
+
+    # -- Start: dind wrapper script --
+    # Maintained: https://github.com/moby/moby/blob/master/hack/dind
+
+    export container=docker
+
+    if [ -d /sys/kernel/security ] && ! mountpoint -q /sys/kernel/security; then
+        mount -t securityfs none /sys/kernel/security || {
+            echo >&2 'Could not mount /sys/kernel/security.'
+            echo >&2 'AppArmor detection and --privileged mode might break.'
+        }
+    fi
+
+    # Mount /tmp (conditionally)
+    if ! mountpoint -q /tmp; then
+        mount -t tmpfs none /tmp
+    fi
+
+    set_cgroup_nesting()
+    {
+        # cgroup v2: enable nesting
+        if [ -f /sys/fs/cgroup/cgroup.controllers ]; then
+            # move the processes from the root group to the /init group,
+            # otherwise writing subtree_control fails with EBUSY.
+            # An error during moving non-existent process (i.e., "cat") is ignored.
+            mkdir -p /sys/fs/cgroup/init
+            xargs -rn1 < /sys/fs/cgroup/cgroup.procs > /sys/fs/cgroup/init/cgroup.procs || :
+            # enable controllers
+            sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers \
+                > /sys/fs/cgroup/cgroup.subtree_control
+        fi
+    }
+
+    # Set cgroup nesting, retrying if necessary
+    retry_cgroup_nesting=0
+
+    until [ "${retry_cgroup_nesting}" -eq "5" ];
+    do
+        set +e
+        set_cgroup_nesting
+
+        if [ $? -ne 0 ]; then
+            echo "(*) cgroup v2: Failed to enable nesting, retrying..."
+        else
+            break
+        fi
+
+        retry_cgroup_nesting=`expr $retry_cgroup_nesting + 1`
+        set -e
+    done
+
+    # -- End: dind wrapper script --
+
+    # Handle DNS
+    set +e
+    grep -i 'internal.cloudapp.net' /etc/resolv.conf > /dev/null 2>&1
+    if [ $? -eq 0 ] && [ "${AZURE_DNS_AUTO_DETECTION}" = "true" ]
+    then
+        echo "Setting dockerd Azure DNS."
+        CUSTOMDNS="--dns 168.63.129.16"
+    else
+        echo "Not setting dockerd DNS manually."
+        CUSTOMDNS=""
+    fi
+    set -e
+
+    if [ -z "$DOCKER_DEFAULT_ADDRESS_POOL" ]
+    then
+        DEFAULT_ADDRESS_POOL=""
+    else
+        DEFAULT_ADDRESS_POOL="--default-address-pool $DOCKER_DEFAULT_ADDRESS_POOL"
+    fi
+
+    # Start docker/moby engine
+    ( dockerd $CUSTOMDNS $DEFAULT_ADDRESS_POOL $DOCKER_DEFAULT_IP6_TABLES > /tmp/dockerd.log 2>&1 ) &
+INNEREOF
+)"
+
+sudo_if() {
+    COMMAND="$*"
+
+    if [ "$(id -u)" -ne 0 ]; then
+        sudo $COMMAND
+    else
+        $COMMAND
+    fi
+}
+
+retry_docker_start_count=0
+docker_ok="false"
+
+until [ "${docker_ok}" = "true" ] || [ "${retry_docker_start_count}" -eq "5" ];
+do
+    # Start using sudo if not invoked as root
+    if [ "$(id -u)" -ne 0 ]; then
+        sudo /bin/sh -c "${dockerd_start}"
+    else
+        eval "${dockerd_start}"
+    fi
+
+    retry_count=0
+    until [ "${docker_ok}" = "true" ] || [ "${retry_count}" -eq "5" ];
+    do
+        sleep 1s
+        set +e
+        docker info > /dev/null 2>&1 && docker_ok="true"
+        set -e
+
+        retry_count=`expr $retry_count + 1`
+    done
+
+    if [ "${docker_ok}" != "true" ] && [ "${retry_docker_start_count}" != "4" ]; then
+        echo "(*) Failed to start docker, retrying..."
+        set +e
+        sudo_if pkill dockerd
+        sudo_if pkill containerd
+        set -e
+    fi
+
+    retry_docker_start_count=`expr $retry_docker_start_count + 1`
+done
+
+# Execute whatever commands were passed in (if any). This allows us
+# to set this script to ENTRYPOINT while still executing the default CMD.
+exec "$@"
+EOF
+
+chmod +x /usr/local/share/docker-init.sh
+chown ${USERNAME}:root /usr/local/share/docker-init.sh
+
+# Clean up
+rm -rf /var/lib/apt/lists/*
+
+echo 'docker-in-docker-debian script has completed!'
diff --git a/.devcontainer/start-dind.sh b/.devcontainer/start-dind.sh
new file mode 100755
index 0000000000..17a1b64ba2
--- /dev/null
+++ b/.devcontainer/start-dind.sh
@@ -0,0 +1,29 @@
+#!/bin/bash
+# Start Docker-in-Docker with cgroup v1 nesting fix.
+# Usage: sudo bash start-dind.sh
+set -e
+
+# 1. Load iptables kernel modules (required by dockerd networking)
+modprobe ip_tables iptable_nat iptable_filter 2>/dev/null || true
+
+# 2. Fix cgroup v1 nesting for DinD
+# In a privileged container on cgroup v1, each subsystem shows the full
+# host hierarchy. Inner Docker's runc expects the container's own cgroup
+# as root. 
We bind-mount each subsystem to the container's own cgroup dir. +if [ ! -f /sys/fs/cgroup/cgroup.controllers ]; then + SELF_CGROUP_ID=$(grep ':memory:' /proc/1/cgroup | cut -d: -f3 | sed 's|^/docker/||') + if [ -n "$SELF_CGROUP_ID" ]; then + for subsys_dir in /sys/fs/cgroup/*/; do + subsys_name=$(basename "$subsys_dir") + [ -L "/sys/fs/cgroup/$subsys_name" ] && continue + our_dir="$subsys_dir/docker/$SELF_CGROUP_ID" + if [ -d "$our_dir" ]; then + mount --bind "$our_dir" "$subsys_dir" 2>/dev/null || true + fi + done + fi +fi + +# 3. Start Docker daemon via the DinD init script +/usr/local/share/docker-init.sh +sleep 2 diff --git a/.gitignore b/.gitignore index 1392b48ba4..7df4b01410 100644 --- a/.gitignore +++ b/.gitignore @@ -59,9 +59,7 @@ _deps # Custom /build/ core/build/ -core/protobuf/config_server/*/*.pb.* -core/protobuf/*/*.pb.* -core/log_pb/*.pb.* +*.pb.* core/common/Version.cpp !/Makefile # Enterprise @@ -90,9 +88,11 @@ plugins/all/ *.go.mod.sum # Custom plugin_logger.xml +go_plugin.LOG ### E2E /*-test/ +testcase-compose.yaml ### License find_licenses/ @@ -106,7 +106,10 @@ license_coverage.txt /dist/ /tags/ -### Cursor +### IDE configs /.cursor/ +/.claude/settings.local.json /.claude/ /.gemini/ +.omc/ +/code-review/ diff --git a/docker/Dockerfile_development_part b/docker/Dockerfile_development_part index 06cb0678fd..8383263a56 100644 --- a/docker/Dockerfile_development_part +++ b/docker/Dockerfile_development_part @@ -18,30 +18,36 @@ ARG HOST_OS=Linux ARG VERSION=0.0.1 USER root -WORKDIR /loongcollector +ENV container=docker -RUN mkdir -p /loongcollector/conf/instance_config/local -RUN mkdir -p /loongcollector/log -RUN mkdir -p /loongcollector/data -RUN mkdir -p /loongcollector/run +RUN yum update -y && yum -y install systemd initscripts && yum -y clean all && rm -fr /var/cache -COPY --from=build /src/core/build/loongcollector /loongcollector/ +RUN mkdir -p /usr/local/loongcollector/conf/instance_config/local +RUN mkdir -p /usr/local/loongcollector/log +RUN mkdir -p /usr/local/loongcollector/data +RUN mkdir -p /usr/local/loongcollector/run + +COPY --from=build /src/core/build/loongcollector /usr/local/loongcollector/ +COPY ./scripts/loongcollector_control.sh /usr/local/loongcollector/ COPY ./scripts/download_ebpflib.sh /tmp/ -RUN chown -R $(whoami) /loongcollector && \ - chmod 755 /loongcollector/loongcollector && \ - mkdir /loongcollector/data/checkpoint && \ - if [ `uname -m` = "x86_64" ]; then /tmp/download_ebpflib.sh /loongcollector; fi && \ +RUN chown -R $(whoami) /usr/local/loongcollector && \ + chmod 755 /usr/local/loongcollector/loongcollector && \ + mkdir -p /usr/local/loongcollector/data/checkpoint && \ + if [ `uname -m` = "x86_64" ]; then /tmp/download_ebpflib.sh /usr/local/loongcollector; fi && \ rm /tmp/download_ebpflib.sh -COPY --from=build /src/output/libGoPluginBase.so /loongcollector/ -COPY --from=build /src/example_config/quick_start/loongcollector_config.json /loongcollector/conf/instance_config/local/loongcollector_config.json -COPY --from=build /src/core/build/go_pipeline/libGoPluginAdapter.so /loongcollector/ -COPY --from=build /src/core/build/ebpf/driver/libeBPFDriver.so /loongcollector/ +COPY --from=build /src/output/libGoPluginBase.so /usr/local/loongcollector/ +COPY --from=build /src/example_config/quick_start/loongcollector_config.json /usr/local/loongcollector/conf/instance_config/local/loongcollector_config.json +COPY --from=build /src/core/build/go_pipeline/libGoPluginAdapter.so /usr/local/loongcollector/ +COPY --from=build 
/src/core/build/ebpf/driver/libeBPFDriver.so /usr/local/loongcollector/ -ENV HOST_OS=$HOST_OS -ENV LOGTAIL_VERSION=$VERSION +ENV HOST_OS=$HOST_OS \ + LOONGCOLLECTOR_VERSION=$VERSION \ + HTTP_PROBE_PORT=7953 \ + ALIYUN_LOGTAIL_USER_DEFINED_ID=default \ + docker_file_cache_path=checkpoint/docker_path_config.json EXPOSE 18689 -ENTRYPOINT ["/loongcollector/loongcollector"] +CMD ["/usr/local/loongcollector/loongcollector_control.sh", "start_and_block"]
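+
+# NOTE: "start_and_block" is assumed (from the command name; the control script
+# itself is not part of this diff) to start the collector and then stay in the
+# foreground, so the control script remains PID 1 and keeps the container alive.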