feat: Integrate coolbpf cpu profiling feature in loongcollector#2391
wokron wants to merge 90 commits into alibaba:main
Pull Request Overview
This PR integrates coolbpf CPU profiling capabilities into loongcollector by adding a new input_cpu_profiling plugin. The implementation enables continuous CPU profiling of specified processes through command-line pattern matching and container discovery.
Key changes:
- New CPU profiling plugin with process discovery mechanism
- Integration with coolbpf profiler library
- Plugin registration and lifecycle management
Reviewed Changes
Copilot reviewed 39 out of 39 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| core/plugin/input/InputCpuProfiling.{h,cpp} | New plugin implementation for CPU profiling input |
| core/ebpf/plugin/cpu_profiling/* | Core CPU profiling manager and process discovery logic |
| core/ebpf/driver/CpuProfiler.h | Wrapper for coolbpf profiler library integration |
| core/ebpf/Config.{h,cpp} | CPU profiling configuration option handling |
| core/ebpf/include/export.h | Type definitions for CPU profiling |
| core/unittest/input/InputCpuProfilingUnittest.cpp | Unit tests for the input plugin |
| core/unittest/ebpf/*Unittest.cpp | Unit tests for CPU profiling components |
| Various CMakeLists.txt | Build system updates for new components |
Comments suppressed due to low confidence (1)
core/unittest/input/InputCpuProfilingUnittest.cpp:1
- Corrected spelling of 'CommandLines' to 'CommandLines' in comment context.
static void handler_without_ctx(uint32_t pid, const char* comm, const char* stack, uint32_t cnt) {
    mHandler(pid, comm, stack, cnt, mCtx);
}
The static members mHandler and mCtx are read without synchronization in handler_without_ctx. That function is called from Poll() while a lock is held, but Start() and Stop() also modify these members and can race with it, so the callback could be invoked with stale or null pointer values.
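One way to close this race, as a minimal sketch: keep the handler and its context together behind a single mutex, and have the poll path copy both under the lock before invoking. `HandlerSlot` and `ProfilingHandler` are hypothetical names for illustration; the callback signature is modeled on the snippet above and may not match the real `livetrace_profiler_read_cb_ctx_t`.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical callback type modeled on the snippet above; the real
// livetrace_profiler_read_cb_ctx_t signature may differ.
using ProfilingHandler = void (*)(uint32_t pid, const char* comm, const char* stack, uint32_t cnt, void* ctx);

// Hypothetical holder for the handler/context pair.
class HandlerSlot {
public:
    // Intended for Start()/Stop(): publish or clear both values together.
    void Set(ProfilingHandler handler, void* ctx) {
        std::lock_guard<std::mutex> lock(mMutex);
        mHandler = handler;
        mCtx = ctx;
    }

    // Intended for the poll path: copy the pair under the lock, then call
    // the handler outside it. Returns false when no handler is installed.
    bool Invoke(uint32_t pid, const char* comm, const char* stack, uint32_t cnt) {
        ProfilingHandler handler = nullptr;
        void* ctx = nullptr;
        {
            std::lock_guard<std::mutex> lock(mMutex);
            handler = mHandler;
            ctx = mCtx;
        }
        if (handler == nullptr) {
            return false;
        }
        handler(pid, comm, stack, cnt, ctx);
        return true;
    }

private:
    std::mutex mMutex;
    ProfilingHandler mHandler = nullptr;
    void* mCtx = nullptr;
};
```

Because the handler runs outside the lock, Stop() would still need to make sure polling has quiesced before the object behind `ctx` is destroyed.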
mEBPFAdapter->UpdatePlugin(PluginType::CPU_PROFILING,
                           buildCpuProfilingConfig(std::move(totalPids), std::nullopt, nullptr, nullptr));
Passing null handler and context to buildCpuProfilingConfig during update will overwrite the valid handler set during initialization, breaking profiling event handling.
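A possible shape for an update that cannot clobber the handler, sketched under assumptions: the update request carries optional fields, and only fields that are set are applied. `CpuProfilingState`, `CpuProfilingUpdate`, and `ApplyUpdate` are hypothetical stand-ins, not the types used by `buildCpuProfilingConfig` in this PR.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_set>
#include <utility>

// Hypothetical callback type; the real one in the PR may differ.
using ProfilingHandler = void (*)(uint32_t, const char*, const char*, uint32_t, void*);

// Hypothetical stand-in for the configuration the driver keeps.
struct CpuProfilingState {
    std::unordered_set<uint32_t> mPids;
    ProfilingHandler mHandler = nullptr;
    void* mCtx = nullptr;
};

// Hypothetical update request: an unset field means "leave unchanged".
struct CpuProfilingUpdate {
    std::optional<std::unordered_set<uint32_t>> mPids;
    std::optional<std::pair<ProfilingHandler, void*>> mHandler;
};

// Applies only the fields that the caller explicitly provided.
inline void ApplyUpdate(CpuProfilingState& state, CpuProfilingUpdate update) {
    if (update.mPids) {
        state.mPids = std::move(*update.mPids);
    }
    if (update.mHandler) {
        state.mHandler = update.mHandler->first;
        state.mCtx = update.mHandler->second;
    }
}
```

With this shape, a PID-only update leaves `mHandler` and `mCtx` as they were set at initialization, and clearing the handler requires passing an explicit null pair.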
int maxRetry = 5;
for (int retry = 0; retry < maxRetry; ++retry) {
    if (QueueStatus::OK == ProcessQueueManager::GetInstance()->PushQueue(info.mQueueKey, std::move(item))) {
        break;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    if (retry == maxRetry - 1) {
[nitpick] Magic numbers 5 and 100 for retry attempts and sleep duration should be extracted as named constants or made configurable to improve maintainability and allow tuning.
Suggested change:
for (int retry = 0; retry < kMaxQueuePushRetry; ++retry) {
    if (QueueStatus::OK == ProcessQueueManager::GetInstance()->PushQueue(info.mQueueKey, std::move(item))) {
        break;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(kQueuePushRetrySleepMs));
    if (retry == kMaxQueuePushRetry - 1) {
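The suggestion uses `kMaxQueuePushRetry` and `kQueuePushRetrySleepMs` without showing their definitions. A minimal sketch of how they might be declared, together with a generic retry helper: `PushWithRetry` is a hypothetical name, the values 5 and 100 are taken from the original loop, and `tryPush` stands in for the real `PushQueue` call.

```cpp
#include <chrono>
#include <thread>

// Values taken from the original loop; names follow the suggestion.
constexpr int kMaxQueuePushRetry = 5;
constexpr int kQueuePushRetrySleepMs = 100;

// Hypothetical helper: tryPush is any callable returning true on success.
// Returns true as soon as one attempt succeeds, false after maxRetry failures.
template <typename TryPush>
bool PushWithRetry(TryPush&& tryPush,
                   int maxRetry = kMaxQueuePushRetry,
                   std::chrono::milliseconds backoff = std::chrono::milliseconds(kQueuePushRetrySleepMs)) {
    for (int retry = 0; retry < maxRetry; ++retry) {
        if (tryPush()) {
            return true;
        }
        // Skip the sleep after the final failed attempt.
        if (retry + 1 < maxRetry) {
            std::this_thread::sleep_for(backoff);
        }
    }
    return false;
}
```

Unlike the loop in the PR, this sketch does not sleep after the last failed attempt, and the caller handles the failure case from the return value instead of checking the loop index.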
std::lock_guard guard(mMutex);
mRouter.clear();
for (auto& [configKey, pids] : result) {
    for (auto& pid : pids) {
        totalPids.insert(pid);
        auto it = mRouter.emplace(pid, std::unordered_set<ConfigKey>{}).first;
        auto& configSet = it->second;
        configSet.insert(configKey);
    }
}
Clearing mRouter in HandleProcessDiscoveryEvent creates a race condition with HandleCpuProfilingEvent which reads from mRouter. Events arriving between clear and rebuild could be lost or routed incorrectly.
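A common alternative is to build the new routing table off to the side and swap it in under the lock, so readers never observe an empty or half-built table. The sketch below assumes `ConfigKey` is a string and that discovery results map each config to a list of PIDs; `PidRouter` is a hypothetical class. If HandleCpuProfilingEvent already takes the same mutex, the clear-and-rebuild above is atomic to it, and the main benefit of this pattern is a shorter critical section.

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using ConfigKey = std::string; // assumption: the real ConfigKey type may differ

// Hypothetical router from PID to the configs that want its samples.
class PidRouter {
public:
    using Table = std::unordered_map<uint32_t, std::unordered_set<ConfigKey>>;

    // Build the replacement table without holding the lock, then swap it in.
    void Rebuild(const std::unordered_map<ConfigKey, std::vector<uint32_t>>& result) {
        Table next;
        for (const auto& [configKey, pids] : result) {
            for (uint32_t pid : pids) {
                next[pid].insert(configKey);
            }
        }
        std::lock_guard<std::mutex> guard(mMutex);
        mRouter.swap(next);
        // The old table now lives in `next` and is freed after the lock is released.
    }

    // Returns a copy of the config set for pid, or an empty set if unknown.
    std::unordered_set<ConfigKey> Lookup(uint32_t pid) const {
        std::lock_guard<std::mutex> guard(mMutex);
        auto it = mRouter.find(pid);
        return it == mRouter.end() ? std::unordered_set<ConfigKey>{} : it->second;
    }

private:
    mutable std::mutex mMutex;
    Table mRouter;
};
```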
// TODO: make this non-static
inline static livetrace_profiler_read_cb_ctx_t mHandler = nullptr;
inline static void* mCtx = nullptr;
Static member variables for instance-specific handler and context violate encapsulation and prevent multiple CpuProfiler instances from working correctly. This TODO should be addressed before production use.
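One conventional way to resolve the TODO is a static trampoline that recovers the instance from the context pointer, so each profiler object owns its handler. This is only a sketch: it works only if the underlying C API forwards a user context pointer to the callback, which has not been verified for coolbpf here. `CpuProfilerSketch` and `RawCallback` are hypothetical names.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical C-style callback signature that carries a user context pointer.
using RawCallback = void (*)(uint32_t pid, const char* comm, const char* stack, uint32_t cnt, void* ctx);

class CpuProfilerSketch {
public:
    using Handler = std::function<void(uint32_t, const char*, const char*, uint32_t)>;

    void SetHandler(Handler handler) { mHandler = std::move(handler); }

    // The pair to register with the C library: the trampoline plus `this`.
    RawCallback Callback() const { return &CpuProfilerSketch::Trampoline; }
    void* Context() { return this; }

private:
    // Static entry point: recovers the instance from ctx and forwards the event.
    static void Trampoline(uint32_t pid, const char* comm, const char* stack, uint32_t cnt, void* ctx) {
        auto* self = static_cast<CpuProfilerSketch*>(ctx);
        if (self != nullptr && self->mHandler) {
            self->mHandler(pid, comm, stack, cnt);
        }
    }

    Handler mHandler; // per-instance state instead of inline static members
};
```

Two instances registered this way dispatch to their own handlers, since the only shared piece is the stateless trampoline.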
Please add usage documentation, following the pattern of the other plugins.
std::lock_guard<std::mutex> lock(mMutex);
if (mProfiler == nullptr) {
    livetrace_enable_tracing();
    mProfiler = livetrace_profiler_create();
Are there any resource-related parameters, such as queue size? How is its resource usage bounded?
Queue size is controlled on the coolbpf side, which sizes a bounded queue based on the profiling period. https://gitee.com/anolis/coolbpf/blob/master/src/profiler/src/probes/probes.rs#L274
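The documentation has been added.
Bailian automated review: recommend keeping this PR open. This PR adds a coolbpf-based CPU profiling plugin (input_cpu_profiling) to LoongCollector, including the core C++ implementation, eBPF driver adaptation, unit tests, and Chinese documentation. Maintainers have carried out multiple rounds of code review, and the author has kept responding and fixing issues. The PR currently has merge conflicts because it has fallen behind the main branch, but the feature is complete and has clear product value, so it is a valid feature PR under active development. Best path to landing: the author should rebase onto the latest main branch to resolve the merge conflicts and keep following up on the maintainers' review comments. Once CI passes and the review is approved, a maintainer merges it into main. Items verified:
Bailian review note: model qwen3.6-max-preview; compared against commit 7099f790b8a3.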
No description provided.