pd nixl upgrade write mode to transfer kv #1324
Code Review
This pull request introduces GPU timing measurements for memory page copy operations and logs the elapsed time. It also adds latency tracking for adding remote agents in the NIXL KV transporter. The feedback suggests defensively checking that both copy_start_event and copy_end_event are not None before calculating the elapsed time to prevent potential AttributeError exceptions.
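The add-remote-agent latency tracking described above can be sketched as a host-side wall-clock wrapper. This is a minimal sketch that assumes the timed call is an ordinary blocking Python callable. `timed_call` and `fake_add_remote_agent` are hypothetical names and are not part of the PR or the NIXL API. `time.perf_counter` suits blocking CPU work such as agent registration. Asynchronous GPU page copies need CUDA events instead, which is what the reviewed code uses.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn(*args, **kwargs) and return (result, elapsed wall-clock time in ms).

    Hypothetical helper: it measures host-side latency only, so it fits blocking
    CPU work (e.g. registering a remote agent), not asynchronous GPU copies.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


if __name__ == "__main__":
    # Hypothetical stand-in for a remote-agent registration call.
    def fake_add_remote_agent(metadata: bytes) -> str:
        time.sleep(0.01)
        return "peer-0"

    name, ms = timed_call(fake_add_remote_agent, b"metadata")
    print(f"add remote agent {name} took {ms:.2f} ms")
```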
    if copy_end_event is not None:
        copy_end_event.synchronize()
        read_page_gpu_time_ms = copy_start_event.elapsed_time(copy_end_event)
For defensive programming, verify that both copy_end_event and copy_start_event are not None before calling elapsed_time. Otherwise a None copy_start_event raises an AttributeError.
Suggested change:
-    if copy_end_event is not None:
-        copy_end_event.synchronize()
-        read_page_gpu_time_ms = copy_start_event.elapsed_time(copy_end_event)
+    if copy_end_event is not None and copy_start_event is not None:
+        copy_end_event.synchronize()
+        read_page_gpu_time_ms = copy_start_event.elapsed_time(copy_end_event)
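The suggested guard can also live in a small helper that returns None when either event is missing. This is a sketch under stated assumptions. `gpu_elapsed_ms` and `TimingEvent` are hypothetical names. The protocol covers only the two `torch.cuda.Event` methods the reviewed code calls, `synchronize()` and `elapsed_time(end_event)`. With real PyTorch events, both must be created with `enable_timing=True` and recorded before they are measured.

```python
from typing import Optional, Protocol


class TimingEvent(Protocol):
    """The subset of the torch.cuda.Event interface used here (assumed)."""

    def synchronize(self) -> None: ...

    def elapsed_time(self, end_event: "TimingEvent") -> float: ...


def gpu_elapsed_ms(
    start: Optional[TimingEvent], end: Optional[TimingEvent]
) -> Optional[float]:
    """Return the time between two recorded events in ms, or None if either is missing."""
    # Check both events: elapsed_time on a None start would raise AttributeError.
    if start is None or end is None:
        return None
    # Block until the end event has completed so the measurement is valid.
    end.synchronize()
    return start.elapsed_time(end)
```

In the reviewed code this would read as `read_page_gpu_time_ms = gpu_elapsed_ms(copy_start_event, copy_end_event)`. The caller would then skip logging when the result is None.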
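After several more runs, averaging over 3 complete GSM8K runs, the conclusion still holds: NIXL PD throughput is consistently higher than NCCL PD, by about 5.29% on average.
Results:
NIXL PD:
NCCL PD:
NIXL - NCCL = +1.08 req/s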
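The comparison above averages throughput over 3 runs per backend. It then reports an absolute gain in req/s and a relative gain in percent. The comment does not preserve the per-run values, so this sketch uses synthetic inputs only. It assumes the relative gain is computed from the averaged throughputs, that is, (mean NIXL - mean NCCL) / mean NCCL. Averaging per-run percentages instead can give a slightly different figure. `throughput_gain` is a hypothetical helper.

```python
from statistics import mean
from typing import Sequence, Tuple


def throughput_gain(
    nixl_runs: Sequence[float], nccl_runs: Sequence[float]
) -> Tuple[float, float]:
    """Average per-run throughputs (req/s); return (absolute gain, relative gain in %)."""
    nixl_avg = mean(nixl_runs)
    nccl_avg = mean(nccl_runs)
    delta = nixl_avg - nccl_avg
    # Relative gain is measured against the NCCL baseline.
    return delta, 100.0 * delta / nccl_avg


if __name__ == "__main__":
    # Synthetic numbers for illustration only; not the PR's measurements.
    delta, pct = throughput_gain([21.0, 22.0, 23.0], [20.0, 20.0, 20.0])
    print(f"NIXL - NCCL = {delta:+.2f} req/s ({pct:.2f}%)")
```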