issue/401 refactor(rope): add scaling factory and TP-safe RoPE cache by rubik-hua · Pull Request #402 · InfiniTensor/InfiniLM

rubik-hua · 2026-05-28T09:36:06Z

Decouple scaling config instantiation from ModelConfig via factory and registry pattern.
Add thread-local RoPE cache with device-scoped keys to reduce VRAM usage and ensure TP safety.
Centralize rotary dimension calculation into ModelConfig.
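
The thread-local, device-keyed cache described above can be sketched roughly as follows. This is a minimal illustration and not the PR's implementation: RopeCacheKey, RopeCacheKeyHash, RopeTable and get_or_build_rope_table are hypothetical names, host-side vectors stand in for device tensors, and it assumes each tensor-parallel rank runs on its own worker thread, so a thread_local map is never shared across ranks and needs no lock.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical key: one sin/cos table per (device, rotary_dim, max_positions, theta).
struct RopeCacheKey {
    int device_id;
    std::size_t rotary_dim;
    std::size_t max_positions;
    double theta;
    bool operator==(const RopeCacheKey& o) const {
        return device_id == o.device_id && rotary_dim == o.rotary_dim &&
               max_positions == o.max_positions && theta == o.theta;
    }
};

struct RopeCacheKeyHash {
    std::size_t operator()(const RopeCacheKey& k) const {
        std::size_t h = std::hash<int>{}(k.device_id);
        const auto mix = [&h](std::size_t v) {
            h ^= v + 0x9e3779b9u + (h << 6) + (h >> 2);
        };
        mix(std::hash<std::size_t>{}(k.rotary_dim));
        mix(std::hash<std::size_t>{}(k.max_positions));
        mix(std::hash<double>{}(k.theta));
        return h;
    }
};

// Host-side stand-in for the device tensors a real implementation would hold.
struct RopeTable {
    std::vector<float> sin_table;  // [max_positions * rotary_dim / 2]
    std::vector<float> cos_table;  // [max_positions * rotary_dim / 2]
};

// Returns the table for `key`, building it at most once per thread.
// Layers that ask for the same key on the same thread share one table.
std::shared_ptr<const RopeTable> get_or_build_rope_table(const RopeCacheKey& key) {
    thread_local std::unordered_map<RopeCacheKey, std::shared_ptr<const RopeTable>,
                                    RopeCacheKeyHash> cache;
    const auto it = cache.find(key);
    if (it != cache.end()) {
        return it->second;
    }
    auto table = std::make_shared<RopeTable>();
    const std::size_t half = key.rotary_dim / 2;
    table->sin_table.resize(key.max_positions * half);
    table->cos_table.resize(key.max_positions * half);
    for (std::size_t pos = 0; pos < key.max_positions; ++pos) {
        for (std::size_t i = 0; i < half; ++i) {
            const double inv_freq = std::pow(
                key.theta, -2.0 * static_cast<double>(i) / static_cast<double>(key.rotary_dim));
            const double angle = static_cast<double>(pos) * inv_freq;
            table->sin_table[pos * half + i] = static_cast<float>(std::sin(angle));
            table->cos_table[pos * half + i] = static_cast<float>(std::cos(angle));
        }
    }
    cache.emplace(key, table);
    return table;
}
```

Under these assumptions, every layer on one rank that requests the same key receives the same shared table, which is where the memory saving would come from, while a different device_id always yields a separate table.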

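ModelConfig extension (pure data carrier)
ModelConfig no longer mixes in any model-specific business logic; it only provides default values and getter/setter interfaces. Differences in RoPE::Algo are specified explicitly by each model's construction entry point (e.g. csrc/models/chatglm/chatglm_for_causal_lm.cpp):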
// model_config.hpp
class ModelConfig {
private:
    infinicore::nn::RoPE::Algo rope_algo_ = infinicore::nn::RoPE::Algo::GPT_NEOX; // default
public:
    infinicore::nn::RoPE::Algo get_rope_algo() const { return rope_algo_; }
    void set_rope_algo(infinicore::nn::RoPE::Algo algo) { rope_algo_ = algo; }
};
// csrc/models/chatglm/chatglm_for_causal_lm.cpp
std::shared_ptr<ModelConfig> create_chatglm_config(const json& hf_config) {
    auto config = std::make_shared<ModelConfig>(hf_config);
    // Only ChatGLM/GLM4 need GPT_J; inject it explicitly here so the base class stays clean
    config->set_rope_algo(infinicore::nn::RoPE::Algo::GPT_J);
    return config;
}
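Factory and registry mechanism (string-based dispatch)
A registry pattern maps strings from the JSON config (e.g. "longrope") to the corresponding object construction logic, replacing a lengthy if-else chain.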
// rotary_embedding_factory.hpp
using ScalingCreator = std::function<std::shared_ptr<infinicore::nn::ScalingConfig>(
    const std::shared_ptr<infinilm::config::ModelConfig>&)>;
std::unordered_map<std::string, ScalingCreator>& get_scaling_registry();
std::shared_ptr<infinicore::nn::RoPE> make_rope(/* ... */);
The factory core is minimal: it only assembles and routes, and does not need to change when new types are added:
// rotary_embedding_factory.cpp
std::shared_ptr<infinicore::nn::ScalingConfig>
make_scaling_config(const std::shared_ptr<infinilm::config::ModelConfig>& model_config) {
    std::string scaling_type = model_config->get_or<std::string>("rope_scaling_type", "default");

    // Dispatch point: the registry maps the string to a concrete Creator function
    auto& registry = get_scaling_registry();
    auto it = registry.find(scaling_type);
    if (it != registry.end()) {
        return it->second(model_config);
    }
    throw std::runtime_error("Unsupported rope_scaling_type: " + scaling_type);
}
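
To make the dispatch concrete, here is a self-contained sketch of how a creator might be registered and then looked up. It is only an illustration under stated assumptions: the ModelConfig and ScalingConfig structs below are simplified stand-ins for the real infinilm and infinicore types, and NoScaling, LinearScaling and ScalingRegistrar are hypothetical names that do not come from the PR.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Simplified stand-ins for infinilm::config::ModelConfig and
// infinicore::nn::ScalingConfig (assumptions, not the real classes).
struct ModelConfig {
    std::string rope_scaling_type = "default";
    double rope_scaling_factor = 1.0;
};

struct ScalingConfig {
    virtual ~ScalingConfig() = default;
    virtual std::string name() const = 0;
};

struct NoScaling : ScalingConfig {
    std::string name() const override { return "default"; }
};

struct LinearScaling : ScalingConfig {
    double factor;
    explicit LinearScaling(double f) : factor(f) {}
    std::string name() const override { return "linear"; }
};

using ScalingCreator =
    std::function<std::shared_ptr<ScalingConfig>(const std::shared_ptr<ModelConfig>&)>;

// A function-local static sidesteps static-initialization-order problems.
std::unordered_map<std::string, ScalingCreator>& get_scaling_registry() {
    static std::unordered_map<std::string, ScalingCreator> registry;
    return registry;
}

// Hypothetical helper: constructing one of these registers a creator.
struct ScalingRegistrar {
    ScalingRegistrar(const std::string& type, ScalingCreator creator) {
        get_scaling_registry()[type] = std::move(creator);
    }
};

// Adding a new scaling type means adding one registrar; the factory is untouched.
static const ScalingRegistrar default_registrar(
    "default", [](const std::shared_ptr<ModelConfig>&) -> std::shared_ptr<ScalingConfig> {
        return std::make_shared<NoScaling>();
    });

static const ScalingRegistrar linear_registrar(
    "linear", [](const std::shared_ptr<ModelConfig>& cfg) -> std::shared_ptr<ScalingConfig> {
        return std::make_shared<LinearScaling>(cfg->rope_scaling_factor);
    });

std::shared_ptr<ScalingConfig> make_scaling_config(const std::shared_ptr<ModelConfig>& cfg) {
    auto& registry = get_scaling_registry();
    const auto it = registry.find(cfg->rope_scaling_type);
    if (it == registry.end()) {
        throw std::runtime_error("Unsupported rope_scaling_type: " + cfg->rope_scaling_type);
    }
    return it->second(cfg);
}
```

With this shape, an unknown type string fails loudly at construction time instead of silently falling back to unscaled RoPE.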

This needs to be merged together with the following InfiniCore PR:
InfiniTensor/InfiniCore#1181

After the refactor, new RoPE implementations are all added in csrc/layers/rotary_embedding/rope_scaling_creators.cpp. Nothing else needs to change, and they are decoupled from rotary_embedding.cpp and model_config.cpp.

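@pengcheng888 previously suggested folding the algo parameter into model_config and then setting it in xx_for_causal_lm.cpp. I implemented a version of that, but it felt quite awkward: as I understand it, model_config is better kept pure, holding only what can be read from or derived from the JSON. I later revised it again; passing the value as a runtime argument still seems the more elegant option.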

After the refactor, all currently supported models run successfully.

rubik-hua · 2026-05-29T05:43:55Z

@wooway777 @pengcheng888 The RoPE refactor is ready for review. There is also a companion PR on InfiniCore.

wooway777 · 2026-05-29T06:38:57Z

Thank you, I'm reviewing it now.

pengcheng888

Could Hua please re-evaluate whether, after these changes to InfiniCore and InfiniLM, the design will still hold up when other RoPE types are added later?

pengcheng888 · 2026-06-01T07:14:28Z

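Regarding "fold the algo parameter into model_config and then set it in xx_for_causal_lm.cpp":

The original motivation for this idea was to delete the glm4_attention.hpp/cpp files. Once narrow is moved into InfiniCore, using Glm4Attention = infinilm::layers::attention::Attention; is all that is needed.

vLLM does something similar and implements narrow inside the RoPE module:
query_rot = query[..., :rotary_dim]

In https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/glm4.py, the forward of Glm4Attention calls q, k = self.rotary_emb(positions, q, k).
The narrow operation does not appear at the Glm4Attention level.

That way, future maintenance only involves a single layers::attention::Attention. (P.S. We will indeed make minor adjustments to the Attention class later.)

Whether to make this change needs further discussion; it will not be changed in this PR.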
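
As a rough illustration of keeping narrow inside the RoPE module, the sketch below rotates only the first rotary_dim elements of a head vector and leaves the tail untouched, so the attention layer never slices anything itself. It is not InfiniCore's implementation: apply_rope and its parameters are hypothetical, and it assumes the common conventions that GPT_NEOX pairs element i with i + rotary_dim/2 while GPT_J pairs adjacent elements 2i and 2i+1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

enum class Algo { GPT_NEOX, GPT_J };

// Rotates the first `rotary_dim` elements of one head vector in place.
// Elements at index >= rotary_dim pass through unchanged (the "narrow").
void apply_rope(std::vector<float>& x, std::size_t rotary_dim, std::size_t position,
                Algo algo, double theta = 10000.0) {
    assert(rotary_dim % 2 == 0 && rotary_dim <= x.size());
    const std::size_t half = rotary_dim / 2;
    for (std::size_t i = 0; i < half; ++i) {
        const double inv_freq = std::pow(
            theta, -2.0 * static_cast<double>(i) / static_cast<double>(rotary_dim));
        const double angle = static_cast<double>(position) * inv_freq;
        const float c = static_cast<float>(std::cos(angle));
        const float s = static_cast<float>(std::sin(angle));
        // Assumed pairing: GPT_NEOX uses (i, i + half), GPT_J uses (2i, 2i + 1).
        const std::size_t a = (algo == Algo::GPT_NEOX) ? i : 2 * i;
        const std::size_t b = (algo == Algo::GPT_NEOX) ? i + half : 2 * i + 1;
        const float xa = x[a];
        const float xb = x[b];
        x[a] = xa * c - xb * s;
        x[b] = xa * s + xb * c;
    }
}
```

With a six-element head and rotary_dim = 4, only indices 0 to 3 are rotated and indices 4 and 5 pass through, which is the behaviour the vLLM slice query[..., :rotary_dim] expresses.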

rubik-hua · 2026-06-01T11:58:25Z

Right, now I understand. model_config already provides get and set methods for algo, with infinicore::nn::RoPE::Algo::GPT_NEOX as the default.

After addressing all review comments, I re-ran examples/test_infer.py and it passed.

rubik-hua · 2026-06-02T04:40:27Z

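All review comments on both InfiniLM and InfiniCore have been addressed, and the existing models have been verified.

The connection is poor, so uploading screenshots is too difficult.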

Total: 10 | OK: 10 | Failed: 0

pengcheng888

Approved, but this still has to wait for the InfiniCore PR and the final review.

rubik-hua requested a review from a team May 28, 2026 09:36

pengcheng888 reviewed Jun 1, 2026

Comment thread csrc/layers/rotary_embedding/rotary_embedding_factory.hpp

Comment thread csrc/models/llama_legacy/llama_config.hpp

Comment thread csrc/layers/rotary_embedding/rope_scaling_creators.cpp Outdated

pengcheng888 reviewed Jun 1, 2026

Comment thread csrc/layers/rotary_embedding/rotary_embedding.cpp Outdated

rubik-hua force-pushed the refactor_rope branch from bcbb1c3 to 830f3cd Compare June 1, 2026 11:42

pengcheng888 reviewed Jun 2, 2026

Comment thread csrc/models/glm4/glm4_for_causal_lm.cpp

pengcheng888 reviewed Jun 2, 2026

Comment thread csrc/layers/rotary_embedding/rope_scaling_creators.cpp

rubik-hua force-pushed the refactor_rope branch from 830f3cd to 2d1a360 Compare June 2, 2026 02:36

rubik-hua force-pushed the refactor_rope branch from 2d1a360 to 28dce8c Compare June 2, 2026 04:34

pengcheng888 approved these changes Jun 2, 2026

wooway777 merged commit 89c0a16 into InfiniTensor:main Jun 3, 2026
