issue/406 - feat: support gpt2 by spike-zhu · Pull Request #416 · InfiniTensor/InfiniLM

spike-zhu · 2026-06-05T13:42:01Z

InfiniLM now supports GPT-2. Test screenshots follow.

Screenshot comparing InfiniLM inference with Transformers inference:

Notes:
The openai-community_gpt2 checkpoint stores its weights in fp32, while the paged_attention operators support only bf16/fp16, so all subsequent tests use the static cache.
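
The constraint in this note can be written as a small guard. The sketch below is illustrative only: `select_cache_mode` and the mode names `"paged"` and `"static"` are hypothetical and are not InfiniLM's API. The only fact taken from the note is that the paged_attention operators accept bf16/fp16 and nothing else.

```python
# Hypothetical helper: the function and mode names are illustrative, not InfiniLM's API.
PAGED_ATTENTION_DTYPES = {"float16", "bfloat16"}  # per the note: bf16/fp16 only


def select_cache_mode(dtype: str) -> str:
    """Return "paged" when paged attention supports the dtype, else "static"."""
    return "paged" if dtype in PAGED_ATTENTION_DTYPES else "static"


print(select_cache_mode("float32"))   # gpt2 weights are fp32
print(select_cache_mode("bfloat16"))
```

Under these assumptions, an fp32 checkpoint such as gpt2 falls back to the static cache rather than failing inside a paged-attention kernel.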

Reference command for starting the server:
CUDA_VISIBLE_DEVICES=6,7 python python/infinilm/server/inference_server.py \
    --model /data-aisoft/mechdancer/models/openai-community_gpt2 \
    --device nvidia \
    --tp 2 \
    --num-blocks 1024 \
    --block-size 256 \
    --max-batch-size 32 \
    --max-new-tokens 512 \
    --host 0.0.0.0 \
    --port 8000

Reference request from the client:
curl --noproxy '*' -v -X POST http://127.0.0.1:8000/chat/completions \
    -H 'Content-Type: application/json' \
    -d '{
        "prompt": "tell me a story",
        "stream": false,
        "max_tokens": 50
    }'
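
For scripted testing, the same request can be sent from Python with only the standard library. This is a minimal sketch: the URL and payload are copied from the curl command, while `build_request` and `send` are hypothetical helper names, and the assumption that the server replies with a JSON body is mine rather than something stated in this PR.

```python
import json
import urllib.request

# Endpoint taken from the curl command above.
URL = "http://127.0.0.1:8000/chat/completions"


def build_request(prompt: str, max_tokens: int = 50) -> urllib.request.Request:
    """Build the POST request with the same JSON payload as the curl example."""
    payload = {"prompt": prompt, "stream": False, "max_tokens": max_tokens}
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the reply (assumed here to be JSON)."""
    # An empty ProxyHandler ignores any configured proxy, like curl's --noproxy '*'.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires the server to be running):
#   print(send(build_request("tell me a story")))
```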

pengcheng888 · 2026-06-05T13:53:56Z

        self,
        model_path,
        device=None,
+        dtype="float16",


Why does the data type need to be specified by adding this parameter? Can't the model run with the default type?

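The config.json of openai-community/gpt2 contains no dtype-related field, so the data type cannot be read automatically from the HF config and has to be given explicitly in the test command. Without it, the current code cannot determine which data type to load the model with.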

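gpt2 runs now, but do the other models still run? The dtype= parameter needs more thought.

An interface argument should take higher priority, but in the implementation below the dtype from config.json takes priority:
if self.hf_config.get("torch_dtype") is None and self.hf_config.get("dtype") is None:
    self.hf_config["torch_dtype"] = dtype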

read_hf_config now checks whether the model is gpt2 and, if so, sets torch_dtype to fp32, so no dtype parameter needs to be introduced.
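
A minimal sketch of that approach, for illustration only. The function name comes from the comment above, but its signature, the use of the `model_type` field to detect gpt2, and the `"float32"` string value are assumptions, not the code in this PR.

```python
import json
import os
import tempfile


def read_hf_config(model_path: str) -> dict:
    """Load config.json; default gpt2 to float32 when no dtype is recorded.

    Assumed signature and detection rule; the PR's actual code may differ.
    """
    with open(os.path.join(model_path, "config.json"), encoding="utf-8") as f:
        config = json.load(f)
    has_dtype = (
        config.get("torch_dtype") is not None or config.get("dtype") is not None
    )
    if config.get("model_type") == "gpt2" and not has_dtype:
        config["torch_dtype"] = "float32"
    return config


# Demo against a throwaway directory holding a gpt2-like config without a dtype.
with tempfile.TemporaryDirectory() as model_dir:
    with open(os.path.join(model_dir, "config.json"), "w", encoding="utf-8") as f:
        json.dump({"model_type": "gpt2"}, f)
    print(read_hf_config(model_dir)["torch_dtype"])
```

Because the default is applied only when both dtype fields are absent and the model is gpt2, configs of other models pass through unchanged, which addresses the concern about changing their behavior.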

pengcheng888 · 2026-06-05T13:55:19Z


        for k in f.keys():
-            state_dict[k] = f.get_tensor(k).to(device=device)
+            state_dict[k] = f.get_tensor(k).to(device=device, dtype=dtype)


If .to(dtype=dtype) is added, Yiqun's quantized models may no longer run.

This has been avoided by other means.

pengcheng888 · 2026-06-05T14:08:12Z

Please provide a test screenshot for tp=2.

pengcheng888 · 2026-06-05T14:09:26Z

-    outputs = model.chat(
-        messages=conversations,
-    )
+    if getattr(model.engine.tokenizer, "chat_template", None):


Does the server still run?

Server test screenshot:

spike-zhu · 2026-06-09T03:03:08Z

The requested tp=2 test screenshot:

pengcheng888 · 2026-06-09T04:47:12Z

            sampling_params = self._build_sampling_params(data)

-            req = self.engine.add_chat_request(
+            req = self._add_generation_request(


This needs more thought; perhaps it should not be changed this way.

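gpt2 does not support chat_template, so it does need a path distinct from the one the existing service takes. If it should not be changed this way, what change would be more appropriate?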
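
To make the question concrete, the distinction under discussion can be reduced to one predicate on the tokenizer, reusing the `getattr(..., "chat_template", None)` check from the diff above. This is a sketch of one possible shape, not the change the reviewers settled on: `pick_request_path` and the `"chat"` and `"generation"` labels are hypothetical, and the tokenizers are stand-in objects.

```python
from types import SimpleNamespace


def pick_request_path(tokenizer) -> str:
    """Return "chat" if the tokenizer has a non-empty chat_template, else "generation"."""
    return "chat" if getattr(tokenizer, "chat_template", None) else "generation"


# Stand-in tokenizers for illustration; a real tokenizer would come from the engine.
chat_tok = SimpleNamespace(chat_template="{% for m in messages %}...{% endfor %}")
gpt2_tok = SimpleNamespace()  # no chat_template attribute, as with gpt2

print(pick_request_path(chat_tok))
print(pick_request_path(gpt2_tok))
```

Keeping the decision in one place would let the existing chat path stay untouched for models that ship a template, while raw-prompt models such as gpt2 take the generation path.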

spike-zhu requested review from a team and wooway777 June 5, 2026 13:42

spike-zhu marked this pull request as draft June 5, 2026 13:43

pengcheng888 reviewed Jun 5, 2026

spike-zhu marked this pull request as ready for review June 9, 2026 03:03

spike-zhu force-pushed the issue/406 branch from c7745f9 to 3fda3ec Compare June 9, 2026 03:38

spike-zhu requested review from ma-hang and pengcheng888 June 9, 2026 04:35

spike-zhu self-assigned this Jun 9, 2026

pengcheng888 reviewed Jun 9, 2026

spike-zhu force-pushed the issue/406 branch from 3fda3ec to 347cfea Compare June 9, 2026 06:34

spike-zhu requested a review from pengcheng888 June 9, 2026 06:38

issue/406 - feat: support gpt2

c3ac64d

spike-zhu force-pushed the issue/406 branch from 347cfea to c3ac64d Compare June 9, 2026 10:43

spike-zhu wants to merge 1 commit into main from issue/406
