
Fix gptqmodel backend check #2420

Merged
IlyasMoutawwakil merged 6 commits into huggingface:main from jiqing-feng:gptq
Apr 15, 2026

Conversation

@jiqing-feng (Contributor) commented Apr 8, 2026

GPTQModel has deprecated EXLLAMA_V1 in its BACKEND enum, so we can remove this check.

Bug reproduction:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4"
device = "auto"
prompt = "What is the meaning of life?"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Loading a GPTQ checkpoint invokes optimum's post_init_model, which raises the AttributeError below
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device, dtype=torch.bfloat16)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Output:

Traceback (most recent call last):
  File "/workspace/jiqing/run_gptqmodel_generation.py", line 9, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device, dtype=torch.bfloat16)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/auto_factory.py", line 387, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py", line 4156, in from_pretrained
    hf_quantizer.postprocess_model(
  File "/usr/local/lib/python3.12/dist-packages/transformers/quantizers/base.py", line 194, in postprocess_model
    return self._process_model_after_weight_loading(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/quantizers/quantizer_gptq.py", line 98, in _process_model_after_weight_loading
    model = self.optimum_quantizer.post_init_model(model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/optimum/gptq/quantizer.py", line 672, in post_init_model
    if self.desc_act and self.backend == BACKEND.EXLLAMA_V1 and self.max_input_length is not None:
                                         ^^^^^^^^^^^^^^^^^^
AttributeError: type object 'BACKEND' has no attribute 'EXLLAMA_V1'. Did you mean: 'EXLLAMA_V2'?
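
The failing attribute access can be reproduced in isolation; a minimal sketch, assuming BACKEND is exported at gptqmodel's top level:

from gptqmodel import BACKEND  # top-level export assumed

# On gptqmodel releases that dropped the Exllama v1 kernel, this raises:
# AttributeError: type object 'BACKEND' has no attribute 'EXLLAMA_V1'
print(BACKEND.EXLLAMA_V1)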

Signed-off-by: jiqing-feng <[email protected]>
@jiqing-feng (Contributor, Author)

Hi @SunMarc, would you please review this PR? Thanks!
cc @Qubitium

@Qubitium (Contributor) commented Apr 8, 2026

@jiqing-feng Thanks for working on this.

@SunMarc I have hard-deprecated the Exllama v1 kernel, since GPT-QModel has too many kernels to maintain and there are already much better and faster kernels (Marlin, Exllama v2) on CUDA.

Also, some historical perspective on why this ancient code exists: Exllama v1 requires max_input_len to calculate and initialize a fixed runtime buffer before it can actually do forward passes. This fixed buffer init results in this ugly pre-inference code.

Comment thread: optimum/gptq/quantizer.py

model.quantize_config.desc_act = self.desc_act
model = gptq_post_init(model, use_act_order=self.desc_act)
if self.desc_act and self.backend == BACKEND.EXLLAMA_V1 and self.max_input_length is not None:
    model = exllama_set_max_input_length(model, self.max_input_length)
@Qubitium (Contributor)

@jiqing-feng Can you check if anyone still uses exllama_set_max_input_length? If it is unused and EXLLAMA_V1-specific, this helper method can be removed too.

@jiqing-feng (Contributor, Author)

Checked; nothing uses this function, so I removed it. Thanks for the review.

@SunMarc (Member) left a comment

Indeed, thanks! I don't mind removing it completely, but we could also do a version check for gptqmodel, since we should support prior versions unless we bump the minimum version.
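
For illustration, the version-gating alternative could look roughly like this sketch in Python; the "5.0.0" cutoff below is a hypothetical placeholder, not the actual gptqmodel release that removed EXLLAMA_V1:

from importlib.metadata import version as pkg_version
from packaging.version import Version

# Hypothetical cutoff: "5.0.0" is a placeholder, not the real gptqmodel
# release that removed BACKEND.EXLLAMA_V1.
LEGACY_GPTQMODEL = Version(pkg_version("gptqmodel")) < Version("5.0.0")

if LEGACY_GPTQMODEL:
    # Older gptqmodel releases still expose BACKEND.EXLLAMA_V1, so the
    # max_input_length buffer setup would remain valid on this path.
    pass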

Signed-off-by: jiqing-feng <[email protected]>
@jiqing-feng (Contributor, Author)

Instead of a version check, I use hasattr(BACKEND, "EXLLAMA_V1"), which is safer.
Hi @SunMarc, would you please review it? Thanks!
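
A minimal sketch of how such a hasattr guard can wrap the legacy code path; it mirrors the quantizer snippet quoted above but is only an illustration, not the exact diff in this PR, and the helper is passed in as a parameter to avoid assuming its import path:

from gptqmodel import BACKEND  # top-level export assumed

def maybe_set_exllama_v1_buffer(model, backend, desc_act, max_input_length, set_max_input_length):
    # hasattr guard: newer gptqmodel releases no longer define BACKEND.EXLLAMA_V1,
    # so this branch silently disappears there. Remove the whole block once the
    # minimum supported gptqmodel version has dropped EXLLAMA_V1 for good.
    if (
        hasattr(BACKEND, "EXLLAMA_V1")
        and desc_act
        and backend == BACKEND.EXLLAMA_V1
        and max_input_length is not None
    ):
        model = set_max_input_length(model, max_input_length)
    return model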

@SunMarc (Member) left a comment

Works for me! Maybe just add a small comment on why we do hasattr, so that we can remove this code path when we bump gptqmodel.

@IlyasMoutawwakil (Member) left a comment

lgtm!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@jiqing-feng (Contributor, Author)

Hi @IlyasMoutawwakil, I have added the comment and fixed the formatting. Please review it. Thanks!

Signed-off-by: jiqing-feng <[email protected]>
@jiqing-feng (Contributor, Author)

Hi @IlyasMoutawwakil. The failed test seems like an environment issue; I cannot reproduce it locally. cc @Qubitium

@Qubitium (Contributor) commented Apr 14, 2026

Hi @IlyasMoutawwakil. The failed test seems like an environment issue; I cannot reproduce it locally. cc @Qubitium

GPT-QModel depends on pypcre, which is our libpcre2 binding for Python. pypcre first checks whether the libpcre2 library exists on the system and, if so, builds the package so it reuses the OS's libpcre2 module. Almost all Linux/FreeBSD systems have libpcre2 installed. On Windows, or on a system without it installed, pypcre downloads the pcre2 source code and installs a local version. PyPcre has been CI tested on Ubuntu, FreeBSD, and WSL environments.

But the thing is, I don't see any pypcre build errors. It looks like the pypcre build failed, so it can't import pcre, but I don't see build errors to that effect.

Resolved 92 packages in 342ms
   Building gptqmodel==6.0.3
   Building pypcre==0.3.0   <---- correctly fetches and installs pypcre, which provides the pcre module
   Building logbar==0.4.1
   Building defuser==0.0.19
   Building tokenicer==0.0.12
   Building device-smi==0.5.3
Downloading pandas (12.2MiB)
Downloading pyarrow (45.4MiB)
Downloading aiohttp (1.6MiB)
Downloading maturin (10.1MiB)
Downloading torchao (3.1MiB)
 Downloaded aiohttp

@IlyasMoutawwakil What is the actual Linux env that is executing these CI tasks? With some context, I may be able to replicate and track this down; absent that, I have little info to find out exactly what caused the pypcre dependency to fail.

PyPcre 0.3.0 passed Ubuntu, FreeBSD, and WSL unit tests: https://github.com/ModelCloud/PyPcre/actions/runs/24330360699

@IlyasMoutawwakil (Member) commented Apr 14, 2026

It's using the Docker image nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04 + Python 3.10.

@jiqing-feng (Contributor, Author)

Hi @Qubitium, I reproduced the error in nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04 using a uv env.
When I run pip install gptqmodel, I get importlib.metadata.PackageNotFoundError: No package metadata was found for torch, even though I installed torch 2.11.
When I run pip install gptqmodel --no-build-isolation, I get ModuleNotFoundError: No module named 'pcre'.
Would you please check it?

After starting a container from the image, run:

#!/bin/bash
apt-get update -o Acquire::AllowInsecureRepositories=true -o Acquire::AllowDowngradeToInsecureRepositories=true || true
apt-get install -y --allow-unauthenticated python3 python3-pip python3-venv git curl wget build-essential

python3 -m venv /opt/venv
source /opt/venv/bin/activate

pip install --upgrade pip setuptools wheel
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

rm -rf /opt/optimum
git clone https://github.com/huggingface/optimum.git /opt/optimum
cd /opt/optimum
git fetch origin pull/2420/head:pr2420
git checkout pr2420

pip install -e '.[testing]'
pip install gptqmodel parameterized huggingface_hub

@Qubitium (Contributor) commented Apr 14, 2026

@IlyasMoutawwakil @jiqing-feng The error is caused by the CI env not having gcc. PyPcre does not precompile wheels, since we want to link to the OS-level libpcre2 both to keep the package small and for maximum security (pypcre then enjoys all the latest system security patches).

@IlyasMoutawwakil (Member)

Thanks for investigating. Should we update the workflow file in this PR?

@Qubitium (Contributor) commented Apr 14, 2026

Thanks for investigating. Should we update the workflow file in this PR?

@IlyasMoutawwakil Please disregard my first message; I was wrong. The image does have gcc. I need to verify again and find the source of the issue. I will reply once I can confirm, and provide a workflow fix if needed.

@Qubitium (Contributor) commented Apr 14, 2026

@IlyasMoutawwakil @jiqing-feng Workflow fix is below. There are two issues.

  1. The primary issue is that the Torch 2.11 wheels for CUDA 12.8 have a static setuptools requirement that conflicts with pypcre and gptqmodel. PyPI recommends, and increasingly forces, packages to use the new pyproject.toml format when it comes to licenses, or else you get a huge deprecation warning; Torch, or at least the cu128 wheels, did not get that memo. Pip is not smart enough to find a happy middle ground.

  2. The GPT-QModel v6.0.3 PyPI package needs to be installed with --no-build-isolation. This requirement will be removed soon in v6.1, where all kernels are JIT compiled.

The workflow fix below addresses both issues. I will try to resolve things so that neither workaround is needed in the next versions of PyPcre and GPT-QModel, and to CI test under a CUDA 12.8 env; our own CI is using the latest 13.0/13.2 CUDA torch wheels.

#!/bin/bash
set -euo pipefail

apt-get update -o Acquire::AllowInsecureRepositories=true -o Acquire::AllowDowngradeToInsecureRepositories=true || true
apt-get install -y --allow-unauthenticated python3 python3-pip python3-venv git curl wget build-essential

python3 -m venv /opt/venv
source /opt/venv/bin/activate

# setuptools is upgraded here, but torch immediately downgrades it in the next step
pip install --upgrade pip setuptools wheel
# this is where torch downgrades setuptools to an ancient version that's not compatible with pypcre
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

rm -rf /opt/optimum
git clone https://github.com/huggingface/optimum.git /opt/optimum
cd /opt/optimum
git fetch origin pull/2420/head:pr2420
git checkout pr2420

pip install -e '.[testing]'

pip install parameterized huggingface_hub

# fix bad pip resolution where torch's constraints installed an ancient setuptools that conflicts with pypcre/gptqmodel:
# pin setuptools to a range where torch and the gptqmodel/pypcre packages can co-exist
pip install pypcre "setuptools>=78.1.1,<82"

# gptqmodel 6.0.3 needs to read/access torch during install, so skip build isolation
pip install gptqmodel --no-build-isolation


@jiqing-feng (Contributor, Author)

Hi @IlyasMoutawwakil, I have updated the workflow YAML file. Would you please rerun the CI? Thanks!

@Qubitium (Contributor)

FYI, this issue has been fixed in all GPT-QModel dependent packages and pushed to PyPI. But GPT-QModel itself still needs to pass critical CI tests for the v6.1 release, so the current workflow patch is OK for now. Regression testing for all allowed setuptools versions is now part of our unit tests as well.

https://github.com/ModelCloud/PyPcre/actions/runs/24451501870

Comment thread: .github/workflows/test_gptq.yml
IlyasMoutawwakil merged commit cebb682 into huggingface:main Apr 15, 2026
16 checks passed
@IlyasMoutawwakil (Member)

Thanks for investigating and fixing this ! 🤗
