
Fix gptqmodel backend check #2420

Merged
IlyasMoutawwakil merged 6 commits into huggingface:main from jiqing-feng:gptq
Apr 15, 2026

Conversation

@jiqing-feng (Contributor) commented Apr 8, 2026

GPTQModel has deprecated EXLLAMA_V1 in its BACKEND enum, so we can remove this check.

Bug reproduction:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4"
device = "auto"
prompt = "What is the meaning of life?"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Loading a GPTQ checkpoint invokes optimum's post_init_model, which raises the AttributeError below
model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device, dtype=torch.bfloat16)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Output:

Traceback (most recent call last):
  File "/workspace/jiqing/run_gptqmodel_generation.py", line 9, in <module>
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device, dtype=torch.bfloat16)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/auto/auto_factory.py", line 387, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py", line 4156, in from_pretrained
    hf_quantizer.postprocess_model(
  File "/usr/local/lib/python3.12/dist-packages/transformers/quantizers/base.py", line 194, in postprocess_model
    return self._process_model_after_weight_loading(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/quantizers/quantizer_gptq.py", line 98, in _process_model_after_weight_loading
    model = self.optimum_quantizer.post_init_model(model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/optimum/gptq/quantizer.py", line 672, in post_init_model
    if self.desc_act and self.backend == BACKEND.EXLLAMA_V1 and self.max_input_length is not None:
                                         ^^^^^^^^^^^^^^^^^^
AttributeError: type object 'BACKEND' has no attribute 'EXLLAMA_V1'. Did you mean: 'EXLLAMA_V2'?
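
The failing attribute access can be reproduced in isolation; a minimal sketch, assuming BACKEND is exported at gptqmodel's top level:

from gptqmodel import BACKEND  # top-level export assumed

# On gptqmodel releases that dropped the Exllama v1 kernel, this raises:
# AttributeError: type object 'BACKEND' has no attribute 'EXLLAMA_V1'
print(BACKEND.EXLLAMA_V1)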

Signed-off-by: jiqing-feng <[email protected]>
@jiqing-feng (Contributor, Author)

Hi @SunMarc, would you please review this PR? Thanks!
cc @Qubitium

@Qubitium (Contributor) commented Apr 8, 2026

@jiqing-feng Thanks for working on this.

@SunMarc I have hard-deprecated the Exllama v1 kernel, since GPT-QModel has too many kernels to maintain and there are already much better and faster kernels (Marlin, Exllama v2) on CUDA.

Also, some historical perspective on why this ancient code exists: Exllama v1 requires max_input_len to calculate and initialize a fixed runtime buffer before it can actually do forward passes. This fixed buffer init results in this ugly pre-inference code.

Comment thread: optimum/gptq/quantizer.py

model.quantize_config.desc_act = self.desc_act
model = gptq_post_init(model, use_act_order=self.desc_act)
if self.desc_act and self.backend == BACKEND.EXLLAMA_V1 and self.max_input_length is not None:
    model = exllama_set_max_input_length(model, self.max_input_length)
@Qubitium (Contributor)

@jiqing-feng Can you check if anyone still uses exllama_set_max_input_length? If it is unused and EXLLAMA_V1-specific, this helper method can be removed too.

@jiqing-feng (Contributor, Author)

Checked; nothing uses this function, so I removed it. Thanks for the review.

@SunMarc (Member) left a comment

Indeed, thanks! I don't mind removing it completely, but we could also do a version check for gptqmodel, since we should support prior versions unless we bump the minimum version.
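
For illustration, the version-gating alternative could look roughly like this sketch in Python; the "5.0.0" cutoff below is a hypothetical placeholder, not the actual gptqmodel release that removed EXLLAMA_V1:

from importlib.metadata import version as pkg_version
from packaging.version import Version

# Hypothetical cutoff: "5.0.0" is a placeholder, not the real gptqmodel
# release that removed BACKEND.EXLLAMA_V1.
LEGACY_GPTQMODEL = Version(pkg_version("gptqmodel")) < Version("5.0.0")

if LEGACY_GPTQMODEL:
    # Older gptqmodel releases still expose BACKEND.EXLLAMA_V1, so the
    # max_input_length buffer setup would remain valid on this path.
    pass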

Signed-off-by: jiqing-feng <[email protected]>
@jiqing-feng (Contributor, Author)

Instead of a version check, I use hasattr(BACKEND, "EXLLAMA_V1"), which is safer.
Hi @SunMarc, would you please review it? Thanks!
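
A minimal sketch of how such a hasattr guard can wrap the legacy code path; it mirrors the quantizer snippet quoted above but is only an illustration, not the exact diff in this PR, and the helper is passed in as a parameter to avoid assuming its import path:

from gptqmodel import BACKEND  # top-level export assumed

def maybe_set_exllama_v1_buffer(model, backend, desc_act, max_input_length, set_max_input_length):
    # hasattr guard: newer gptqmodel releases no longer define BACKEND.EXLLAMA_V1,
    # so this branch silently disappears there. Remove the whole block once the
    # minimum supported gptqmodel version has dropped EXLLAMA_V1 for good.
    if (
        hasattr(BACKEND, "EXLLAMA_V1")
        and desc_act
        and backend == BACKEND.EXLLAMA_V1
        and max_input_length is not None
    ):
        model = set_max_input_length(model, max_input_length)
    return model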

@SunMarc (Member) left a comment

Works for me! Maybe just add a small comment on why we do hasattr, so that we can remove this code path when we bump gptqmodel.

@IlyasMoutawwakil (Member) left a comment

lgtm!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@jiqing-feng (Contributor, Author)

Hi @IlyasMoutawwakil, I have added the comment and fixed the formatting. Please review it. Thanks!

Signed-off-by: jiqing-feng <[email protected]>
@jiqing-feng (Contributor, Author)

Hi @IlyasMoutawwakil. The failed test seems like an environment issue; I cannot reproduce it locally. cc @Qubitium

@Qubitium (Contributor) commented Apr 14, 2026

Hi @IlyasMoutawwakil. The failed test seems like an environment issue; I cannot reproduce it locally. cc @Qubitium

GPT-QModel depends on pypcre, which is our libpcre2 binding for Python. pypcre first checks whether the libpcre2 library exists on the system and, if so, builds the package so it reuses the OS's libpcre2 module. Almost all Linux/FreeBSD systems have libpcre2 installed. On Windows, or on a system without it installed, pypcre downloads the pcre2 source code and installs a local version. PyPcre has been CI tested on Ubuntu, FreeBSD, and WSL environments.

But the thing is, I don't see any pypcre build errors. It looks like the pypcre build failed, so it can't import pcre, but I don't see build errors to that effect.

Resolved 92 packages in 342ms
   Building gptqmodel==6.0.3
   Building pypcre==0.3.0   <---- correctly fetches and installs pypcre, which provides the pcre module
   Building logbar==0.4.1
   Building defuser==0.0.19
   Building tokenicer==0.0.12
   Building device-smi==0.5.3
Downloading pandas (12.2MiB)
Downloading pyarrow (45.4MiB)
Downloading aiohttp (1.6MiB)
Downloading maturin (10.1MiB)
Downloading torchao (3.1MiB)
 Downloaded aiohttp

@IlyasMoutawwakil What is the actual Linux env that is executing these CI tasks? With some context, I may be able to replicate and track this down; absent that, I have little info to find out exactly what caused the pypcre dependency to fail.

PyPcre 0.3.0 passed Ubuntu, FreeBSD, and WSL unit tests: https://github.com/ModelCloud/PyPcre/actions/runs/24330360699

@IlyasMoutawwakil (Member) commented Apr 14, 2026

It's using the Docker image nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04 + Python 3.10.

@jiqing-feng (Contributor, Author)

Hi @Qubitium, I reproduced the error in nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04 using a uv env.
When I run pip install gptqmodel, I get importlib.metadata.PackageNotFoundError: No package metadata was found for torch, even though I installed torch 2.11.
When I run pip install gptqmodel --no-build-isolation, I get ModuleNotFoundError: No module named 'pcre'.
Would you please check it?

After starting a container from the image, run:

#!/bin/bash
apt-get update -o Acquire::AllowInsecureRepositories=true -o Acquire::AllowDowngradeToInsecureRepositories=true || true
apt-get install -y --allow-unauthenticated python3 python3-pip python3-venv git curl wget build-essential

python3 -m venv /opt/venv
source /opt/venv/bin/activate

pip install --upgrade pip setuptools wheel
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

rm -rf /opt/optimum
git clone https://github.com/huggingface/optimum.git /opt/optimum
cd /opt/optimum
git fetch origin pull/2420/head:pr2420
git checkout pr2420

pip install -e '.[testing]'
pip install gptqmodel parameterized huggingface_hub

@Qubitium (Contributor) commented Apr 14, 2026

@IlyasMoutawwakil @jiqing-feng The error is caused by the CI env not having gcc. PyPcre does not precompile wheels, since we want to link to the OS-level libpcre2 both to keep the package small and for maximum security (pypcre then enjoys all the latest system security patches).

@IlyasMoutawwakil (Member)

Thanks for investigating. Should we update the workflow file in this PR?

@Qubitium (Contributor) commented Apr 14, 2026

Thanks for investigating. Should we update the workflow file in this PR?

@IlyasMoutawwakil Please disregard my first message; I was wrong. The image does have gcc. I need to verify again and find the source of the issue. I will reply once I can confirm, and provide a workflow fix if needed.

@Qubitium (Contributor) commented Apr 14, 2026

@IlyasMoutawwakil @jiqing-feng Workflow fix is below. There are two issues.

  1. The primary issue is that the Torch 2.11 wheels for CUDA 12.8 have a static setuptools requirement that conflicts with pypcre and gptqmodel. PyPI recommends, and increasingly forces, packages to use the new pyproject.toml format when it comes to licenses, or else you get a huge deprecation warning; Torch, or at least the cu128 wheels, did not get that memo. Pip is not smart enough to find a happy middle ground.

  2. The GPT-QModel v6.0.3 PyPI package needs to be installed with --no-build-isolation. This requirement will be removed soon in v6.1, where all kernels are JIT compiled.

The workflow fix below addresses both issues. I will try to resolve things so that neither workaround is needed in the next versions of PyPcre and GPT-QModel, and to CI test under a CUDA 12.8 env; our own CI is using the latest 13.0/13.2 CUDA torch wheels.

#!/bin/bash
set -euo pipefail

apt-get update -o Acquire::AllowInsecureRepositories=true -o Acquire::AllowDowngradeToInsecureRepositories=true || true
apt-get install -y --allow-unauthenticated python3 python3-pip python3-venv git curl wget build-essential

python3 -m venv /opt/venv
source /opt/venv/bin/activate

# setuptools is upgraded here, but torch immediately downgrades it in the next step
pip install --upgrade pip setuptools wheel
# this is where torch downgrades setuptools to an ancient version that's not compatible with pypcre
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128

rm -rf /opt/optimum
git clone https://github.com/huggingface/optimum.git /opt/optimum
cd /opt/optimum
git fetch origin pull/2420/head:pr2420
git checkout pr2420

pip install -e '.[testing]'

pip install parameterized huggingface_hub

# fix bad pip resolution where torch's constraints installed an ancient setuptools that conflicts with pypcre/gptqmodel:
# pin setuptools to a range where torch and the gptqmodel/pypcre packages can co-exist
pip install pypcre "setuptools>=78.1.1,<82"

# gptqmodel 6.0.3 needs to read/access torch during install, so skip build isolation
pip install gptqmodel --no-build-isolation


@jiqing-feng (Contributor, Author)

Hi @IlyasMoutawwakil, I have updated the workflow YAML file. Would you please rerun the CI? Thanks!

@Qubitium (Contributor)

FYI, this issue has been fixed in all GPT-QModel dependent packages and pushed to PyPI. But GPT-QModel itself still needs to pass critical CI tests for the v6.1 release, so the current workflow patch is OK for now. Regression testing for all allowed setuptools versions is now part of our unit tests as well.

https://github.com/ModelCloud/PyPcre/actions/runs/24451501870

Comment thread: .github/workflows/test_gptq.yml
IlyasMoutawwakil merged commit cebb682 into huggingface:main Apr 15, 2026
16 checks passed
@IlyasMoutawwakil (Member)

Thanks for investigating and fixing this ! 🤗
