Fix gptqmodel backend check #2420
Conversation
Signed-off-by: jiqing-feng <[email protected]>
@jiqing-feng Thanks for working on this. @SunMarc I have hard-deprecated the Exllama v1 kernel since GPT-QModel has too many kernels to maintain and there are already much better and faster kernels (Marlin, Exllama v2) on CUDA. Also, the historical perspective on why this ancient code exists: Exllama v1 requires `max_input_len` to calculate and init a fixed runtime buffer before it can actually do inference.
```python
model.quantize_config.desc_act = self.desc_act
model = gptq_post_init(model, use_act_order=self.desc_act)
if self.desc_act and self.backend == BACKEND.EXLLAMA_V1 and self.max_input_length is not None:
    model = exllama_set_max_input_length(model, self.max_input_length)
```
@jiqing-feng Can you check if anyone still uses `exllama_set_max_input_length`? If it is unused and only v1-specific, this helper method can be removed too.
Checked: there is no usage of this function, so it has been removed. Thanks for the review.
Signed-off-by: jiqing-feng <[email protected]>
SunMarc left a comment
Indeed, thanks! I don't mind removing it completely, but we could also do a version check for gptqmodel, since we should support prior versions unless we bump the min version.
Signed-off-by: jiqing-feng <[email protected]>
Instead of a version check, I use `hasattr`.
SunMarc left a comment
Works for me! Maybe just add a small comment on why we do `hasattr`, so that we can remove this code path when we bump gptqmodel.
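For context, a minimal sketch of the `hasattr` approach under discussion, assuming `BACKEND` is importable from the top-level `gptqmodel` package (the actual optimum code path differs):

```python
# Minimal sketch, not the actual optimum code. Assumes gptqmodel exposes
# its backend enum at the top level.
from gptqmodel import BACKEND

# Newer gptqmodel releases removed BACKEND.EXLLAMA_V1 entirely, so feature
# detection via hasattr() supports both old and new releases without a
# version check. Drop this branch once the minimum gptqmodel version is
# bumped past the release that removed Exllama v1.
if hasattr(BACKEND, "EXLLAMA_V1"):
    print("legacy gptqmodel: Exllama v1 backend still present")
else:
    print("current gptqmodel: Exllama v1 backend removed")
```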
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi @IlyasMoutawwakil. I have added the comment and fixed the format. Please review it. Thanks!
Signed-off-by: jiqing-feng <[email protected]>
Hi @IlyasMoutawwakil. The failed test looks like an env issue; I cannot reproduce it locally. cc @Qubitium
GPT-QModel depends on the `pcre` module, which is provided by the `pypcre` package. But the thing is, I don't see any install failure locally:

```
Resolved 92 packages in 342ms
Building gptqmodel==6.0.3
Building pypcre==0.3.0    <---- correctly fetches and installs pypcre, which provides the pcre module
Building logbar==0.4.1
Building defuser==0.0.19
Building tokenicer==0.0.12
Building device-smi==0.5.3
Downloading pandas (12.2MiB)
Downloading pyarrow (45.4MiB)
Downloading aiohttp (1.6MiB)
Downloading maturin (10.1MiB)
Downloading torchao (3.1MiB)
Downloaded aiohttp
```

@IlyasMoutawwakil What is the actual Linux env that is executing these CI tasks? With some context, I may be able to replicate and track this down; absent that info, I have little to go on to find out exactly what caused the failure.
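As a hedged sanity check of the dependency chain noted in the log above (package and module names taken from that log, not from the thread):

```bash
# Verify in a clean venv that pypcre builds and exposes the pcre module.
python3 -m venv /tmp/pcre-check
source /tmp/pcre-check/bin/activate
pip install pypcre
python -c "import pcre; print('pcre module imported OK')"
```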
It's using the docker image nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04 + Python 3.10.
Hi @Qubitium. I reproduced the error in the docker image: after starting a container from the image and running the install steps, the same failure appears.
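A hedged reproduction sketch, assuming the CI image named above and the install steps from the workflow fix later in the thread (the exact commands used are not shown):

```bash
# Start a container from the CI image; GPU access is not needed to hit the
# install-time failure.
docker run --rm -it nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04 bash

# Inside the container: Ubuntu 22.04 ships Python 3.10 by default.
apt-get update && apt-get install -y python3 python3-pip python3-venv
python3 -m venv /opt/venv && source /opt/venv/bin/activate
pip install torch --index-url https://download.pytorch.org/whl/cu128
pip install gptqmodel   # <- fails here in this env, reproducing the CI error
```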
@IlyasMoutawwakil @jiqing-feng The error is caused by the CI env not having gcc.
Thanks for investigating. Should we update the workflow file in this PR?
@IlyasMoutawwakil Please disregard my first message; I was wrong. The image does have gcc. I need to verify again and find the source of the issue. I will reply once I can confirm, and will also provide a workflow fix if one is needed.
@IlyasMoutawwakil @jiqing-feng Workflow fix is below. There are two issues: (1) installing torch downgrades setuptools to an ancient version that is not compatible with pypcre, and pip is not smart enough to find a happy middle; (2) gptqmodel 6.0.3 needs to read/access torch during install, so it must be installed with `--no-build-isolation`. The script below is the current workflow fix. I will try to resolve this so neither workaround is needed in the next versions of PyPcre and GPT-QModel, and will CI-test under a CUDA 12.8 env; our CI is using the latest 13.0/13.2 CUDA torch wheels.

```bash
#!/bin/bash
set -euo pipefail
apt-get update -o Acquire::AllowInsecureRepositories=true -o Acquire::AllowDowngradeToInsecureRepositories=true || true
apt-get install -y --allow-unauthenticated python3 python3-pip python3-venv git curl wget build-essential
python3 -m venv /opt/venv
source /opt/venv/bin/activate
# setuptools is upgraded here but is immediately downgraded to an old version by torch in the next line
pip install --upgrade pip setuptools wheel
# this part is where torch downgrades setuptools to an ancient version that's not compatible with pypcre
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu128
rm -rf /opt/optimum
git clone https://github.com/huggingface/optimum.git /opt/optimum
cd /opt/optimum
git fetch origin pull/2420/head:pr2420
git checkout pr2420
pip install -e '.[testing]'
pip install parameterized huggingface_hub
# fix bad pip resolution where torch's constraints caused an ancient setuptools to be installed, which conflicts with pypcre/gptqmodel
# pin setuptools to a range where torch and the gptqmodel/pypcre packages can co-exist
pip install pypcre "setuptools>=78.1.1,<82"
# gptqmodel 6.0.3 needs to read/access torch during install, so disable build isolation
pip install gptqmodel --no-build-isolation
pip install parameterized huggingface_hub
```
Hi @IlyasMoutawwakil. I have updated the workflow YAML file. Would you please rerun the CI? Thanks!
Signed-off-by: jiqing-feng <[email protected]>
FYI, this issue has been fixed in all GPTQ-Model dependent packages and pushed: https://github.com/ModelCloud/PyPcre/actions/runs/24451501870
Thanks for investigating and fixing this! 🤗
GPTQModel has deprecated `EXLLAMA_V1` from `BACKEND`, so we can remove this check.

Bug reproduction:
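The original repro snippet and its output are collapsed in the PR description; as a hedged illustration of the failure mode (not the author's actual script):

```python
# With a gptqmodel release that dropped Exllama v1, merely referencing the
# removed enum member raises AttributeError, which is why the optimum check
# had to be removed.
from gptqmodel import BACKEND

BACKEND.EXLLAMA_V1  # raises AttributeError on releases that removed Exllama v1
```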