🐛 Bug Description
We are trying to run QVAC local LLM inference with hardware acceleration enabled on Android using the llama.cpp Vulkan backend.
On a OnePlus 10R with a Mali-G610 MC6 GPU, model initialization crashes with a native SIGSEGV whenever Vulkan/GPU offload is enabled, even with gpu_layers: 1. The crash happens inside @qvac/llm-llamacpp during backend tensor buffer allocation, before generation begins.
CPU mode works, but GPU mode consistently crashes the app.
🔄 Steps to Reproduce
Device:
OnePlus 10R / CPH2423
Android 15
MediaTek Dimensity 8100-Max / MT6895
GPU: Mali-G610 MC6
Vulkan device API: 1.1.177
Vulkan instance API: 1.3.0
Vulkan driver: Mali-G610 MC6
Driver info: v1.r32p1-01eac0.b89152572cfa9465230812a8225a45a0
Driver version: 32.21.0
Vendor: ARM / 0x13B5
Repro:
- Build and install Expo/React Native Android app using QVAC SDK.
- Download/install a small GGUF LLM model.
- Enable hardware acceleration.
- Initialize QVAC llama.cpp model with GPU/Vulkan enabled.
- Use minimal GPU offload, e.g. gpu_layers: 1.
- Start profiler/chat generation.
- App crashes during model initialization.
Models tested include:
- Qwen 3.5 Mobile 0.8B GGUF Q4_K_M
- Llama 3.2 1B / 3B Q4 variants
- MedPsy 1.7B GGUF
Mitigations attempted:
VK_LOADER_LAYERS_DISABLE="*"
GGML_VK_FORCE_LINEAR="1"
GGML_VK_DISABLE_F16="1"
cache-type-k: f16
cache-type-v: f16
flash-attn: off
split-mode: none
main-gpu: integrated
Removed no_mmap, because QVAC/llama parser rejected it as invalid
Updated to QVAC SDK 0.12.0 / @qvac/llm-llamacpp 0.22.1
✅ Expected Behavior
With hardware acceleration enabled and gpu_layers: 1, the model should initialize successfully using Vulkan GPU offload, or fail gracefully with a JavaScript/native error that can be caught and used to fall back to CPU.
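The graceful-failure path we are hoping for could look roughly like the sketch below. It is not QVAC API: `Loader`, `LlmConfig`, and `loadWithCpuFallback` are hypothetical names, and the sketch assumes that gpu_layers: 0 selects CPU-only inference. It only helps if the native layer rejects the promise instead of raising SIGSEGV, which is what this report asks for.

```typescript
// Hypothetical config shape; keys mirror the option names used in this report.
type LlmConfig = Record<string, string>;

// Hypothetical loader signature standing in for whatever call creates the model.
type Loader<M> = (config: LlmConfig) => Promise<M>;

interface LoadResult<M> {
  model: M;
  backend: "gpu" | "cpu";
  gpuError?: unknown; // set only when the GPU attempt failed and CPU was used
}

// Try GPU offload first; if the loader rejects, retry with no offloaded layers.
async function loadWithCpuFallback<M>(
  load: Loader<M>,
  base: LlmConfig,
  gpuLayers: number,
): Promise<LoadResult<M>> {
  try {
    const model = await load({ ...base, gpu_layers: String(gpuLayers) });
    return { model, backend: "gpu" };
  } catch (gpuError) {
    // Assumption: gpu_layers "0" means CPU-only. A failure here propagates to the caller.
    const model = await load({ ...base, gpu_layers: "0" });
    return { model, backend: "cpu", gpuError };
  }
}
```

Keeping the GPU error in the result lets the app log which device and driver forced the fallback.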
❌ Actual Behavior
The app crashes with a native SIGSEGV during model initialization.
The crash occurs in libqvac__llm-llamacpp.0.22.1.so, specifically in the llama.cpp backend buffer allocation path:
ggml_backend_alloc_ctx_tensors_from_buft
llama_model::create_backend_buffers
llama_model::load_tensors
llama_model_load_from_file
common_init_from_params
LlamaModel::init
JsInterface::activate
This appears to be the same root issue across tested models: native Vulkan backend allocation crashes before generation.
📜 Stack Trace / Error Output
Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
Process: io.daemon.mobile
Thread: mqt_v_js
Cause: null pointer dereference
#04 libqvac__llm-llamacpp.0.22.1.so
#05 libqvac__llm-llamacpp.0.22.1.so
#06 ggml_backend_alloc_ctx_tensors_from_buft + 116
#07 llama_model::create_backend_buffers(ggml_backend_sched*, std::__ndk1::vector<ggml_backend_buffer_type*, std::__ndk1::allocator<ggml_backend_buffer_type*> >&) + 1008
#08 llama_model::load_tensors(llama_model_loader&)
#10 llama_model_load_from_file + 132
#11 common_init_result::common_init_result(common_params&) + 456
#12 common_init_from_params(common_params&) + 56
#13 initFromConfig(...) + 752
#14 LlamaModel::init(bool) + 1948
#16 InitLoader::waitForLoadInitialization() + 68
#18 qvac_lib_inference_addon_cpp::JsInterface::activate(...) + 112
#19 libbare-kit.so js_callback_s::on_call(...)
💻 Platform / OS
Android (Expo)
🖥️ OS Version
Android 15
⚙️ Runtime Environment
Expo (React Native)
📦 Runtime Version
Expo: 54.0.34, React Native: 0.81.5, react-native-bare-kit: 0.14.2, bare-pack: 2.0.1, React: 19.1.0
🏷️ SDK Version
0.12.0
📋 Relevant Dependencies
@qvac/sdk: 0.12.0
@qvac/llm-llamacpp: 0.22.1
@qvac/cli: 0.5.0
react-native-bare-kit: 0.14.2
bare-pack: 2.0.1
expo: 54.0.34
react-native: 0.81.5
react: 19.1.0
🔁 Frequency
Always (100%)
🔥 Severity
Critical - Complete blocker, no workaround
🩹 Workaround
No response
📎 Additional Context
Can QVAC confirm whether @qvac/llm-llamacpp Vulkan offload is expected to work on Mali-G610 MC6 / Vulkan 1.1.177? If this GPU/driver is supported, we would appreciate guidance on a Mali-safe config or build option to avoid the crashing allocation path in ggml_backend_alloc_ctx_tensors_from_buft.
Specifically, is there a QVAC-supported way to:
- Disable pinned/staging memory behavior for Vulkan
- Force safer Mali memory types/layouts
- Avoid Vulkan backend allocation for full context/scratch buffers when only gpu_layers: 1
- Gracefully fall back to CPU instead of native SIGSEGV when Vulkan allocation fails
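Until a native fix exists, the only app-side stopgap we can think of is a crash sentinel, sketched below. A SIGSEGV cannot be caught from JavaScript, so the idea is to persist a marker before GPU initialization and clear it afterwards. If the marker is still present on the next launch, the previous attempt is assumed to have killed the process and the app stays on CPU. `FlagStore` and the key name are hypothetical, and the store must flush to disk synchronously before the native call for this to work.

```typescript
// Minimal synchronous key-value store; hypothetical, back it with any persistent storage.
interface FlagStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
  delete(key: string): void;
}

const GPU_INIT_KEY = "gpu-init-in-progress"; // hypothetical key name

// False when an earlier launch set the marker and never cleared it,
// which we interpret as a native crash during GPU initialization.
function shouldAttemptGpu(store: FlagStore): boolean {
  return store.get(GPU_INIT_KEY) === undefined;
}

// Call immediately before handing control to the native model init.
function markGpuInitStarted(store: FlagStore): void {
  store.set(GPU_INIT_KEY, "1");
}

// Call once the model has initialized (or failed with a catchable error).
function markGpuInitFinished(store: FlagStore): void {
  store.delete(GPU_INIT_KEY);
}
```

This costs one crashed launch per device, so it is a mitigation rather than a fix.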
✅ Checklist