Commit 9609c82
Ralf Waldukat
fix: prevent KV cache corruption on SWA/ISWA models (e.g. Gemma-4)
SWA/ISWA KV caches maintain global position maps (g_iswa_pos_max/min) that
are cleared only by llama_memory_clear(), not by kv_cache_seq_rm(). When
generate() finds a prefix match (e.g. a shared BOS token), it calls
kv_cache_seq_rm(), which returns True for ISWA, so the full reset is skipped.
The stale position maps then leave the batch allocator in an inconsistent
state, and llama_decode returns -1 on subsequent prompts.
Changes:
- Add _has_swa property via llama_model_n_swa() > 0
- reset() now calls llama_memory_clear() unconditionally
- generate() bypasses prefix-match optimization for SWA models,
  forcing full state reset (same path as recurrent models)

1 parent 1cb8b9f commit 9609c82
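
The control flow described in the changes above can be sketched with a toy model. This is not the llama-cpp-python implementation: ToyMemory, ToyLlama, prepare() and pos_max are hypothetical stand-ins, and only the names _has_swa, reset(), llama_model_n_swa() and llama_memory_clear() come from the commit message. The sketch assumes that a partial removal leaves the position bookkeeping untouched and that only a full clear resets it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Hypothetical stand-in for the llama.cpp memory object."""
    n_swa: int = 0                      # sliding-window size; 0 means no SWA
    cells: list = field(default_factory=list)
    pos_max: int = -1                   # models the global position map

    def seq_rm(self, start: int) -> bool:
        # Partial removal: trims cells but does not touch pos_max.
        del self.cells[start:]
        return True

    def clear(self) -> None:
        # Full clear: resets the cells and the position bookkeeping.
        self.cells.clear()
        self.pos_max = -1

    def append(self, tokens) -> None:
        self.cells.extend(tokens)
        self.pos_max = len(self.cells) - 1


class ToyLlama:
    def __init__(self, n_swa: int = 0):
        self.mem = ToyMemory(n_swa=n_swa)
        self.n_tokens = 0

    @property
    def _has_swa(self) -> bool:
        # Mirrors the idea of llama_model_n_swa() > 0.
        return self.mem.n_swa > 0

    def reset(self) -> None:
        # Unconditional full clear, as in the commit's reset() change.
        self.n_tokens = 0
        self.mem.clear()

    def prepare(self, tokens):
        """Return the tokens that still have to be evaluated."""
        prefix = 0
        if not self._has_swa:
            # Prefix-match optimization is attempted only without SWA.
            for cached, new in zip(self.mem.cells, tokens):
                if cached != new:
                    break
                prefix += 1
        if prefix > 0 and self.mem.seq_rm(prefix):
            self.n_tokens = prefix
            pending = list(tokens[prefix:])
        else:
            self.reset()
            pending = list(tokens)
        self.mem.append(pending)
        self.n_tokens += len(pending)
        return pending
```

With this sketch, a model without SWA re-evaluates only the tokens after the shared prefix, while a model with n_swa > 0 always goes through reset() and re-evaluates the whole prompt, trading some speed for a consistent cache.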
3 files changed: +110 -2 lines changed
Submodule llama.cpp updated from 3bd9aa1 to 535f761