feat(RL): add RL support for verl #1298
Open: shihaobai wants to merge 229 commits into main from rl_verl_rebase_main
Changes from 182 commits
Commits
b636310
add /flush_cache (#1108)
shihaobai 60c379e
Aborted reqs (#1113)
shihaobai 4095831
flush cache mulit node (#1116)
shihaobai ca9325f
[bugfix]: flush cache in single node (#1118)
shihaobai 9948925
add pause and continue (#1120)
shihaobai 4b32287
add launch_server and StartArgs (#1119)
sufubao 27abcf5
Update weight (#1127)
kingder c210c82
release and resume (#1122)
shihaobai 094df8c
use portpicker (#1142)
sufubao 560be02
Rl weight (#1143)
shihaobai 3d225d7
add_cli
sufubao 499074a
add 30b moe configs
shihaobai f737585
update requirement
shihaobai 8a67a47
add-neo-chat
fdc1369
add-neo-chat
e8e7416
add-neo-chat
ba44983
add-neo-chat
4d41a33
add-neo-chat
0e8845c
fix-neo-chat
b48cd49
fix-neo-chat-position-ids-h
7a904f3
add-neo-chat-dense
4b757dd
add-neo-chat-dense
e208733
support verl.
245357c
improve0108
6503ac8
add min/max pixels sampling parameters
07df460
fix fused_moe not installed use pip.
a6f00fb
add visual nccl port alloc
shihaobai 9360197
fix0115
920a741
fix0115
3aa5e18
fp8 online quant for moe
shihaobai 7cb890b
hotfix for fa3 of llama
shihaobai c242a75
fp8w8a8 triton config
shihaobai a0195aa
fp16 config
shihaobai 7f0c437
release ipc tensor early.
5738d9e
bugfix: fix flattened_bucket update weights
yqyao e11bf58
bugfix: fix update_weights from tensor
yqyao f767609
merge main
shihaobai ce76f8a
fix start
shihaobai 45259ec
add-merge-kv-mode
da3b53d
add-neo-chat0129
1e066d0
Merge branch 'add-neo-chat-rebase' into rl_verl
043e898
moe fused weight
shihaobai 52085a4
Merge branch 'rl_verl_rebase' of https://github.com/ModelTC/lightllm …
shihaobai 80cfcc4
fix neo
shihaobai 6bbdb4f
fix launch
shihaobai e436ba5
fix launch
shihaobai aef65bc
fix tp slice for merged moe weight
shihaobai bc87692
fix fusemoe weight
shihaobai cf5bcbf
fa3 for neo
shihaobai a23288b
fix dead visual process
shihaobai f558540
auto visual dp
shihaobai 12c6c6b
fix format
shihaobai fd91cad
fix decode scale
2681263
add new mode support text_ids+image_ids
fd17aa0
add new mode support text_ids+image_ids
e516bd9
add cuda empty cache
shihaobai 81a0c12
add invalid token ids to sampling_param for rl training
shihaobai 14132d5
add unitest for apply_invalid_tokens
shihaobai ed41960
add gc collect
shihaobai 706ae2e
logit_bias
shihaobai f432f5a
logit_bias
shihaobai 92bf83a
Merge branch 'main' into rl_verl_rebase
shihaobai 8f8ed44
merge main
shihaobai cac2edf
neo moe inferece speedup
shihaobai 02078ad
port random generate
shihaobai 68954b0
feat: add MoE expert routing capture for R3 rollout replay
sufubao 3569d53
fix
sufubao fe54253
add node-id for env_utils
shihaobai 92470f7
Merge branch 'rl_verl_rebase' of https://github.com/ModelTC/lightllm …
shihaobai 8eead2b
Revert "add node-id for env_utils"
sufubao 27f9e87
Revert "port random generate"
sufubao 6fa8f74
add assert none
shihaobai bf83078
set_unique_server_name
shihaobai 3eab5a7
fix return_routed_experts
sufubao 14cfc95
fix r3
sufubao e8ed8b5
add-neo++
77b73c2
feat: add Qwen3Next linear attention model support
sufubao a4ab210
refactor: simplify mamba buffer copy and integrate Triton kernels
sufubao 1686d34
fix conv3d
sufubao dd9b611
[draft] qwen3.5 dense
sufubao 6a3a17c
split dense and moe
sufubao e1cdfb4
feat: add mamba_cache_ratio for automatic memory allocation
sufubao f2e148e
refactor: simplify mamba_cache_ratio to direct percentage
sufubao b4fe201
add H100 config
sufubao e2ce9c0
refactor: align radix_cache_class with infer_state_class style
sufubao b1adbf3
fix: add missing attention_chunk param to flashattention_nopad.py
sufubao c744ebd
refactor: clarify naming in mamba_buffer_copy
sufubao 9def697
clean
sufubao 2b3deb8
fix
sufubao 61f8945
clean
sufubao f7280a3
split
sufubao 86d3bfb
Merge origin/qwen3.5_clean into rl_verl_qwen35
sufubao c05838e
fix: lazy-initialize SHM name constants to avoid import-time crash
sufubao 243c6a0
fix: revert weight slicing and rmsnorm precision regressions
sufubao 711e30c
fix
sufubao 7734c21
feat: add Qwen3Next linear attention model support
sufubao c757b06
refactor: simplify mamba buffer copy and integrate Triton kernels
sufubao 340d11c
fix conv3d
sufubao a6a2435
[draft] qwen3.5 dense
sufubao 054035d
split dense and moe
sufubao 01b112a
feat: add mamba_cache_ratio for automatic memory allocation
sufubao 174757d
refactor: simplify mamba_cache_ratio to direct percentage
sufubao dd2516e
add H100 config
sufubao 326ae22
refactor: align radix_cache_class with infer_state_class style
sufubao e996cd2
fix: add missing attention_chunk param to flashattention_nopad.py
sufubao 5e5cdbe
refactor: clarify naming in mamba_buffer_copy
sufubao 9cf783c
clean
sufubao e120edb
fix
sufubao f3330cf
clean
sufubao d030a67
split
sufubao e1f6129
style: apply black formatting to mamba_buffer_copy
sufubao 74f82d1
perf: add autotune configs for mamba_buffer_copy/fork kernels on H200
sufubao c1ea769
refactor: rename buffer copy methods for clarity
sufubao b81baaa
clean the code
sufubao 52b422a
vlm tokenizer support token list
shihaobai aa442a4
fix
shihaobai 0fd0202
clean code
sufubao eed0a9c
qwen35 qkv improve
shihaobai b9a386e
code simplify
shihaobai 86f17b6
clean code
shihaobai a1849e6
fix
shihaobai 61f74ac
remove contiguous
shihaobai bf0f254
remove gemma rms norm config
shihaobai 76782c2
clean code
sufubao fdd2052
add get_radix_class
sufubao 733e851
fix acc of mamba cache
shihaobai b1f8233
fix acc of mamba cache
shihaobai 90120b0
fix warmup
shihaobai 4ef6091
merge main
shihaobai 13edba2
simplify the qwen3next layer_infer
shihaobai ec499ce
openai api simplify
shihaobai 3c8597d
simplify mem manager
shihaobai 20edcc1
slime code
shihaobai eed9863
remove mtp of base_backend
shihaobai 90df4f1
slime mode_backend
shihaobai 3b832af
merge qwen3.5 and main
shihaobai 91edf3b
fix invalid memory of release_memory
shihaobai 711667a
flush_cache for hybrid cache
shihaobai b181c0a
fix rpyc
shihaobai c6a6dda
fix: node is None
sufubao ee3a7d5
fix resume invalid memory
shihaobai 32d795d
fix reqs queue
shihaobai a0937a9
fix
shihaobai 1de0e53
fix
shihaobai 2dbd2f7
pop weight after load
shihaobai 33bbfda
async update weight
shihaobai 6017484
model.norm.weight: add 1 during runtime
shihaobai b98f6d7
fix r3
sufubao c0cebba
fix qwen35 nrom
sufubao 5f4fa78
Revert "fix qwen35 nrom"
sufubao 64506b3
fix
shihaobai 9da13c1
Merge branch 'qwen3.5_clean' of https://github.com/ModelTC/lightllm i…
shihaobai 73b10ca
remove unused log
shihaobai 1d46601
Merge branch 'rl_verl_qwen35' of https://github.com/ModelTC/lightllm …
shihaobai b2ab0bf
Merge remote-tracking branch 'origin/main' into qwen3.5_clean
sufubao a93509f
fix mamba_len
shihaobai 1115543
fix
sufubao d562b7b
fix
sufubao 8f11f08
fix and remove unused code
shihaobai 709075a
Merge branch 'qwen3.5_clean' of https://github.com/ModelTC/lightllm i…
shihaobai 267412d
fix format
shihaobai f7bee08
gatermsnorm weight and mamba profile_size
shihaobai b85b6ca
simpliy code
shihaobai 3965845
update tp param
shihaobai ef41d77
fix: restore tool_calls arguments JSON string to dict conversion
sufubao 7d0458f
fix: restore tool_calls arguments JSON string to dict conversion
sufubao 8f1212a
fix build_prompt too
sufubao 77bfcba
fix
sufubao 3585432
fix buffer idx
shihaobai 334e3c4
fix
shihaobai 2f34bac
merge the update of qwen3.5_clean
shihaobai 0974ba9
fix
shihaobai fe91aa3
add instance_id with improved robustness and code quality
sufubao f4a0cb7
fix: occasional accuracy drop in rollout
shihaobai f4caa8f
reset req manager
shihaobai 8794f43
fix typo
shihaobai 1abf95a
add fp8 rl for qwen35
shihaobai 901bd13
fix abort
shihaobai 8de8baf
add logs for detoken
shihaobai 2dc39fa
fix decode overflow
shihaobai 8c20369
fix bytes decode
shihaobai 6cd300c
merge main
shihaobai 1f466c7
remove neo
shihaobai 9e54f20
remove unused code
shihaobai 46d2ee2
remove unused code
shihaobai f2c1a3e
remove unused code
shihaobai e7c1475
slime code
shihaobai 1ecf015
slim code
shihaobai a93dcb6
slime code
shihaobai ccc8832
slime radix cache
shihaobai 11ea37a
slime radixcache
shihaobai f446e5b
slim code
shihaobai 998020a
remove unused code
shihaobai 5a745e5
fix
shihaobai 90ed556
lazy init cache dir
shihaobai eb42e5b
fix linear flush_cache
shihaobai ccf7d91
feat: new rl update weight support
yqyao 3e1ac50
fix abort request
shihaobai 24028a1
memory tag normalizew
shihaobai 5f5d263
fix
shihaobai 81bf00a
Merge remote-tracking branch 'origin/main' into rl_verl_rebase_main
shihaobai 8461bd8
fix
shihaobai 641ef6e
rpyc rl
shihaobai d26e822
fix
shihaobai a6b5cbc
fix
shihaobai b48eb36
fix visual server infer name
shihaobai cbec63c
list kv cache
shihaobai 9d64e33
fix
shihaobai ffdd879
add test
shihaobai 31d3584
R3 auto dtype
shihaobai 5812070
fix
shihaobai 72e0bc5
Merge remote-tracking branch 'origin/main' into rl_verl_rebase_main
shihaobai e00721f
mm_slicer: assert weight ndim in {2, 3} and fix shape[1] residue
shihaobai 47e313f
revert detokenization/manager.py to main; add weight verify_load; twe…
shihaobai 7e1b737
inline _handle_token_output back into handle_loop
shihaobai 542254b
revert multi_level_kv_cache/manager.py to main
shihaobai 2f99b73
remove MTP buffer copy-back in dp_backend impl
shihaobai b737d2a
multinode tp: sync abort via router broadcast, drop httpserver zmq fo…
shihaobai a79fdd7
fix abort && R3 overlap
shihaobai 2399900
fix
shihaobai 361b548
sync abort
shihaobai df7963b
remove verify load
shihaobai 7c94bbd
fix multimodal encode
shihaobai 031a225
fix
shihaobai 591b9f7
add reject_all for pause_generation
shihaobai b0e2ec7
fix pause generation
shihaobai 2ca3e47
fix pause generation
shihaobai a5c0074
fix pause
shihaobai e336d82
bugfix: fix auto ipc handle
yqyao
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -7,3 +7,4 @@ dist |
| | | .vscode |
| | | tmp/ |
| | | requirements-musa.txt |
| | | CLAUDE.md |
The `resume_all` method is missing a call to `self.req_manager.resume()`, which is present in `resume_kv_cache`. Without this, the request manager might not be properly re-initialized after a memory resume operation.
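A minimal sketch of the change this comment suggests, assuming a stand-in class: only `resume_all`, `resume_kv_cache`, and `self.req_manager.resume()` come from the comment. `StubReqManager`, `ModelSketch`, and the `resumed` flag are hypothetical names used for illustration and do not reflect the actual lightllm code.

```python
class StubReqManager:
    """Hypothetical stand-in for the real request manager."""

    def __init__(self):
        self.resumed = False

    def resume(self):
        # The real method presumably re-initializes request state;
        # here we only record that it was called.
        self.resumed = True


class ModelSketch:
    """Hypothetical owner of the two resume paths named in the review."""

    def __init__(self):
        self.req_manager = StubReqManager()

    def resume_kv_cache(self):
        # Per the review comment, this path already resumes the request manager.
        self.req_manager.resume()

    def resume_all(self):
        # Other resume steps (weights, KV cache memory, ...) are elided here.
        # Suggested fix: also resume the request manager on this path.
        self.req_manager.resume()


model = ModelSketch()
model.resume_all()
print(model.req_manager.resumed)
```

With the added call, both resume paths leave the request manager in the same state, which is the consistency the reviewer is asking for.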