Pull requests: allenai/dolma
Rename TokenizerConfig.__post__init__ to __post_init__
#292
opened Apr 20, 2026 by
Chessing234
Fix get_nl_ratio ZeroDivisionError on empty document text
#291
opened Apr 20, 2026 by
Chessing234
Fix find_offset treating exact start-boundary as 'not found'
#290
opened Apr 20, 2026 by
Chessing234
Fix BaseUrlTagger.IGNORE_IP_REGEX_START missing f-string prefix
#289
opened Apr 19, 2026 by
Chessing234
Fix LinguaTagger ImportError referring to langdetect instead of lingua
#288
opened Apr 17, 2026 by
Chessing234
Fix ft_dataset write_results dropping messages and opening output in binary mode
#287
opened Apr 17, 2026 by
Chessing234
Fix BaseBucketApi.add incrementing _total per call instead of per value
#286
opened Apr 15, 2026 by
Chessing234
Fix whitespace_tokenizer_v1 counting tokens + 1 via regex.split
#285
opened Apr 15, 2026 by
Chessing234
Fix catastrophic backtracking in not_alphanum_paragraph_v1 regex
#284
opened Apr 13, 2026 by
Chessing234
Fix off-by-one in CodeCopyrightTagger._score span length
#283
opened Apr 12, 2026 by
Chessing234
1 of 2 tasks
Fix off-by-one in CodeCopyrightTagger end position (missing newline)
#282
opened Apr 11, 2026 by
Chessing234
Fix wrong denominator in fraction_of_characters_in_duplicate_lines
#281
opened Apr 10, 2026 by
Chessing234
Fix IGNORE_IP_REGEX_START never matching localhost IPs
#280
opened Apr 9, 2026 by
Chessing234
4 tasks done
Fix typos and clean up leftover draft text in docs
#279
opened Apr 7, 2026 by
Chessing234
2 tasks
Fix catastrophic regex backtracking in NotAlphanumParagraphV1 tagger
#278
opened Apr 6, 2026 by
Chessing234
3 tasks
docs: clarify temporal deduplication strategies and document types (Issue #267)
#276
opened Nov 27, 2025 by
ada-ggf25
[WIP DO NOT MERGE] Learn2Code Feature Branch
#233
opened Feb 13, 2025 by
cmwilhelm
Contributor
Bump openssl from 0.10.66 to 0.10.70 in the cargo group
Labels: dependencies, rust
#228
opened Feb 3, 2025 by
dependabot (bot)
Fixed ignore_existing flag not working as expected.
#224
opened Jan 1, 2025 by
soldni
Contributor