
change(web): rework traversalless prediction, add mild whitespace-correction 🚂 #15920

Draft
jahorton wants to merge 3 commits into change/web/abstract-whitespace-tokenization-mapping from change/web/rework-traversalless-prediction


@jahorton (Contributor) commented May 6, 2026

🚧

Build-bot: skip build:web
Test-bot: skip

@keymanapp-test-bot (bot) commented May 6, 2026

User Test Results

Test specification and instructions

User tests are not required

Test Artifacts

  • Web
    • KeymanWeb Test Home - build : all tests passed (no artifacts on BuildLevel "build")

@keymanapp-test-bot (bot) changed the title from "change(web): rework traversalless prediction, add mild whitespace-correction" to "change(web): rework traversalless prediction, add mild whitespace-correction 🚂" May 6, 2026
@keymanapp-test-bot (bot) added this to the A19S28 milestone May 6, 2026
@github-actions (bot) added the web/, web/predictive-text/, and change labels May 6, 2026
@jahorton force-pushed the change/web/abstract-whitespace-tokenization-mapping branch from 7539b3e to 6c1170d May 7, 2026 18:22
@jahorton force-pushed the change/web/rework-traversalless-prediction branch from f8a1e49 to daea6e5 May 7, 2026 18:22
@keyman-server modified the milestones: A19S28, A19S29 May 11, 2026
// ContextTokenization pattern due to the model's lack of LexiconTraversal
// support, though.

const tokenizedCorrection = mapWhitespacedTokenization(
  tokenization.left.map((t) => ({ exampleInput: t.text })),
  lexicalModel,
  correction.sample
).tokenizedTransform;
Contributor Author

Self-note: cross-reference with the changes in #15851; there's a lot of overlap here. It might be best to DRY it out, especially since we may not actually want to stick with the "empty context" solution.

@jahorton force-pushed the change/web/abstract-whitespace-tokenization-mapping branch from da7359e to a800a12 May 15, 2026 18:30
@jahorton force-pushed the change/web/rework-traversalless-prediction branch from dacfc13 to 8227b81 May 15, 2026 18:34
});

-const predictions = predictFromCorrections(model, correctionDistribution, context);
+const predictions = correctAndEnumerateWithoutTraversals(model, correctionDistribution, context);
Contributor Author

TODO: heavy cleanup needed here - we probably need tests for both predictFromCorrections AND for correctAndEnumerateWithoutTraversals.

const predictions: CorrectionPredictionTuple[] = tailPredictions.map((p) => {
// Concat corrections + predictions for their components.
const predictionSequence = [...predictionPrefixSequence, p];
const fullPrediction: ProbabilityMass<Suggestion> = predictionSequence.reduce((prev, curr) => {
Contributor Author

See #15907 - this is somewhat similar to the compositeIntermediatePredictions function defined therein. May be worth hoisting, then polishing up later in the PR chain?
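The hunk quoted in this thread is cut off partway through its `reduce`, so its exact logic isn't visible here. As a rough, hedged illustration of the idea in its comment ("Concat corrections + predictions for their components"), the sketch below folds a sequence of per-token results into one multi-token suggestion. The `Suggestion` and `ProbabilityMass` shapes are simplified stand-ins, and `composeSequence`, the single-space separator, and treating components as independent (so probabilities multiply) are all assumptions for illustration, not Keyman's actual types or logic.

```typescript
// Simplified stand-ins; the real predictive-text types carry more fields.
interface Suggestion {
  displayAs: string;
}

interface ProbabilityMass<T> {
  sample: T;
  p: number;
}

// Hypothetical helper: joins each component's text with `separator` and
// multiplies the component probabilities, i.e. it assumes the per-token
// corrections/predictions are independent of one another.
function composeSequence(
  sequence: ProbabilityMass<Suggestion>[],
  separator = ' '
): ProbabilityMass<Suggestion> {
  return sequence.reduce<ProbabilityMass<Suggestion>>(
    (prev, curr) => ({
      sample: {
        // No leading separator before the first component.
        displayAs: prev.sample.displayAs === ''
          ? curr.sample.displayAs
          : prev.sample.displayAs + separator + curr.sample.displayAs
      },
      p: prev.p * curr.p
    }),
    // Identity element: empty text, probability 1.
    { sample: { displayAs: '' }, p: 1 }
  );
}
```

Mirroring the hunk, one would build `[...predictionPrefixSequence, p]` for each tail prediction and compose it: a prefix `the` (p = 0.5) followed by the tail `quick` (p = 0.4) yields `the quick` with p = 0.2.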

To better handle inputs that shift word boundaries in some custom models and in models released before Keyman 14.0, this PR generalizes the whitespace-based token-transition algorithm used by our most prominently supported models so that it can be reused for them as well.

Build-bot: skip build:web
Test-bot: skip
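To make "whitespace-based token transition" concrete, here is a simplified sketch; it is not the PR's `mapWhitespacedTokenization`. It assumes a transform is just `{ insert, deleteLeft }` applied at the caret, tokenizes the left context on whitespace alone, and reports which trailing tokens an edit replaced and what replaced them. `Transform`, `TokenTransition`, `tokenizeOnWhitespace`, and `mapWhitespaceTransition` are hypothetical names chosen for illustration.

```typescript
// Hypothetical edit shape: delete `deleteLeft` UTF-16 code units before the
// caret, then insert `insert` at the caret.
interface Transform {
  insert: string;
  deleteLeft: number;
}

interface TokenTransition {
  unchangedCount: number;   // leading context tokens left untouched
  removedTokens: string[];  // trailing context tokens the edit replaced
  insertedTokens: string[]; // tokens that take their place
}

// Splits text into alternating word and whitespace runs:
// "an apple" -> ["an", " ", "apple"].
function tokenizeOnWhitespace(text: string): string[] {
  return text.match(/\s+|[^\s]+/g) ?? [];
}

// Applies the transform to the left context, retokenizes purely on
// whitespace, and diffs the two token lists by their common prefix.
function mapWhitespaceTransition(context: string, transform: Transform): TokenTransition {
  const before = tokenizeOnWhitespace(context);
  const keptLength = Math.max(0, context.length - transform.deleteLeft);
  const after = tokenizeOnWhitespace(context.slice(0, keptLength) + transform.insert);

  // Count tokens that are identical in both tokenizations, left to right.
  let shared = 0;
  while (shared < before.length && shared < after.length && before[shared] === after[shared]) {
    shared++;
  }

  return {
    unchangedCount: shared,
    removedTokens: before.slice(shared),
    insertedTokens: after.slice(shared)
  };
}
```

Typing `le` after `an app` replaces the token `app` with `apple`. Deleting six characters from `an apple` leaves only `an`, so the sketch reports `[" ", "apple"]` as removed with nothing inserted; the edit moved the word boundary rather than just changing one word, which is the kind of case the description is concerned with.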
@jahorton force-pushed the change/web/abstract-whitespace-tokenization-mapping branch from a800a12 to c12cef9 May 18, 2026 18:24
@jahorton force-pushed the change/web/rework-traversalless-prediction branch from f377066 to 95f3df3 May 18, 2026 19:13
@jahorton force-pushed the change/web/abstract-whitespace-tokenization-mapping branch from c12cef9 to 5f4d1ee May 19, 2026 18:12

Labels

change (Minor change in functionality, but not new), epic-autocorrect, web/predictive-text/, web/

Projects

Status: Todo

2 participants