feat: Split affinity value between SAME and MUTUALLY_INTELLIGIBLE #22
Merged
ericfjosne merged 2 commits into main on Jun 17, 2025
shobhitt-spotify approved these changes on Jun 17, 2025
The current `SAME_OR_MUTUALLY_INTELLIGIBLE` locale affinity value was originally the only option, because we relied on the distance between locales to derive an affinity. This is not ideal: bundling `SAME` and `MUTUALLY_INTELLIGIBLE` together makes it impossible to distinguish locales that identify the same language from locales that identify a different language the user would still understand.

Thanks to the new spoken language matcher in `LanguageUtils`, we are now able to split the locale affinity value `SAME_OR_MUTUALLY_INTELLIGIBLE` into:

- `MUTUALLY_INTELLIGIBLE`: Locales identify languages that are similar to the point where a person who understands one of them should understand both.
- `SAME`: Locales identify the same language.

Examples:

- `SAME`: fr, fr-BE, fr-LU, fr-CH, fr-SE, fr-JP all identify French.
- `MUTUALLY_INTELLIGIBLE`: bs (Bosnian) and hr (Croatian).

We suggest implementing this split as part of this pull request.
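To make the distinction concrete, here is a minimal sketch of how the two values differ. It is not the code in this pull request: the `AffinitySketch` class, the `Affinity` enum (including its `NONE` value), and the hand-written pair list are all hypothetical, and the real implementation relies on the spoken language matcher in `LanguageUtils` rather than a lookup table.

```java
import java.util.Locale;
import java.util.Set;

final class AffinitySketch {

    /** Hypothetical stand-in for the affinity values; NONE is an assumption. */
    enum Affinity { SAME, MUTUALLY_INTELLIGIBLE, NONE }

    // Assumed pair list, for illustration only (taken from the example above).
    private static final Set<Set<String>> INTELLIGIBLE_PAIRS =
            Set.of(Set.of("bs", "hr"));

    static Affinity affinity(String tagA, String tagB) {
        // Compare only the language subtag, ignoring region and script.
        String langA = Locale.forLanguageTag(tagA).getLanguage();
        String langB = Locale.forLanguageTag(tagB).getLanguage();
        if (langA.isEmpty() || langB.isEmpty()) {
            return Affinity.NONE;
        }
        if (langA.equals(langB)) {
            return Affinity.SAME;
        }
        // langA and langB differ here, so Set.of cannot see a duplicate element.
        if (INTELLIGIBLE_PAIRS.contains(Set.of(langA, langB))) {
            return Affinity.MUTUALLY_INTELLIGIBLE;
        }
        return Affinity.NONE;
    }

    public static void main(String[] args) {
        System.out.println(affinity("fr-BE", "fr-CH")); // SAME
        System.out.println(affinity("bs", "hr"));       // MUTUALLY_INTELLIGIBLE
        System.out.println(affinity("fr", "de"));       // NONE
    }
}
```

With the previous bundled value, the first two calls would have been indistinguishable to a caller.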
Incidentally, we discovered what might be a bug in icu4j: `hr-BA` (Croatian, Bosnia) is best matched with `bs` (Bosnian) instead of `hr` (Croatian). We will need to validate this behavior with Unicode, but we have implemented a workaround in the `LanguageUtils` class for now.
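The description does not say how the workaround is implemented. One possible shape, shown purely as an assumption, is to prefer a supported locale with the same language subtag before deferring to the distance-based matcher. In the sketch below, `SameLanguageFirst`, `bestMatch`, and the `fallback` parameter are hypothetical names; `fallback` stands in for whatever matcher is actually used, and no icu4j API is called.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.BiFunction;

final class SameLanguageFirst {

    /**
     * Returns the first supported tag whose language subtag equals that of the
     * desired tag; otherwise defers to the supplied fallback matcher.
     */
    static String bestMatch(String desired,
                            List<String> supported,
                            BiFunction<String, List<String>, String> fallback) {
        String lang = Locale.forLanguageTag(desired).getLanguage();
        if (!lang.isEmpty()) {
            for (String candidate : supported) {
                if (lang.equals(Locale.forLanguageTag(candidate).getLanguage())) {
                    return candidate;
                }
            }
        }
        return fallback.apply(desired, supported);
    }

    public static void main(String[] args) {
        // Fake fallback that always answers "bs", imitating the reported behavior.
        BiFunction<String, List<String>, String> alwaysBosnian = (d, s) -> "bs";
        // The same-language check runs first, so hr-BA resolves to hr.
        System.out.println(bestMatch("hr-BA", List.of("bs", "hr"), alwaysBosnian)); // hr
    }
}
```

A guard like this only masks the reported behavior for exact language matches; whether icu4j's result is intended still needs to be confirmed upstream.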