Fix LinguaTagger ImportError referring to langdetect instead of lingua #288

Open
Chessing234 wants to merge 1 commit into allenai:main from Chessing234:fix/lingua-tagger-import-error-message

@Chessing234

Bug

LinguaTagger.__init__ checks LINGUA_AVAILABLE but raises ImportError("langdetect is not installed, ..."), pointing users to the wrong package.

https://github.com/allenai/dolma/blob/96f8b07/python/dolma/taggers/language.py#L254-L257

def __init__(self) -> None:
    super().__init__()
    if not LINGUA_AVAILABLE:
        raise ImportError("langdetect is not installed, please run `pip install dolma[lang]`.")
    self.detector = LanguageDetectorBuilder.from_languages(*Language.all()).build()

Root cause

Copy-paste from LangdetectTagger.__init__ at lines 204-206, which correctly reports langdetect when LANGDETECT_AVAILABLE is false. The message in LinguaTagger was never updated to match the LINGUA_AVAILABLE flag it checks.

Fix

Change the message to "lingua is not installed, ..." so it names the actually-missing package. The pip install dolma[lang] hint is unchanged because the lang extras include lingua-language-detector.
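
For illustration, the corrected check might look roughly like the sketch below. It is a standalone approximation, not the actual diff: `LinguaTaggerSketch` is a hypothetical stand-in for `LinguaTagger` (the real class calls `super().__init__()` on its dolma base class, omitted here), and the guarded import of `lingua` is assumed to mirror how `LINGUA_AVAILABLE` is set in `language.py`.

```python
# Assumption: the availability flag is set by a guarded import like this one.
try:
    from lingua import Language, LanguageDetectorBuilder

    LINGUA_AVAILABLE = True
except ImportError:
    LINGUA_AVAILABLE = False


class LinguaTaggerSketch:
    """Hypothetical stand-in for LinguaTagger, without the dolma base class."""

    def __init__(self) -> None:
        if not LINGUA_AVAILABLE:
            # The message now names the package that the flag actually tracks.
            raise ImportError("lingua is not installed, please run `pip install dolma[lang]`.")
        self.detector = LanguageDetectorBuilder.from_languages(*Language.all()).build()
```

With lingua missing, constructing the tagger should now raise an ImportError that mentions lingua rather than langdetect, while the install hint stays the same.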

LinguaTagger checks LINGUA_AVAILABLE but its ImportError message said
'langdetect is not installed' (copy-paste from LangdetectTagger). The
install hint 'pip install dolma[lang]' is correct because the lang extras
include lingua-language-detector, but the package name in the message
misleads users. Update the package name to 'lingua' so the error
accurately describes the missing dependency.