This repository was archived by the owner on Feb 19, 2021. It is now read-only.

Don't parse dates with more than 4 digits for the year #556

Open
heinrich5991 wants to merge 2 commits into the-paperless-project:master from heinrich5991:pr_dates

Conversation

@heinrich5991

The regex was broken before, using `(?!…)` instead of `(?<=…)`.
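
For readers less familiar with the two constructs: `(?!…)` is a negative lookahead (the text that follows must not match), while `(?<=…)` is a positive lookbehind (the text just before must match). A minimal sketch of the difference, using small hypothetical patterns rather than the actual ones from `parsers.py`:

```python
import re

# Hypothetical patterns for illustration only; not the ones from parsers.py.

# Positive lookbehind: the four digits must be preceded by "_".
# The "_" itself is not part of the match.
lookbehind = re.compile(r'(?<=_)[0-9]{4}')

# Negative lookahead at the same spot: only asserts that the NEXT character
# is not "_". It says nothing about what comes before the digits, and it is
# trivially true whenever the next character is a digit.
lookahead = re.compile(r'(?!_)[0-9]{4}')

print(lookbehind.search("inv_2019").group())  # 2019
print(lookbehind.search("x2019"))             # None: no "_" before the digits
print(lookahead.search("x2019").group())      # 2019: the guard has no effect
```

Placed in front of a digit pattern, the negative lookahead never rejects anything, which is the kind of silent breakage described above.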
skius previously approved these changes Aug 19, 2019
@MasterofJOKers
Contributor

Why do we need lookbehind and lookahead at all? Can't we get away with something like this?

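import re

# Group 1 captures the date; the surrounding (?:\b|[_-]) delimiters are matched but not captured.
r = re.compile(r'(?:\b|[_-])('
               r'(?:[0-9]{1,2}[./-][0-9]{1,2}[./-](?:[0-9]{4}|[0-9]{2}))|'  # numeric, year last
               r'(?:(?:[0-9]{4}|[0-9]{2})[./-][0-9]{1,2}[./-][0-9]{1,2})|'  # numeric, year first
               r'(?:[0-9]{1,2}\. +[^\W\d_]{3,9} (?:[0-9]{4}|[0-9]{2}))|'    # day. month-name year
               r'(?:[^\W\d_]{3,9}(?: [0-9]{1,2},)? [0-9]{4})'               # month-name [day,] year
               r')(?:\b|[_-])')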

In some manual testing, it seems to match everything matched in the unit tests. We can then use `m.group(1)` for the date part of the matched string.
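
A small sketch of how `m.group(1)` would be used with the pattern above. The helper name `find_date` and the sample strings are assumptions made for illustration; they are not taken from the project's code or its unit tests.

```python
import re

# Same pattern as proposed above; DATE_RE and find_date are hypothetical names.
DATE_RE = re.compile(
    r'(?:\b|[_-])('
    r'(?:[0-9]{1,2}[./-][0-9]{1,2}[./-](?:[0-9]{4}|[0-9]{2}))|'
    r'(?:(?:[0-9]{4}|[0-9]{2})[./-][0-9]{1,2}[./-][0-9]{1,2})|'
    r'(?:[0-9]{1,2}\. +[^\W\d_]{3,9} (?:[0-9]{4}|[0-9]{2}))|'
    r'(?:[^\W\d_]{3,9}(?: [0-9]{1,2},)? [0-9]{4})'
    r')(?:\b|[_-])'
)


def find_date(text):
    """Return the first date-like substring (group 1), or None."""
    m = DATE_RE.search(text)
    return m.group(1) if m else None


print(find_date("Invoice dated 19.08.2019"))   # 19.08.2019
print(find_date("scan_2019-08-19_final.pdf"))  # 2019-08-19
print(find_date("August 19, 2019"))            # August 19, 2019
print(find_date("19.08.20191"))                # None: five-digit year
```

In the second sample, the `_` on either side is consumed by the outer `(?:\b|[_-])` groups, so only group 1 holds the date itself. This is why the whole match (`m.group(0)`) is not suitable for parsing.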

👍 for the additional tests.

@heinrich5991
Author

Updated with the suggestion to not use lookahead/lookbehind.

Comment thread src/documents/parsers.py Outdated
@heinrich5991
Author

Removed all the superfluous `(?:)`.

@MasterofJOKers
Contributor

> Removed all the superfluous `(?:)`.

Great, could you also remove the superfluous `\` in the `[]` while you're at it?
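
To illustrate why such backslashes are superfluous: inside a character class, `.` and `/` are already literal, and a `-` placed last is literal too. The escaped form below is an assumed example of what such a class might look like, not a quote from the diff.

```python
import re

# Hypothetical "before" form with escapes, and the equivalent plain form.
escaped_class = re.compile(r'[\.\/-]')
plain_class = re.compile(r'[./-]')

candidates = "./-\\_ a1"

# Collect the characters each class accepts; the two lists should be identical.
matched_escaped = [ch for ch in candidates if escaped_class.fullmatch(ch)]
matched_plain = [ch for ch in candidates if plain_class.fullmatch(ch)]

print(matched_escaped)  # ['.', '/', '-']
print(matched_plain)    # ['.', '/', '-']
```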

MasterofJOKers previously approved these changes Nov 2, 2019
@heinrich5991
Author

Removed the superfluous backslashes in the regex.

3 participants