Ability to use multiple validators#352
Conversation
441b1a5 to
b9ed113
Compare
|
For the veraPDF test failure, see veraPDF/veraPDF-library#1588 |
b9ed113 to
85959ec
Compare
|
Thanks! As it happens, I was thinking about taking this task on myself. I want to spend some more time thinking about whether the validators can be represented a bit more elegantly, I might either push to this PR or create a new one with the tests you added! |
|
Not sure if you meant using a bitset for the validators, but I think that would be nice :) |
|
I considered eagerly collecting all validators into a shared policy object to a) only iterate them once and b) encapsulate how their policies fold with regards to any/all and negations. However, since in practice the validator vector will have at most 2 items, the iterations are not too bad. Having the policy folding in one place would be nice but I eschewed that in pursuit of a minimal change (or laziness). I don't know how useful a bit field would be since (at least with how policies are expressed now) ORing them does not yield a meaningful result. |
|
I updated veraPDF in CI, so hopefully it should work now? |
The bitset would only be used internally to have a cheaply copyable type that can hold all standards. Though the proposed design in #355 from @LaurenzV would probably be nicer for this too. |
85959ec to
4521d5b
Compare
|
I will attempt to build a |
|
I have updated the PR with a new design merging the three approaches, with a heavy dose of #355. See the commit messages for a more in-depth design rationale. |
saecki
left a comment
There was a problem hiding this comment.
Nice, I think having the offending validators will save a lot of people a whole lot of frustration.
And sorry for the all the back and forth.
Just some final nitpicks from my side
This strategy is inspired by LaurenzV#355 and its reviews, https://gist.github.com/saecki/c37ed6c6e07c07e79d87a93aca78ef07 and https://gist.github.com/reknih/c54e9400ecaed250ef14e4c633c62422. It eschews a vector of `Validator` in favor of a struct with one validator per standard (Like LaurenzV#355). It also centralizes all policy decisions involving only validators in the `Validators` struct (like the unified flags approach) and decisions involving both the version and the validators on `SerializeSettings`. In order to minimize API churn when new validators are added, `Configuration` is now created through a `ConfigurationBuilder`. New standards will still result in a breaking change, since the Archival and Ua enums rely on being exhaustive, but as long as the consumer does not actually match them, they will not have to update their uses of Krilla. I consolidated the prohibition calculation in a tuple match to make it more compact and maintainable. I also enabled PDF 2.0 files without no validator to use Associated Files and removed self-contained XMP schemata for extension schemata from PDF/A-4.
`embedded_file_pdf_20`: PDF 2.0 allows the `AF` and `AFRelationship` keys. Before, it was only written for PDF/A-3 files. Reference `SerializeSettings::supports_associated_files`. `validate_pdf_a4_f_with_embedded_file`, `validate_pdf_a4_full_example`, `validate_pdf_a4e_full_example`, `validate_pdf_a4f_full_example`: PDF/A-4 does not actually define the `pdfaSchema`, `pdfaProperty`, and `pdfaType` properties. Instead, ISO 19005-4 clause 6.7.2.3 posits that the XMP metadata should be described using RELAX NG (ISO 16684-2). We don't support that, but since it's only a recommendation, it's fine. `validate_multi_validator_embedded_file_no_af` has been changed and renamed in order to reflect the correct understanding that PDF/A-3 amends the file with the `/AF` and `/AFRelationship` keys, even if other standards are in use, too.
This PR adds the ability to combine validators, in particular, it will enable the sought-after combination PDF/A-2a and PDF/UA-1.
Closes #101