Complete artifact types #273

Merged
LaurenzV merged 3 commits into LaurenzV:main from reknih:artifact-types
May 14, 2026

Conversation

@reknih
Collaborator

@reknih reknih commented Oct 7, 2025

This commit adds the PDF 1.7 artifact type Background, the PDF 1.7 pagination artifact subtype Watermark, and the PDF 2.0 pagination artifact subtypes PageNum, LineNum, Redaction, Bates. Finally, the variant PaginationOther allows writing a pagination artifact without a subtype.

Because these new variants are not compatible with PDF 1.4, where the mechanism was introduced, a compatibility layer has been introduced.
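As a rough illustration of what such a compatibility layer could look like, here is a minimal sketch. The enum names, the `min_version` mapping (taken from the version numbers in the description above), and especially the downgrade rule in `for_version` are assumptions made for this example; the thread does not say how the actual layer downgrades a variant.

```rust
/// PDF versions relevant to the new variants (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum PdfVersion {
    Pdf14,
    Pdf17,
    Pdf20,
}

/// Only the variants named in the PR description; existing variants are omitted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Artifact {
    Background,
    Watermark,
    PageNum,
    LineNum,
    Redaction,
    Bates,
    PaginationOther,
}

impl Artifact {
    /// First PDF version that knows the variant, per the PR description.
    fn min_version(self) -> PdfVersion {
        match self {
            Artifact::PaginationOther => PdfVersion::Pdf14,
            Artifact::Background | Artifact::Watermark => PdfVersion::Pdf17,
            Artifact::PageNum
            | Artifact::LineNum
            | Artifact::Redaction
            | Artifact::Bates => PdfVersion::Pdf20,
        }
    }

    /// Assumed downgrade rule: pagination subtypes lose their subtype,
    /// and `Background` has no typed fallback (`None`).
    fn for_version(self, target: PdfVersion) -> Option<Artifact> {
        if target >= self.min_version() {
            Some(self)
        } else if self == Artifact::Background {
            None
        } else {
            Some(Artifact::PaginationOther)
        }
    }
}

fn main() {
    // A PDF 2.0 subtype written into a PDF 1.7 document.
    println!("{:?}", Artifact::Bates.for_version(PdfVersion::Pdf17));
}
```

Deriving `PartialOrd` on the version enum keeps the comparison in one place, so a later variant only needs a new `min_version` arm.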

@LaurenzV
Owner

LaurenzV commented Oct 7, 2025

Maybe we could again add a small test case that uses a subtype?

@saecki
Collaborator

saecki commented Oct 7, 2025

Background artifacts require a BBox in PDF 1.7. So I think there should be a way of specifying that as well.

@reknih
Collaborator Author

reknih commented Oct 7, 2025

Both done. I was unsure whether the rectangle needs to be transformed in any way.

@saecki
Collaborator

saecki commented Oct 7, 2025

Yes, it needs to be transformed; it should be the same as here:

let transform = page_root_transform(page_info.size().height());

@reknih
Collaborator Author

reknih commented Oct 7, 2025

I'm afraid that page_info is not accessible in ContentTag::write_properties, only all page infos from the SerializeContext...

Anyways, this PR does not block Typst 0.14

@saecki
Collaborator

saecki commented Oct 7, 2025

Oh, maybe it doesn't need any transform, since the page root transform and other transforms are already applied to the surface?

@saecki
Collaborator

saecki commented May 4, 2026

I'm pretty sure it doesn't need to be transformed, but since it uses the same Rect type as the bbox attribute we should probably document that the Rect in the tag bbox is specified within the page transform, while the bbox for attributes is specified within the current Surface transform.

@reknih reknih force-pushed the artifact-types branch 3 times, most recently from 43e29f3 to e25ee37 May 6, 2026 14:31
@reknih
Collaborator Author

reknih commented May 6, 2026

> But since it uses the same Rect type as the bbox attribute we should probably document that the Rect in the tag bbox is specified within the page transform.

Done in 3fb3d28

@saecki saecki self-requested a review May 8, 2026 13:50
Owner

@LaurenzV LaurenzV left a comment

Overall LGTM, but we will have to do some more research on the whole bbox thing and then figure out how to best document how the bbox should be determined.

Unfortunately, I don't have Acrobat Pro, so if someone else has it, it might be useful to test how Acrobat interprets the bbox attributes, especially if there are CTM operations in the main page stream.

>>
stream
/Artifact <<
/BBox [0 80 200 130]
Owner

Is this really the correct bbox? If I understood correctly, we need to assume the default user space (i.e. where in PDF coordinates it will actually end up). Since the text is written at the top and the PDF coordinate system is y-up, shouldn't the y be something like 600 or 700 instead?
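To make the arithmetic concrete: PDF's default user space has its origin at the bottom-left corner with y increasing upwards, so a rectangle given in a y-down system has to be mirrored against the page height. The sketch below assumes the `[0 80 200 130]` values are y-down coordinates and assumes a page height of 842 pt (A4); neither is confirmed in this thread, and `flip_rect` is a hypothetical helper.

```rust
/// Mirror a y-down rectangle `[x0, top, x1, bottom]` into PDF's y-up
/// default user space. Hypothetical helper, not part of the crate.
fn flip_rect(rect: [f32; 4], page_height: f32) -> [f32; 4] {
    let [x0, top, x1, bottom] = rect;
    // The lower edge in y-down becomes the smaller y value in y-up.
    [x0, page_height - bottom, x1, page_height - top]
}

fn main() {
    // Assumed page height of 842 pt (A4).
    let flipped = flip_rect([0.0, 80.0, 200.0, 130.0], 842.0);
    println!("{:?}", flipped); // prints "[0.0, 712.0, 200.0, 762.0]"
}
```

Under those assumptions the y values land in the 700s, which is in the range suggested above.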

Collaborator

I couldn't find a way to verify this using Acrobat or other tools, but I agree.

Comment on lines +159 to +160
/// structure elements, this property list entry is specified with regards
/// to the page transform, not the [`Surface`](crate::surface::Surface)
Owner

Are you sure about this? Does that mean you think it should only take the y-flip into account? I don't see why this would be the case, given the description in the spec.

Collaborator

No, you're right. I guess both the comment here and for the artifact could briefly state that the Rect is specified within the page transform.

/// The bounding box of a tag that encloses its visible content.
/// If the content spans multiple pages, this should be omitted.
///
/// Unlike the property list entry `BBox` on [artifacts](crate::tagging::Artifact),
Owner

Why is this bbox different? As far as I can tell, in the tagging section of the spec it has the same description, where it says "in default user space units", so I don't see why it's different.

Image

Collaborator

I'm sorry, I don't know how I got there. I've read through the spec again and that indeed doesn't make any sense.

Regarding the implementation, we could get the page transform from the root_transform of the Surface::root_builder in Surface::start_tagged and pass it through ContentBuilder::start_marked_content_with_properties.

Another idea I had, though that would be a larger change, would be to have a SurfaceKind with a Page variant that contains the page dimensions.
In the future we could also add a mutable borrow of the page annotations, so annotations could be added through the Surface which would come in handy when constructing a TagTree.
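A minimal sketch of the second idea, purely to show the shape it could take. `SurfaceKind`, its variants, and `page_height` are hypothetical names; nothing here reflects the crate's actual types.

```rust
/// Hypothetical description of what a surface draws onto.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SurfaceKind {
    /// A page surface carries its dimensions, so the y-flip can be derived.
    Page { width: f32, height: f32 },
    /// Any other target, for which no page height is known (assumption).
    Other,
}

impl SurfaceKind {
    /// Page height if this surface is a page, otherwise `None`.
    fn page_height(&self) -> Option<f32> {
        match self {
            SurfaceKind::Page { height, .. } => Some(*height),
            SurfaceKind::Other => None,
        }
    }
}

fn main() {
    let kind = SurfaceKind::Page { width: 200.0, height: 300.0 };
    println!("{:?}", kind.page_height());
}
```

Returning an `Option` forces callers that write a bbox to decide what to do on non-page surfaces instead of silently using a wrong height.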

Owner

No worries! The only problem is that I've tried my best to abstract away the y-flip transform needed for PDF so that the user can always just assume a y-down coordinate system; it would be unfortunate if the user somehow had to take it into account here. Therefore, I think the best thing to do is to simply document that the bbox needs to cover all shapes in user space, but still allow the user to assume that user space is a y-down coordinate system. Then, internally, we apply the y-flip transform so that it's converted to y-up, but this is hidden away from the user. WDYT?
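The proposal can be sketched as an API boundary: the caller hands over a y-down rectangle, and the flip happens only at serialization time. `Rect` and `write_bbox` below are hypothetical stand-ins written for this example, not the crate's real types or functions, and the page height is passed in explicitly as an assumption about how it would be wired through.

```rust
/// A rectangle in the user's y-down coordinate system (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct Rect {
    left: f32,
    top: f32,
    right: f32,
    bottom: f32,
}

/// Serialize the rectangle as a `/BBox` entry. The y-flip to PDF's y-up
/// space is applied here, so the caller never has to think about it.
fn write_bbox(rect: Rect, page_height: f32) -> String {
    let y_min = page_height - rect.bottom;
    let y_max = page_height - rect.top;
    format!("/BBox [{} {} {} {}]", rect.left, y_min, rect.right, y_max)
}

fn main() {
    let rect = Rect { left: 10.0, top: 20.0, right: 110.0, bottom: 50.0 };
    println!("{}", write_bbox(rect, 300.0)); // prints "/BBox [10 250 110 280]"
}
```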

Collaborator

I agree, the Y-flip should definitely stay hidden away, that's what I meant to say :)
That would require having access to the page height to compute the transform, right?
The implementation ideas were just two options I could think of to achieve that.

Owner

Yep, but I’m pretty sure this should already exist somewhere in the code? I think we have something similar for annotation rects.

Owner

Ah, but the difference is that those are written directly into the content stream… so yeah if there is an easy way to wire the page height through, that’s what we should do. I don’t currently remember if the automatic page size might cause problems here…

Collaborator

Is there a way to specify an automatic page size? Seems to me like the page size must be specified when creating a page 🤔
The surface also already uses the page height from the page settings when creating the root content builder, so that would match the behavior

Owner

Then I must be misremembering! I’m sure there was this functionality in the past but I might have removed it already. 🤔

Collaborator Author

Yeah, I must have missed that they are both defined in user-space units. Under this assumption, here is the new BBox rectangle from the test, ImageMagick'ed into there. Neither Acrobat nor PAC 2026 was able to display the BBox.

tagging_artifact_subtypes_tight

reknih added 3 commits May 11, 2026 10:47
@LaurenzV
Owner

LGTM, thanks!

@LaurenzV LaurenzV merged commit 71386a3 into LaurenzV:main May 14, 2026
6 checks passed
tag.write_properties(sc, properties);
// Page height extracted from transform and passed to function to allow
// its dependants to flip the y-axis, mirroring Krilla conventions.
let page_height = self.root_transform.ty();
Collaborator

Is this really correct? This function will be called on the last sub_builder, which doesn't necessarily have the page root transform, right?
I think that, to be correct, we should use the root_transform of the root_builder inside Surface::start_tagged.
Also, instead of extracting the page_height, we could just pass through the entire transform.
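A minimal sketch of the "pass the entire transform" variant. The `Transform` struct here is a self-contained stand-in written for this example, not the crate's own type, and `map_rect` is a hypothetical helper. For a pure y-flip the translation component `ty` equals the page height, which is why reading `ty()` works on the root transform but would not on a sub-builder whose transform differs.

```rust
/// Stand-in affine transform: x' = sx*x + kx*y + tx, y' = ky*x + sy*y + ty.
#[derive(Debug, Clone, Copy)]
struct Transform {
    sx: f32,
    ky: f32,
    kx: f32,
    sy: f32,
    tx: f32,
    ty: f32,
}

impl Transform {
    /// The y-flip for a page of the given height: (x, y) -> (x, height - y).
    fn y_flip(height: f32) -> Self {
        Transform { sx: 1.0, ky: 0.0, kx: 0.0, sy: -1.0, tx: 0.0, ty: height }
    }

    fn apply(&self, x: f32, y: f32) -> (f32, f32) {
        (
            self.sx * x + self.kx * y + self.tx,
            self.ky * x + self.sy * y + self.ty,
        )
    }

    /// Axis-aligned bounds `[x_min, y_min, x_max, y_max]` of the transformed
    /// rectangle `[x0, y0, x1, y1]`, computed from all four corners.
    fn map_rect(&self, r: [f32; 4]) -> [f32; 4] {
        let corners = [(r[0], r[1]), (r[2], r[1]), (r[0], r[3]), (r[2], r[3])];
        let mut out = [f32::MAX, f32::MAX, f32::MIN, f32::MIN];
        for (x, y) in corners {
            let (px, py) = self.apply(x, y);
            out[0] = out[0].min(px);
            out[1] = out[1].min(py);
            out[2] = out[2].max(px);
            out[3] = out[3].max(py);
        }
        out
    }
}

fn main() {
    let root = Transform::y_flip(300.0);
    println!("{:?}", root.map_rect([10.0, 20.0, 110.0, 50.0]));
}
```

Mapping all four corners keeps the helper correct even if the root transform ever carries more than a flip, at the cost of a few extra multiplications.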


3 participants