
Add Custom tag variant for arbitrary role-mapped tags #336

Open

dcalvo wants to merge 2 commits into LaurenzV:main from dcalvo:custom-tags

@dcalvo (Contributor) commented Feb 20, 2026

We're using krilla to reconstruct tagged PDFs from parsed input. Real-world PDFs frequently contain application-specific structure tags: Word emits InlineShape, PowerPoint emits Slide and Textbox, InDesign emits Story, and so on. These tags are registered in the source PDF's /RoleMap, which maps each of them to a standard role.

Without a way to represent arbitrary tags, these all collapse to NonStruct and the original tag names are lost.

This adds a TagKind::Custom(CustomTag) variant that takes an arbitrary name and the standard StructRole it maps to. krilla writes the custom name as /S and registers it in the /RoleMap (PDF 1.7) or in the namespace role map (PDF 2.0).
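
For illustration, here is a minimal sketch of how role map entries could be collected from custom tags while writing the structure tree. It assumes a hypothetical CustomTag { name, role } struct and a simplified three-variant Role enum standing in for the real krilla and pdf-writer types. It keeps one entry per distinct custom name and rejects a name that is used with two different roles, since a role map can hold only one value per key.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the standard roles (hypothetical; not krilla's or pdf-writer's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role {
    P,
    Figure,
    NonStruct,
}

impl Role {
    /// PDF name that would be written as the role map value.
    pub fn pdf_name(self) -> &'static str {
        match self {
            Role::P => "P",
            Role::Figure => "Figure",
            Role::NonStruct => "NonStruct",
        }
    }
}

/// Hypothetical custom tag: `name` is written as /S, `role` goes into the role map.
#[derive(Clone, Debug)]
pub struct CustomTag {
    pub name: String,
    pub role: Role,
}

/// Collects one role map entry per distinct custom name, sorted by name.
/// Repeating a name with the same role is fine; repeating it with a
/// different role is an error, because each key can appear only once.
pub fn build_role_map(tags: &[CustomTag]) -> Result<BTreeMap<String, Role>, String> {
    let mut map: BTreeMap<String, Role> = BTreeMap::new();
    for tag in tags {
        match map.get(&tag.name).copied() {
            Some(existing) if existing != tag.role => {
                return Err(format!(
                    "custom tag {:?} is mapped to both {} and {}",
                    tag.name,
                    existing.pdf_name(),
                    tag.role.pdf_name()
                ));
            }
            // Same name, same role: the existing entry already covers it.
            Some(_) => {}
            None => {
                map.insert(tag.name.clone(), tag.role);
            }
        }
    }
    Ok(map)
}
```

A BTreeMap is used here only so the entries come out in a deterministic order; whether a conflicting mapping should be an error or should keep the first role is a design choice this sketch does not settle.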

Usage:

```rust
use krilla::tagging::{Tag, StructRole};

// PowerPoint "Slide" tag, mapped to NonStruct
let slide = Tag::custom("Slide", StructRole::NonStruct);

// Word "InlineShape" tag, mapped to Figure
let shape = Tag::custom("InlineShape", StructRole::Figure)
    .with_alt_text(Some("Chart showing Q1 results".into()));
```

Also re-exports StructRole from krilla::tagging so consumers don't need a direct pdf-writer dependency.

@laurmaedje (Collaborator)

> Also re-exports StructRole from krilla::tagging so consumers don't need a direct pdf-writer dependency.

I'm not sure it's 100% the case right now, but in principle we try to keep pdf-writer types out of the public API so that a pdf-writer major version bump is not a breaking change in krilla.

@dcalvo (Contributor, Author) commented Feb 20, 2026

That makes sense. I've replaced the pub use re-export with a krilla-native StandardRole enum that mirrors the PDF 1.7 variants, with an internal From<StandardRole> for StructRole conversion. The public API no longer exposes any pdf-writer types.
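
A minimal sketch of that pattern, assuming a local pdf_writer_stub module that stands in for pdf-writer's StructRole so the example compiles on its own (only three variants are shown; the real enums cover the full set of standard roles). Because the public enum is owned by krilla, a pdf-writer major bump would only touch the private From impl.

```rust
/// Stand-in for pdf-writer's StructRole (hypothetical stub, subset only).
mod pdf_writer_stub {
    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
    pub enum StructRole {
        P,
        Figure,
        NonStruct,
    }
}

use pdf_writer_stub::StructRole;

/// Public, krilla-owned enum (subset shown). Consumers never name StructRole.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StandardRole {
    P,
    Figure,
    NonStruct,
}

/// Internal conversion, used only at the point where the PDF is written.
impl From<StandardRole> for StructRole {
    fn from(role: StandardRole) -> Self {
        match role {
            StandardRole::P => StructRole::P,
            StandardRole::Figure => StructRole::Figure,
            StandardRole::NonStruct => StructRole::NonStruct,
        }
    }
}
```

An exhaustive match (no wildcard arm) means that adding a variant to StandardRole fails to compile until the conversion is updated, which keeps the two enums in sync.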

@LaurenzV (Owner)

I haven't looked into it yet; would there be a way to reuse our existing codegen infrastructure for this, or is that not possible? In any case, what we really need is to ensure that each role map entry maps to a tag that actually exists in the current PDF version. If it doesn't, should we error out, or just fall back to P? Not sure what the best option is. 🤔
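
One way the version check could look, as a minimal sketch under several assumptions: a hypothetical PdfVersion enum, a four-role StandardRole subset, and a hypothetical Policy switch covering the two options mentioned above (error out or fall back to P). The availability table is illustrative only (Note as a PDF 1.7 role absent from the PDF 2.0 standard namespace, Aside as a role added in PDF 2.0) and should be checked against ISO 32000-1 and ISO 32000-2. PDF 2.0 files can also reference PDF 1.7 roles through the PDF 1.7 namespace, so a real check may need to be namespace-aware.

```rust
/// Hypothetical output-version selector.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PdfVersion {
    Pdf17,
    Pdf20,
}

/// Small, illustrative subset of standard roles.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum StandardRole {
    P,
    Figure,
    Note,
    Aside,
}

impl StandardRole {
    /// Illustrative availability table (verify against the specs):
    /// Note is a PDF 1.7 role missing from the PDF 2.0 standard namespace,
    /// Aside was introduced in PDF 2.0.
    pub fn exists_in(self, version: PdfVersion) -> bool {
        !matches!(
            (self, version),
            (StandardRole::Note, PdfVersion::Pdf20) | (StandardRole::Aside, PdfVersion::Pdf17)
        )
    }
}

/// The two options raised in the discussion.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Policy {
    Error,
    FallbackToP,
}

/// Returned when a role map target does not exist in the output version.
#[derive(Debug, PartialEq, Eq)]
pub struct UnavailableRole {
    pub role: StandardRole,
    pub version: PdfVersion,
}

/// Checks a role map target against the output version and applies the policy.
pub fn resolve_role(
    role: StandardRole,
    version: PdfVersion,
    policy: Policy,
) -> Result<StandardRole, UnavailableRole> {
    if role.exists_in(version) {
        return Ok(role);
    }
    match policy {
        Policy::Error => Err(UnavailableRole { role, version }),
        Policy::FallbackToP => Ok(StandardRole::P),
    }
}
```

Erroring surfaces an invalid mapping early, while falling back to P always yields a file but silently changes the semantics of the mapped tag; keeping both behind a policy value defers that decision to the caller.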

@saecki (Collaborator) commented Feb 21, 2026

> I haven't looked into it yet, would there be a way of reusing our existing codegen infrastructure for this, or is it not possible? In any case, what we do really need is to ensure that the entry of each rolemap maps to a tag that actually exists in the current PDF version. If not, we should probably error out? Or just fallback to P? Not sure what the best option is. 🤔

Currently AnyTag only allows setting global attributes, so this would be somewhat restrictive.
