Python: Improve prompt-template msg serialize and sample usage#13738
Python: Improve prompt-template msg serialize and sample usage#13738moonbox3 wants to merge 3 commits intomicrosoft:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request improves Python prompt-template message serialization by using XML element serialization (instead of manual string concatenation) for Jinja2 and Handlebars message helpers, updates related samples to use serializer-backed helpers, and adds regression tests to ensure correct escaping and round-tripping of chat history with XML metacharacters.
Changes:
- Update Jinja2 and Handlebars
messagehelpers to build message XML viaElementTreeserialization to correctly escape XML metacharacters. - Align Azure chat prompt-template samples to use
message_to_promptfor serializer-backed message rendering. - Add unit and end to end tests covering escaping and chat history round-trip behavior, including system message preservation.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| python/semantic_kernel/prompt_template/utils/jinja2_system_helpers.py | Serialize message() output via XML element serialization to escape metacharacters. |
| python/semantic_kernel/prompt_template/utils/handlebars_system_helpers.py | Serialize message block helper output via XML element serialization to escape metacharacters. |
| python/tests/unit/prompt_template/test_jinja2_prompt_template.py | Add regression test validating escaping and round-trip parsing for message() helper. |
| python/tests/unit/prompt_template/test_jinja2_prompt_template_e2e.py | Add end to end tests for round-trip with metacharacters and system role preservation. |
| python/tests/unit/prompt_template/test_handlebars_prompt_template.py | Add regression test validating escaping and round-trip parsing for message block helper. |
| python/tests/unit/prompt_template/test_handlebars_prompt_template_e2e.py | Add end to end tests for round-trip with metacharacters and system role preservation. |
| python/samples/concepts/prompt_templates/azure_chat_gpt_api_jinja2.py | Update sample template to use message_to_prompt(item) for serializer-backed rendering. |
| python/samples/concepts/prompt_templates/azure_chat_gpt_api_handlebars.py | Update sample template to use {{message_to_prompt}} for serializer-backed rendering. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
python/tests/unit/prompt_template/test_jinja2_prompt_template_e2e.py
Outdated
Show resolved
Hide resolved
python/tests/unit/prompt_template/test_handlebars_prompt_template_e2e.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Automated Code Review
Reviewers: 3 | Confidence: 89%
✓ Security Reliability
This PR is a security improvement that replaces manual f-string XML construction with xml.etree.ElementTree for generating XML tags in both Handlebars and Jinja2 prompt template helpers. Previously, user content containing XML metacharacters (<, >, &, ") was injected verbatim into XML strings, which could corrupt the XML structure and cause incorrect message reconstruction via ChatHistory.from_rendered_prompt(). The fix uses Element/tostring for proper escaping, consistent with how the rest of the codebase (ChatMessageContent.to_element, ChatHistory.from_rendered_prompt) already handles XML. The use of xml.etree.ElementTree for serialization-only (not parsing) is safe and matches the existing nosec B405 pattern in 12+ other files; defusedxml is correctly used elsewhere for parsing. The sample files are updated to demonstrate the message_to_prompt helper. Test coverage is thorough, including round-trip tests that verify ChatHistory.from_rendered_prompt correctly recovers the original messages after escaping.
✓ Test Coverage
This PR switches the handlebars and jinja2
_messagehelpers from manual XML string concatenation toxml.etree.ElementTree, properly escaping XML metacharacters in message content and attribute values. The test additions are well-structured: they verify specific escape sequences (<,&, unescaped"in text), and critically verify round-trip fidelity viaChatHistory.from_rendered_prompt. Both template engines (handlebars and jinja2) have symmetric unit and e2e tests. Minor gaps: no test coversNone/empty content in_messagehelpers (the jinja2 helper previously renderedNoneas the literal string "None" — now fixed but untested), and no test exercises attribute-value escaping in the handlebars_messageblock helper kwargs path. These are non-blocking.
✓ Design Approach
The PR correctly fixes an XML injection bug in both the Handlebars and Jinja2
_messagehelpers by switching from manual string concatenation toxml.etree.ElementTree.Element+tostring(), which properly escapes XML metacharacters in content. The approach is sound and the# nosec B405annotations are appropriate (we are generating XML, not parsing untrusted input). The sample changes steering users towardmessage_to_promptare a good idea since it usesto_prompt()/to_element()for full-fidelity serialization. However, the jinja2_message(item)helper still only serializesitem.content(the first TextContent text), silently dropping image items, function-call items, and any other non-text items from the message. Its sibling_message_to_promptavoids this by delegating toitem.to_prompt()/to_element(). Since_message(item)accepts a fullChatMessageContent, calers reasonably expect full serialization — yet the implementation silently produces an incomplete representation. The better approach for_messagein jinja2 would be to callitem.to_element()and usetostring()on the resulting element, exactly asto_prompt()does, rather than manually setting only the text field.
Suggestions
- The Jinja2
_message(item)helper setsmessage.text = item.content, which only captures the first TextContent's text and silently drops non-text items (images, function calls, etc.). Additionally,item.contentcould be a non-string type, andElementTree.tostringbehavior with non-strings is implementation-defined. The better approach is to delegate toitem.to_element()and calltostring()on that—the same way_message_to_promptworks viaitem.to_prompt()—which would avoid data loss for multi-modal messages and eliminate the type-safety concern. - Add a test for
None/empty content in the Jinja2_messagehelper—the old code renderedNoneas the literal string "None", and the new code correctly produces an empty element, but this fix is untested. A round-trip test with aChatMessageContentthat has no text items (e.g., a FunctionCallContent-only message) through{{ message(item) }}would lock this in. - Consider adding a test for attribute values containing XML metacharacters in the Handlebars
_messageblock helper (e.g.,{{#message role=role custom_attr=value}}...{{/message}}wherevaluecontains<or"). TheElement.set()call correctly escapes these, but there's no test verifying it. - The Handlebars sample (
azure_chat_gpt_api_handlebars.py) switched from{{#message role=role}}{{~content~}}{{/message}}to{{message_to_prompt}}, which changes the XML serialization format (from bare text to<text>child elements). While both formats round-trip correctly, this sample change is orthogonal to the XML-escaping fix and could be confusing to users who reference the old sample pattern. - The Handlebars
_messagehelper now setsmessage.text = str(options['fn'](this)), treating the entire block output as plain text. For users withallow_dangerously_set_content=Truewho previously relied on embedding raw XML sub-structure (e.g., image or function-call tags) directly in the block, this is a silent behavior change. It would be worth documenting that_messagenow always produces plain-text message content, and thatmessage_to_promptshould be used for structured/multi-modal content.
Automated review by moonbox3's agents
- Use public re-export for AuthorRole (from semantic_kernel.contents) in test_jinja2_prompt_template_e2e.py and test_handlebars_prompt_template_e2e.py - Replace manual XML construction in _message() with item.to_element() to properly serialize all content items (images, function calls, etc.) - Remove unused Enum and Element imports from jinja2_system_helpers.py - Update test assertion to match correct to_element() output format Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation and Context
This PR updates the Jinja2 and Handlebars prompt-template helpers to serialize chat messages through the existing XML/message serializer instead of assembling message XML manually. It also aligns the prompt-template samples with the serializer-backed helper and adds regression coverage for common message content.
Description
Prompt template message serialize improvements.
Contribution Checklist