fix(xlsx/pivot): pad rowItems subtotal entries per ECMA-376 spec #47
Open
BubbleWolf wants to merge 186 commits into
* feat: extend LaTeX→OMML converter for math/stats education use cases
Add support for LaTeX constructs commonly used in math and statistics
education that were previously falling through to literal text output:
**New commands:**
- \boxed{} — bordered equation box (m:borderBox)
- \underbrace{}_{} — underbrace with label (m:groupChr + m:limLow)
- \overbrace{}^{} — overbrace with label (m:groupChr + m:limUp)
- \color{name}{} / \textcolor{} — colored equation runs (w:color rPr)
- \pmod{} — parenthesized modular arithmetic
- \bmod — binary mod operator
- \arcsin, \arccos, \arctan, \arccot, \arcsec, \arccsc — arc-trig functions
- \operatorname{} — custom upright operator names with limit support
**Improved \cancel:**
- Changed from Unicode combining overlay hack to proper m:borderBox
with m:strikeH/m:strikeBLTR for \cancel, \bcancel, \xcancel
**New environments:**
- \begin{align}, \begin{aligned}, \begin{gathered}, \begin{split},
\begin{eqnarray} — multi-line aligned equations via m:matrix
- \begin{array} — array environment with column spec skipping
**New delimiter support:**
- \langle / \rangle — angle brackets (⟨⟩) in symbol map
- \left\langle ... \right\rangle — proper OMML delimiters
- \lceil/\rceil, \lfloor/\rfloor — ceiling/floor brackets
- \lvert/\rvert, \lVert/\rVert — vertical bars
**New symbols:**
- \emptyset, \setminus, \complement, \cap, \cup — set notation
- \, \; \! — math spacing commands
**Color support:**
- NamedColorToHex helper mapping 20+ named colors (red, blue, etc.)
Tested against 25 LaTeX constructs used in AP Statistics, Calculus,
and Algebra courses. All previously broken constructs now generate
valid OMML that renders natively in Word and OnlyOffice.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: correct StrikeBLTR → StrikeBottomLeftToTopRight for OpenXML SDK
The OpenXML SDK uses full names (StrikeBottomLeftToTopRight) not
abbreviations (StrikeBLTR). Fixed the \cancel/\bcancel/\xcancel
implementation to use the correct class name.
Verified: builds clean with dotnet 11 preview, all 10 previously
broken LaTeX constructs now generate valid OMML, officecli validate
passes on the output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: add math extensions demo PPTX
11-slide presentation built with the patched officecli binary,
demonstrating before/after for all 10 fixed LaTeX→OMML constructs:
- \boxed, \underbrace, \color, \cancel, \pmod
- \arctan, \left\langle...\right\rangle
- \begin{align}, \begin{gathered}, \operatorname
Final slide shows real AP Statistics formulas (confidence interval,
chi-squared, normal PDF, Pearson r) rendered natively as OMML.
Validates clean: officecli validate → 0 errors.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GenerateParaId() was using Guid which could produce values >= 0x80000000, causing schema validation failures. Changed to Random.Shared.Next(0, int.MaxValue) which guarantees values in the valid range [0, 0x7FFFFFFE].
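A minimal sketch of the range constraint described above, written in Python for illustration only (the project itself is .NET). The function name `generate_para_id` and the 8-digit uppercase hex formatting are assumptions, not the project's code; the bounds mirror the `[0, 0x7FFFFFFE]` range stated in the commit.

```python
import random
from typing import Optional

# Exclusive upper bound, mirroring int.MaxValue in the commit message.
MAX_EXCLUSIVE = 0x7FFFFFFF


def generate_para_id(rng: Optional[random.Random] = None) -> str:
    """Return a hex ID whose numeric value lies in [0, 0x7FFFFFFE].

    Hypothetical helper: the 8-digit uppercase formatting is an assumption.
    """
    rng = rng or random.Random()
    # randrange excludes the upper bound, like Random.Next(min, max) in .NET.
    value = rng.randrange(0, MAX_EXCLUSIVE)
    return f"{value:08X}"
```

Every generated value stays below 0x80000000, which is the property the schema validation depends on.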
…constructs
Fix 8 issues in PR iOfficeAI#37's LaTeX extensions:
- \cancelto: consume both arguments instead of leaving second in token stream
- \cancel/bcancel/xcancel: hide border box borders, fix diagonal directions
- \color/\textcolor: preserve math structure instead of flattening to text
- \pmod: include all child nodes instead of only first
- \operatorname: support both subscript and superscript limits
- \begin{array}: strip unwanted delimiter wrapper
- Fix align/gathered comment to match implementation
Add OMML→LaTeX conversion for HTML preview:
- borderBox → \boxed{} or \cancel{}
- groupChr → \underbrace{} or \overbrace{}
- w:color in math runs → \textcolor{#hex}{}
- standalone m:m → \begin{matrix}...\end{matrix}
…L preview
- Add AppendW14CssEffects() to convert w14 namespace effects to CSS: textFill gradient → background linear-gradient + background-clip:text, glow → multi-layer text-shadow, shadow → text-shadow with offset, textOutline → -webkit-text-stroke, solidFill → color override
- Add reflection rendering via flipped duplicate paragraph block with CSS mask-image gradient for fade-out effect
- Fix MergeRunProperties() to carry over w14 namespace children during style chain resolution (textFill/glow/reflection were silently dropped)
- Fix OOXML→CSS gradient angle conversion: cssAngle = ooxmlAngle + 90
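The angle rule in the last item can be sketched as follows. This is a hedged illustration in Python, not the project's code: the function name is hypothetical, and two details are assumptions on top of the commit's "+ 90" rule, namely that the input is stored in 60000ths of a degree (the usual DrawingML angle unit) and that the result is wrapped into [0, 360).

```python
def ooxml_angle_to_css_degrees(ang_60000ths: int) -> float:
    """Convert an OOXML gradient angle to a CSS linear-gradient angle.

    Assumptions: the input is in 60000ths of a degree, and the result is
    wrapped into [0, 360). Only the +90 offset comes from the commit.
    """
    degrees = ang_60000ths / 60000
    return (degrees + 90) % 360
```

For example, an OOXML angle of 5400000 (90 degrees) maps to a CSS angle of 180 degrees under these assumptions.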
GetParagraphText() only collected text from direct Run children, missing text inside Hyperlink elements.
…ation
- Add @id=/@paraid=/@name= stable path addressing for PPT shapes, Word paragraphs, comments, footnotes, endnotes, and content controls
- Refactor Add method signature across all handlers for consistency
- Enhance Word/PPT navigation with improved node building and query capabilities
- Update SKILL.md documentation with stable ID examples and new add syntax
…d and PowerPoint
- Add `set find=` to format or replace matched text with auto run splitting
- Support regex via r"..." prefix (e.g. find=r"\d+%")
- Unify find+replace (replaces old scope-based FindAndReplace)
- Add `--after find:X` / `--before find:X` for positional element insertion
  - Word: inline (run) and block (table/paragraph) insertion with auto paragraph splitting
  - PowerPoint: inline run insertion at text positions
- Support all run-level format properties through find pathway
- Update SKILL.md, wiki, and skill docs with new syntax
w14:paraId/textId attributes require mc:Ignorable declaration to prevent Word 2007 from rejecting the document.
- Quote paths in Quick Start examples to prevent zsh glob expansion
- Clarify find= vs plain set semantic difference
- Document find= edge cases (no match, cross-run matching)
- Add stable ID usage guidance for multi-step workflows
- Warn about shape[1] being title placeholder in Common Pitfalls
- Excel: reject find without replace early with clear error message
- Word/PPT: suggest correct anchor path format for bare @paraid=/@id= usage
- SKILL.md: add case-sensitive note, Excel find limitation, notes get limitation, shell bracket quoting pitfall, scope clarification, find= prop format warning
- Word/PPT: accept regex=true in props to enable regex mode for find
- Avoids JSON double-quote escaping hell with r"..." in batch/MCP
- Excel: reject regex prop with clear error (not supported)
- Update SKILL.md with regex=true examples for CLI and batch
Remove confusing r"..." prefix syntax from documentation, use regex=true prop consistently across CLI and batch examples
Word and PPT add commands now accept regex=true in props to enable regex mode for find: text anchors, avoiding r"..." syntax in JSON
StreamReader/StreamWriter deadlock on Windows named pipes under .NET 11 preview: the managed stream wrapper's internal buffering stalls reads even when bytes are available on the wire.
Changes:
- ResidentServer: replace StreamReader.ReadLineAsync/StreamWriter with raw byte I/O helpers (ReadLineFromPipeAsync/WriteLineToPipeAsync)
- ResidentClient: replace StreamReader/StreamWriter with raw byte I/O helpers (PipeReadLine/PipeWriteLine)
- CommandBuilder (open): on Windows, run resident server in-process via Task.Run with ManualResetEventSlim readiness signal instead of forking a child process (which also deadlocked on single-file host). Linux/macOS keeps the original Process.Start fork behavior.
Raw byte I/O is used on all platforms for the pipe protocol to avoid divergent code paths; it is a strict subset of what StreamReader/StreamWriter does and equally correct everywhere.
- Add swap command to batch (uses path + to fields)
- Add after/before positioning for move in batch
- Word move: resolve after/before anchors before element removal
- PPT slide move: support after/before for slide reordering
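The idea behind the raw byte I/O helpers can be sketched as a line reader that pulls one byte at a time and keeps no hidden buffer. This is an illustrative Python sketch, not the project's implementation: the function name, the byte-at-a-time strategy, UTF-8 decoding, and the `\r` stripping are all assumptions.

```python
import io
from typing import BinaryIO, Optional


def read_line_from_pipe(stream: BinaryIO) -> Optional[str]:
    """Read bytes up to the next newline; return None at end of stream.

    Hypothetical helper: reads one byte per call so that no wrapper-level
    buffer can hold back data that is already available.
    """
    buf = bytearray()
    while True:
        chunk = stream.read(1)
        if not chunk:  # end of stream
            return buf.decode("utf-8") if buf else None
        if chunk == b"\n":
            return bytes(buf).rstrip(b"\r").decode("utf-8")
        buf += chunk


# Usage with an in-memory stream standing in for the pipe.
pipe = io.BytesIO(b"ping\r\npong\n")
first = read_line_from_pipe(pipe)
second = read_line_from_pipe(pipe)
third = read_line_from_pipe(pipe)
```

Reading byte by byte costs more calls than buffered I/O, which is usually acceptable for a short line-oriented command protocol.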
When --to is omitted but --after/--before contains a full path, automatically extract parent path. Enables cross-slide shape move with just --after, no redundant --to needed.
- add --json now returns data field with full node (path, text, format)
- set --json now returns data field with updated node state
- find operations include matched count in message and JSON matched field
- Eliminates need for follow-up get calls after add/set
add/set don't need to return node data — agent already knows what it sent (add) or just did a get before (set). Only find matched count is genuinely new information the agent can't predict.
- Fix all PPT Add* methods to use InsertAtPosition instead of AppendChild
- Add --after/--before options to CLI move command with mutual exclusivity
- Add IsTruthySafe for lenient boolean parsing (regex=invalid → false)
- Validate anchor paths in PPT ResolveAnchorPosition for out-of-bounds
- Improve error message for find: with non-paragraph parent
- Excel find+replace now returns matched count in JSON output
- Reject moving placeholder shapes across slides (prevents duplicate IDs)
Add-time consumption of the sibling showdataas= / aggregate= properties mirrored the Set path so users can write values=Sales showdataas=percent_of_row and have it take effect at creation, not only on a follow-up Set. The override list is still positional and validated via ParseShowDataAs so unknown tokens fail fast (CONSISTENCY(strict-enums)).
ParseValueFields used to silently drop any numeric field index outside headers.Length, producing a confusing empty pivot. Now throws ArgumentException with the valid range so typos such as values=100 on a two-column source fail fast at Add/Set time.
string.IsNullOrEmpty let names like ' ', '\t', '\t\n' slip through straight into PivotTableDefinition.Name. Switched to IsNullOrWhiteSpace + Trim and added an explicit throw when the user supplied a whitespace-only name so the mistake surfaces at Add time instead of producing a pivot with an invisible identifier.
Names such as 'Pivot\0Table' or 'Pivot\rTable' previously made it into PivotTableDefinition.Name and produced invalid XML on save / ambiguous identifiers on re-open. Explicit check for ASCII control characters (0x00-0x1F, 0x7F) now throws ArgumentException at Add time.
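The two name checks above (whitespace-only names and ASCII control characters) can be sketched together. This is a hedged Python illustration; `validate_pivot_name` is a hypothetical stand-in for the project's validation logic, and rejecting the empty string as well is an assumption.

```python
def validate_pivot_name(name: str) -> str:
    """Return the trimmed name, or raise ValueError for unusable names.

    Hypothetical sketch: rejects empty/whitespace-only names and any
    ASCII control character (0x00-0x1F, 0x7F) left after trimming.
    """
    trimmed = name.strip()
    if not trimmed:
        raise ValueError("pivot table name must not be empty or whitespace-only")
    for ch in trimmed:
        code = ord(ch)
        if code <= 0x1F or code == 0x7F:
            raise ValueError(
                f"pivot table name contains control character 0x{code:02X}"
            )
    return trimmed
```

Failing at creation time keeps an invisible or XML-invalid identifier from ever reaching the saved file.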
Source specs such as ' Sheet1 ! A1:B4 ' used to fail sheet lookup because the raw split halves were passed through untrimmed. Now the whole spec is Trim()-ed once and each half of the '!' split gets its own Trim() so incidental paste-from-docs whitespace no longer breaks pivot creation.
Source specs with the [workbook.xlsx]Sheet form previously surfaced as 'Source sheet not found: [workbook.xlsx]Sheet1', wrongly implying the user mistyped a sheet name. The feature is simply not supported — throw ArgumentException with that explanation so the user can correct to a local sheet reference.
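The trimming and the external-workbook rejection described in the two commits above can be sketched in one parser. This is an illustrative Python sketch under assumptions: the function name and error wording are hypothetical, and quoted sheet names are not handled.

```python
from typing import Tuple


def parse_source_spec(spec: str) -> Tuple[str, str]:
    """Split 'Sheet!Range' into (sheet, range), tolerating stray whitespace.

    Hypothetical sketch: external '[workbook.xlsx]Sheet' references are
    rejected with an explicit "not supported" error.
    """
    spec = spec.strip()
    if spec.startswith("["):
        raise ValueError(
            "external workbook references are not supported; "
            "use a local sheet reference such as Sheet1!A1:B4"
        )
    sheet, sep, cell_range = spec.partition("!")
    if not sep:
        raise ValueError("source must have the form Sheet!Range")
    # Each half gets its own trim so ' Sheet1 ! A1:B4 ' still resolves.
    return sheet.strip(), cell_range.strip()
```

With this shape, `' Sheet1 ! A1:B4 '` resolves to the sheet `Sheet1` and the range `A1:B4`.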
The set --help output listed pivottable but advertised only name/style as writable properties. Expanded the writable set to match what the Set handler actually consumes (rows, cols, values, filters, aggregate, showDataAs, style, sort, grandTotals, name) and added a dedicated PivotTable prop reference block.
The Get readback emits dataField{N} as '{displayName}:{func}:{fieldIdx}'
where displayName is e.g. 'Sum of Sales' and the third slot is the
cacheField index. Feeding this string straight back into Set values=...
previously threw 'field Sum of Sales not found' because ParseValueFields
only knew the '{fieldName}:{func}[:showAs]' input shape.
ParseValueFields now strips known English aggregate display prefixes
(Sum/Count/Average/Max/Min/Product/Count Numbers/StdDev/StdDevp/Var/
Varp of) from the first slot, and when that prefix is present treats
a numeric third slot as a cacheField index instead of a showAs token.
The disambiguation is gated on the prefix so the existing
'Sales:sum:42' invalid-showDataAs throw contract is preserved.
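A rough sketch of the prefix-gated disambiguation, in Python for illustration. The function name and the returned dictionary shape are hypothetical; only the prefix list and the "numeric third slot means cacheField index when a prefix was stripped" rule come from the commit message. showAs validation is left to a later step here.

```python
from typing import Any, Dict

# Display prefixes listed in the commit message.
_AGG_DISPLAY_PREFIXES = (
    "Sum", "Count", "Average", "Max", "Min", "Product",
    "Count Numbers", "StdDev", "StdDevp", "Var", "Varp",
)


def parse_value_field(spec: str) -> Dict[str, Any]:
    """Parse '{name}:{func}[:third]' where name may carry a display prefix."""
    parts = spec.split(":")
    name = parts[0]
    func = parts[1] if len(parts) > 1 else None
    third = parts[2] if len(parts) > 2 else None

    had_prefix = False
    for prefix in _AGG_DISPLAY_PREFIXES:
        marker = prefix + " of "
        if name.startswith(marker):
            name = name[len(marker):]
            had_prefix = True
            break

    result = {"field": name, "func": func, "show_as": None, "cache_index": None}
    if third is not None:
        if had_prefix and third.isdigit():
            # Readback shape: the third slot is a cacheField index.
            result["cache_index"] = int(third)
        else:
            # Input shape: the third slot is a showAs token, validated elsewhere.
            result["show_as"] = third
    return result
```

Because the index interpretation is gated on the prefix, `Sales:sum:42` still reaches the showAs validation path and can be rejected there.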
…tinel in RowFields
…acy *Fields keys
Normalize pivot property keys (both Add and Set paths) through a single alias table so users can write row=Cat, col=Cat, filter=Cat, value=Sales or the Round 3 legacy canonical rowFields=Cat, colFields=Cat instead of having those keys silently dropped. Previously only 'rows'/'cols'/'filters'/'values' bound, with every singular or legacy spelling producing an empty pivot that looked like the source data was wrong.
Aliases covered (all case-insensitive):
- row/rowField/rowFields -> rows
- col/column/columns/colField/colFields/columnField/columnFields -> cols
- filter/filterField/filterFields -> filters
- value/valueField/valueFields -> values
- columnGrandTotals -> colGrandTotals
Unknown keys (typos, non-ASCII) pass through verbatim so the Set path's existing unsupported-list return channel keeps echoing the user's original spelling.
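The alias table can be sketched as a case-insensitive lookup that leaves unknown keys untouched. This is an illustrative Python sketch; `normalize_pivot_keys` and the table layout are hypothetical, while the alias spellings are taken from the commit message.

```python
from typing import Dict

_ALIASES = {
    "rows": ("row", "rowField", "rowFields"),
    "cols": ("col", "column", "columns", "colField", "colFields",
             "columnField", "columnFields"),
    "filters": ("filter", "filterField", "filterFields"),
    "values": ("value", "valueField", "valueFields"),
    "colGrandTotals": ("columnGrandTotals",),
}

# Lower-cased alias -> canonical key; canonical keys map to themselves.
_LOOKUP: Dict[str, str] = {}
for _canonical, _names in _ALIASES.items():
    _LOOKUP[_canonical.lower()] = _canonical
    for _alias in _names:
        _LOOKUP[_alias.lower()] = _canonical


def normalize_pivot_keys(props: Dict[str, str]) -> Dict[str, str]:
    """Map known aliases to canonical keys; pass unknown keys through verbatim."""
    return {_LOOKUP.get(key.lower(), key): value for key, value in props.items()}
```

In this sketch, if two spellings of the same key appear in one call, the later one wins; how the project resolves that case is not stated in the commit.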
Previously, unknown pivot property keys on the Add path (e.g. non-ASCII '源', '行名', or English typos like 'rowname') were silently dropped — CreatePivotTable only consumed known keys and ignored the rest, producing an empty-looking pivot with no diagnostic. Now every Add call runs CollectUnknownPivotKeys against the canonical _knownPivotKeys set and emits an 'UNSUPPORTED props:' stderr warning carrying the user's ORIGINAL spelling, matching the format already used by CommandBuilder.FormatUnsupported so OutputFormatter and ResidentServer both tag it as unsupported_property in JSON envelopes. Set path is unaffected: its default switch case already returns unknown keys through the existing unsupported list, and normalization preserves the original spelling for that channel.
…s are in range
When a Set call narrows the source range below an existing row/col/value/filter field's cacheField index, RefreshPivotCacheFromSource now throws ArgumentException with a message pointing at the axis and field that went out of range. Previously the stale index was silently carried into RebuildFieldAreas and RenderPivotIntoSheet crashed with ArgumentOutOfRangeException on columnData[idx]. Axes that the same Set call explicitly re-specifies are skipped from validation so 'set source=... values=NewCol' still works in one shot.
RebuildFieldAreas' field-area dedup block removed freshly-claimed fields from the two 'other' axes but never from valueFields. 'set filters=Sales' against a pivot with Sales as a value field left Sales in both DataFields and PageFields, producing a corrupt duplicate assignment. Mirror the same rule for rows/cols/values too, so any claim on one axis evicts the field from every other axis it currently sits on.
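The eviction rule can be sketched as follows: claiming fields on one axis removes them from every other axis. This Python sketch is illustrative only; `claim_fields` and the dictionary-of-lists layout are hypothetical, not the structure RebuildFieldAreas uses.

```python
from typing import Dict, List


def claim_fields(areas: Dict[str, List[int]], axis: str, fields: List[int]) -> None:
    """Assign `fields` to `axis` and evict them from every other axis in place."""
    claimed = set(fields)
    for other_axis, current in areas.items():
        if other_axis != axis:
            current[:] = [idx for idx in current if idx not in claimed]
    areas[axis] = list(fields)


# Usage: field 2 (Sales) is a value field, then gets claimed as a filter.
areas = {"rows": [0], "cols": [1], "values": [2], "filters": []}
claim_fields(areas, "filters", [2])
```

After the call, field 2 appears only under `filters`, so no field is assigned to two areas at once.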
ReadPivotTableProperties previously emitted 'location' (the output range) but never 'source' (the input range feeding the cache). Now round-trips the cache definition's WorksheetSource.Sheet + Reference into the canonical 'Sheet1!A1:C3' form so the output of Get can be fed straight back to Set source=... without translation.
…mmetry)
Get already emits dataField{N}.showAs as a structured round-trip key,
but Set rejected the same key as unsupported. Users copying output from
Get into a Set call had to translate the key back into the global
'showDataAs=' form or the inline 'values=Name:func:token' form. Now
Set routes dataField{N}.showAs=<token> through the same showdataas
positional override the existing sibling key uses, preserving the
RebuildFieldAreas apply path. Throws ArgumentException when N exceeds
the current data field count.
Set aggregate=count on a pivot with 'Sum of Sales' left the DataField Name unchanged, so the rendered header still read 'Sum of Sales' despite the subtotal func being Count. RebuildFieldAreas now rewrites the display name to '<AggDisplay> of <sourceHeader>' whenever the aggregate override actually changes func AND the current name still matches the canonical auto-generated shape. User-provided names (any name that does not end in ' of <sourceHeader>' with a known display prefix) are left untouched so future explicit-name features don't get clobbered.
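The rename rule can be sketched as a guarded rewrite: the name changes only when the function actually changes and the current name still has the auto-generated shape. This Python sketch is illustrative; the function name and the small func-to-display map are assumptions covering only a subset of aggregates.

```python
# Hypothetical func token -> display prefix map (subset, for illustration).
_AGG_DISPLAY = {
    "sum": "Sum", "count": "Count", "average": "Average",
    "max": "Max", "min": "Min", "product": "Product",
}


def rewrite_display_name(current: str, source_header: str,
                         old_func: str, new_func: str) -> str:
    """Return the data field name to use after an aggregate override."""
    if old_func == new_func:
        return current
    suffix = " of " + source_header
    if current.endswith(suffix):
        prefix = current[: -len(suffix)]
        if prefix in _AGG_DISPLAY.values():
            # Still the canonical auto-generated shape: safe to regenerate.
            return _AGG_DISPLAY[new_func] + suffix
    # Anything else is treated as a user-provided name and left alone.
    return current
```

A name such as `Revenue` fails the shape check and is preserved, which matches the intent of not clobbering explicit names.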
Sheet-level Get (/SheetN) was listing rows and charts but omitting pivot tables. GetSheetChildNodes now appends a pivottable[N] child node for each PivotTablePart on the WorksheetPart, consistent with how chart children are enumerated.
…ke Add
Extracted a shared ValidatePivotName helper from CreatePivotTable and wired it into SetPivotTableProperties. Previously, Set accepted empty strings and whitespace-only names without any error, bypassing the R8-4/R8-5 guards that existed only in the Add path.
BuildTreeAxisItems (N>=3) and BuildMultiRowItems (N=2) emitted fewer <x> children than required for subtotal <i> elements. ECMA-376 §18.10.1.44 requires exactly (fieldCount - r) children per entry. Pad subtotal entries with the "default" item index for each deeper field so Excel can correctly rebuild the row hierarchy when the user manually refreshes the pivot table.
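The padding rule can be sketched as a function that builds the list of `<x>` index values for one subtotal `<i>` entry. This is a hedged Python illustration, not the project's code: the function name and the `items_per_field` parameter (the number of entries in each row field's items list, with the "default" item assumed to be last) are hypothetical.

```python
from typing import List


def build_subtotal_x_values(r: int, current_index: int,
                            items_per_field: List[int]) -> List[int]:
    """Return the <x> values for a subtotal entry at depth r.

    The result always has (field_count - r) values: the current level's
    item index, followed by the "default" item index (assumed to be the
    last entry) of every deeper field.
    """
    field_count = len(items_per_field)
    values = [current_index]
    for depth in range(r + 1, field_count):
        values.append(items_per_field[depth] - 1)
    return values
```

For two row fields with 3 and 4 items, a subtotal at depth 0 yields two values instead of the single value emitted before the fix.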
Summary
- Fix BuildMultiRowItems (N=2 row fields) and BuildTreeAxisItems (N≥3 row fields) to emit the correct number of <x> children per subtotal <i> element
- ECMA-376 §18.10.1.44 requires fieldCount - r children, but subtotal entries only emitted 1 regardless of depth
Root Cause
Subtotal rows represent an aggregate across all children. The code emitted only the current level's value as a single <x> child, but the spec requires values for all deeper fields too (using the "default" item index, which is the last entry in each pivotField's items list). Without padding, Excel's pivot refresh engine cannot reconstruct the row hierarchy and shows an empty data area.
Test plan
- Create a pivot table with 2 row fields (rows=Region,Country) and verify subtotal rows render correctly in Excel after manual refresh
- Create a pivot table with 4 row fields (rows=Region,Country,Category,Product) and verify the same