fix(xlsx/pivot): pad rowItems subtotal entries per ECMA-376 spec #47

Open
BubbleWolf wants to merge 186 commits into iOfficeAI:main from BubbleWolf:fix/pivot-rowItems-spec-padding

Conversation

@BubbleWolf

Summary

  • Fix BuildMultiRowItems (N=2 row fields) and BuildTreeAxisItems (N≥3 row fields) to emit the correct number of <x> children per subtotal <i> element
  • ECMA-376 §18.10.1.44 requires exactly fieldCount - r <x> children per <i> entry, but subtotal entries emitted only one regardless of depth
  • Pad with the "default" item index for each deeper field so Excel can correctly rebuild the row hierarchy on manual refresh

Root Cause

Subtotal rows represent an aggregate across all children. The code emitted only the current level's value as a single <x> child, but the spec requires values for all deeper fields too (using the "default" item index — the last entry in each pivotField's items list). Without padding, Excel's pivot refresh engine cannot reconstruct the row hierarchy and shows an empty data area.
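
The padding rule can be illustrated with a minimal Python sketch. This is not the PR's C# code: `pad_subtotal_entry` and `default_index_per_field` are hypothetical names, and the sketch assumes `r` is the number of leading fields the entry repeats from the previous one, so the first emitted child belongs to field position `r`.

```python
def pad_subtotal_entry(item_indices, field_count, r, default_index_per_field):
    """Pad a subtotal entry's <x> values to exactly field_count - r children.

    item_indices: values already emitted, starting at field position r.
    default_index_per_field: hypothetical lookup giving, for each row field,
    the index of its "default" item (the last entry in its items list).
    """
    required = field_count - r
    padded = list(item_indices)
    pos = r + len(padded)  # field position of the next missing child
    while len(padded) < required:
        padded.append(default_index_per_field[pos])
        pos += 1
    return padded
```

With three row fields and `r = 0`, an entry that carried only `[1]` would become `[1, d1, d2]`, where `d1` and `d2` are the default item indices of the second and third fields.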

Test plan

  • Generate a pivot with 2 row fields (e.g. rows=Region,Country) and verify subtotal rows render correctly in Excel after manual refresh
  • Generate a pivot with ≥3 row fields (e.g. rows=Region,Country,Category,Product) and verify the same
  • Verify column items are unaffected (col subtotals use a separate code path)

shuff57 and others added 30 commits April 4, 2026 10:18
* feat: extend LaTeX→OMML converter for math/stats education use cases

Add support for LaTeX constructs commonly used in math and statistics
education that were previously falling through to literal text output:

**New commands:**
- \boxed{} — bordered equation box (m:borderBox)
- \underbrace{}_{} — underbrace with label (m:groupChr + m:limLow)
- \overbrace{}^{} — overbrace with label (m:groupChr + m:limUp)
- \color{name}{} / \textcolor{} — colored equation runs (w:color rPr)
- \pmod{} — parenthesized modular arithmetic
- \bmod — binary mod operator
- \arcsin, \arccos, \arctan, \arccot, \arcsec, \arccsc — arc-trig functions
- \operatorname{} — custom upright operator names with limit support

**Improved \cancel:**
- Changed from Unicode combining overlay hack to proper m:borderBox
  with m:strikeH/m:strikeBLTR for \cancel, \bcancel, \xcancel

**New environments:**
- \begin{align}, \begin{aligned}, \begin{gathered}, \begin{split},
  \begin{eqnarray} — multi-line aligned equations via m:matrix
- \begin{array} — array environment with column spec skipping

**New delimiter support:**
- \langle / \rangle — angle brackets (⟨⟩) in symbol map
- \left\langle ... \right\rangle — proper OMML delimiters
- \lceil/\rceil, \lfloor/\rfloor — ceiling/floor brackets
- \lvert/\rvert, \lVert/\rVert — vertical bars

**New symbols:**
- \emptyset, \setminus, \complement, \cap, \cup — set notation
- \, \; \! — math spacing commands

**Color support:**
- NamedColorToHex helper mapping 20+ named colors (red, blue, etc.)

Tested against 25 LaTeX constructs used in AP Statistics, Calculus,
and Algebra courses. All previously broken constructs now generate
valid OMML that renders natively in Word and OnlyOffice.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: correct StrikeBLTR → StrikeBottomLeftToTopRight for OpenXML SDK

The OpenXML SDK uses full names (StrikeBottomLeftToTopRight) not
abbreviations (StrikeBLTR). Fixed the \cancel/\bcancel/\xcancel
implementation to use the correct class name.

Verified: builds clean with dotnet 11 preview, all 10 previously
broken LaTeX constructs now generate valid OMML, officecli validate
passes on the output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add math extensions demo PPTX

11-slide presentation built with the patched officecli binary,
demonstrating before/after for all 10 fixed LaTeX→OMML constructs:

- \boxed, \underbrace, \color, \cancel, \pmod
- \arctan, \left\langle...\right\rangle
- \begin{align}, \begin{gathered}, \operatorname

Final slide shows real AP Statistics formulas (confidence interval,
chi-squared, normal PDF, Pearson r) rendered natively as OMML.

Validates clean: officecli validate → 0 errors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GenerateParaId() was using Guid which could produce values >= 0x80000000,
causing schema validation failures. Changed to Random.Shared.Next(0, int.MaxValue)
which guarantees values in the valid range [0, 0x7FFFFFFE].
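
As a rough Python illustration of the same range constraint (the project code is C#; `generate_para_id` here is a hypothetical stand-in, and the 8-digit uppercase hex formatting is an assumption):

```python
import random

def generate_para_id(rng=random):
    # randrange excludes the upper bound, so the value lies in [0, 0x7FFFFFFE],
    # mirroring Random.Shared.Next(0, int.MaxValue).
    value = rng.randrange(0, 0x7FFFFFFF)
    return f"{value:08X}"
```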
…constructs

Fix 8 issues in PR iOfficeAI#37's LaTeX extensions:
- \cancelto: consume both arguments instead of leaving second in token stream
- \cancel/bcancel/xcancel: hide border box borders, fix diagonal directions
- \color/\textcolor: preserve math structure instead of flattening to text
- \pmod: include all child nodes instead of only first
- \operatorname: support both subscript and superscript limits
- \begin{array}: strip unwanted delimiter wrapper
- Fix align/gathered comment to match implementation

Add OMML→LaTeX conversion for HTML preview:
- borderBox → \boxed{} or \cancel{}
- groupChr → \underbrace{} or \overbrace{}
- w:color in math runs → \textcolor{#hex}{}
- standalone m:m → \begin{matrix}...\end{matrix}
…L preview

- Add AppendW14CssEffects() to convert w14 namespace effects to CSS:
  textFill gradient → background linear-gradient + background-clip:text,
  glow → multi-layer text-shadow, shadow → text-shadow with offset,
  textOutline → -webkit-text-stroke, solidFill → color override
- Add reflection rendering via flipped duplicate paragraph block with
  CSS mask-image gradient for fade-out effect
- Fix MergeRunProperties() to carry over w14 namespace children during
  style chain resolution (textFill/glow/reflection were silently dropped)
- Fix OOXML→CSS gradient angle conversion: cssAngle = ooxmlAngle + 90
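
The `+ 90` angle conversion in the last bullet might look like the following Python sketch. It assumes the OOXML angle has already been converted to degrees, and the wrap into [0, 360) is an added assumption rather than something the commit states.

```python
def ooxml_to_css_angle(ooxml_angle_deg):
    # Shift by 90 degrees and wrap into [0, 360).
    return (ooxml_angle_deg + 90) % 360
```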
GetParagraphText() only collected text from direct Run children,
missing text inside Hyperlink elements.
…ation

- Add @id=/@paraid=/@name= stable path addressing for PPT shapes, Word paragraphs, comments, footnotes, endnotes, and content controls
- Refactor Add method signature across all handlers for consistency
- Enhance Word/PPT navigation with improved node building and query capabilities
- Update SKILL.md documentation with stable ID examples and new add syntax
…d and PowerPoint

- Add `set find=` to format or replace matched text with auto run splitting
- Support regex via r"..." prefix (e.g. find=r"\d+%")
- Unify find+replace (replaces old scope-based FindAndReplace)
- Add `--after find:X` / `--before find:X` for positional element insertion
- Word: inline (run) and block (table/paragraph) insertion with auto paragraph splitting
- PowerPoint: inline run insertion at text positions
- Support all run-level format properties through find pathway
- Update SKILL.md, wiki, and skill docs with new syntax
w14:paraId/textId attributes require mc:Ignorable declaration
to prevent Word 2007 from rejecting the document.
- Quote paths in Quick Start examples to prevent zsh glob expansion
- Clarify find= vs plain set semantic difference
- Document find= edge cases (no match, cross-run matching)
- Add stable ID usage guidance for multi-step workflows
- Warn about shape[1] being title placeholder in Common Pitfalls
- Excel: reject find without replace early with clear error message
- Word/PPT: suggest correct anchor path format for bare @paraid=/@id= usage
- SKILL.md: add case-sensitive note, Excel find limitation, notes get
  limitation, shell bracket quoting pitfall, scope clarification,
  find= prop format warning
- Add after/before fields to BatchItem for anchor-based insertion
- Support find: text anchors, @paraid=, @id= paths in batch
- Forward after/before to resident server requests
- Word/PPT: accept regex=true in props to enable regex mode for find
- Avoids JSON double-quote escaping hell with r"..." in batch/MCP
- Excel: reject regex prop with clear error (not supported)
- Update SKILL.md with regex=true examples for CLI and batch
Remove confusing r"..." prefix syntax from documentation, use
regex=true prop consistently across CLI and batch examples
Word and PPT add commands now accept regex=true in props to enable
regex mode for find: text anchors, avoiding r"..." syntax in JSON
StreamReader/StreamWriter deadlock on Windows named pipes under .NET 11
preview — the managed stream wrapper's internal buffering stalls reads
even when bytes are available on the wire.

Changes:
- ResidentServer: replace StreamReader.ReadLineAsync/StreamWriter with
  raw byte I/O helpers (ReadLineFromPipeAsync/WriteLineToPipeAsync)
- ResidentClient: replace StreamReader/StreamWriter with raw byte I/O
  helpers (PipeReadLine/PipeWriteLine)
- CommandBuilder (open): on Windows, run resident server in-process via
  Task.Run with ManualResetEventSlim readiness signal instead of forking
  a child process (which also deadlocked on single-file host). Linux/macOS
  keeps the original Process.Start fork behavior.

Raw byte I/O is used on all platforms for the pipe protocol to avoid
divergent code paths — it is a strict subset of what StreamReader/
StreamWriter does and equally correct everywhere.
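
A minimal Python sketch of a line protocol built on raw byte reads, in the spirit of the helpers named above. The function names, UTF-8 encoding, and `\n` terminator are assumptions; the actual C# helpers may differ.

```python
def read_line_from_pipe(stream):
    # Read one byte at a time until a newline; return None at EOF with no data.
    buf = bytearray()
    while True:
        chunk = stream.read(1)
        if not chunk:  # EOF
            return buf.decode("utf-8") if buf else None
        if chunk == b"\n":
            return buf.decode("utf-8").rstrip("\r")
        buf += chunk

def write_line_to_pipe(stream, line):
    # Write the line plus terminator and flush immediately.
    stream.write(line.encode("utf-8") + b"\n")
    stream.flush()
```

Reading byte by byte avoids holding unread data in a wrapper's buffer, at the cost of more read calls per line.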
- Add swap command to batch (uses path + to fields)
- Add after/before positioning for move in batch
- Word move: resolve after/before anchors before element removal
- PPT slide move: support after/before for slide reordering
When --to is omitted but --after/--before contains a full path,
automatically extract parent path. Enables cross-slide shape move
with just --after, no redundant --to needed.
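
The parent-path extraction could be sketched in Python as follows. The `/slide[2]/shape[3]` path shape and the function name are assumptions for illustration only.

```python
def parent_path(anchor_path):
    # Split off the last path segment; the remainder is the parent container.
    head, sep, _ = anchor_path.rstrip("/").rpartition("/")
    if not sep or not head:
        raise ValueError(f"anchor is not a full path: {anchor_path!r}")
    return head
```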
- add --json now returns data field with full node (path, text, format)
- set --json now returns data field with updated node state
- find operations include matched count in message and JSON matched field
- Eliminates need for follow-up get calls after add/set
add/set don't need to return node data — agent already knows what it
sent (add) or just did a get before (set). Only find matched count is
genuinely new information the agent can't predict.
- Fix all PPT Add* methods to use InsertAtPosition instead of AppendChild
- Add --after/--before options to CLI move command with mutual exclusivity
- Add IsTruthySafe for lenient boolean parsing (regex=invalid → false)
- Validate anchor paths in PPT ResolveAnchorPosition for out-of-bounds
- Improve error message for find: with non-paragraph parent
- Excel find+replace now returns matched count in JSON output
- Reject moving placeholder shapes across slides (prevents duplicate IDs)
goworm and others added 29 commits April 9, 2026 05:52
Add-time consumption of the sibling showdataas= / aggregate= properties
mirrored the Set path so users can write values=Sales showdataas=percent_of_row
and have it take effect at creation, not only on a follow-up Set. The
override list is still positional and validated via ParseShowDataAs so
unknown tokens fail fast (CONSISTENCY(strict-enums)).
ParseValueFields used to silently drop any numeric field index outside
headers.Length, producing a confusing empty pivot. Now throws
ArgumentException with the valid range so typos such as values=100 on
a two-column source fail fast at Add/Set time.
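
A small Python sketch of the fail-fast check, assuming zero-based field indices (the commit does not say which base is used) and a hypothetical `resolve_value_field` helper:

```python
def resolve_value_field(token, headers):
    # Numeric tokens are treated as zero-based column indices (assumption).
    if token.isdigit():
        idx = int(token)
        if idx >= len(headers):
            raise ValueError(
                f"field index {idx} out of range; valid range is 0..{len(headers) - 1}"
            )
        return idx
    if token not in headers:
        raise ValueError(f"field {token!r} not found")
    return headers.index(token)
```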
string.IsNullOrEmpty let names like ' ', '\t', '\t\n' slip through
straight into PivotTableDefinition.Name. Switched to
IsNullOrWhiteSpace + Trim and added an explicit throw when the user
supplied a whitespace-only name so the mistake surfaces at Add time
instead of producing a pivot with an invisible identifier.
Names such as 'Pivot\0Table' or 'Pivot\rTable' previously made it
into PivotTableDefinition.Name and produced invalid XML on save /
ambiguous identifiers on re-open. Explicit check for ASCII control
characters (0x00-0x1F, 0x7F) now throws ArgumentException at Add time.
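
The two name guards (whitespace-only and control characters) can be sketched together in Python. `validate_pivot_name` is a hypothetical stand-in for the C# logic, and passing an empty name through for the caller to default is an assumption.

```python
def validate_pivot_name(name):
    trimmed = name.strip()
    # A supplied name that trims to nothing is a user mistake, not a default.
    if name and not trimmed:
        raise ValueError("pivot name must not be whitespace-only")
    # Reject ASCII control characters (0x00-0x1F, 0x7F) anywhere in the name.
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in trimmed):
        raise ValueError("pivot name must not contain control characters")
    return trimmed
```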
Source specs such as ' Sheet1 ! A1:B4 ' used to fail sheet lookup
because the raw split halves were passed through untrimmed. Now the
whole spec is Trim()-ed once and each half of the '!' split gets its
own Trim() so incidental paste-from-docs whitespace no longer breaks
pivot creation.
Source specs with the [workbook.xlsx]Sheet form previously surfaced as
'Source sheet not found: [workbook.xlsx]Sheet1', wrongly implying the
user mistyped a sheet name. The feature is simply not supported —
throw ArgumentException with that explanation so the user can correct
to a local sheet reference.
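
Both source-spec behaviours (trimming and rejecting the `[workbook.xlsx]Sheet` form) might be sketched in Python like this. `parse_source_spec` and its error wording are hypothetical, and quoted sheet names are not handled.

```python
def parse_source_spec(spec):
    # Trim the whole spec once, then each half of the '!' split.
    sheet, sep, ref = spec.strip().partition("!")
    if not sep:
        raise ValueError(f"source must look like Sheet!Range: {spec!r}")
    sheet, ref = sheet.strip(), ref.strip()
    # External workbook references are explicitly unsupported.
    if sheet.startswith("["):
        raise ValueError(
            "external workbook references are not supported; use a local sheet"
        )
    return sheet, ref
```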
The set --help output listed pivottable but advertised only name/style
as writable properties. Expanded the writable set to match what the
Set handler actually consumes (rows, cols, values, filters, aggregate,
showDataAs, style, sort, grandTotals, name) and added a dedicated
PivotTable prop reference block.
The Get readback emits dataField{N} as '{displayName}:{func}:{fieldIdx}'
where displayName is e.g. 'Sum of Sales' and the third slot is the
cacheField index. Feeding this string straight back into Set values=...
previously threw 'field Sum of Sales not found' because ParseValueFields
only knew the '{fieldName}:{func}[:showAs]' input shape.

ParseValueFields now strips known English aggregate display prefixes
(Sum/Count/Average/Max/Min/Product/Count Numbers/StdDev/StdDevp/Var/
Varp of) from the first slot, and when that prefix is present treats
a numeric third slot as a cacheField index instead of a showAs token.
The disambiguation is gated on the prefix so the existing
'Sales:sum:42' invalid-showDataAs throw contract is preserved.
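
A Python sketch of the prefix-gated disambiguation described above. The function name and result shape are hypothetical, and showAs validation is assumed to happen elsewhere.

```python
_DISPLAY_PREFIXES = (
    "Sum", "Count", "Average", "Max", "Min", "Product",
    "Count Numbers", "StdDev", "StdDevp", "Var", "Varp",
)

def parse_value_spec(spec):
    parts = spec.split(":")
    name = parts[0]
    func = parts[1] if len(parts) > 1 else None
    third = parts[2] if len(parts) > 2 else None
    stripped = False
    for prefix in _DISPLAY_PREFIXES:
        marker = prefix + " of "
        if name.startswith(marker):
            name = name[len(marker):]
            stripped = True
            break
    result = {"field": name, "func": func, "show_as": None, "cache_index": None}
    if third is not None:
        # Only a readback-style name makes a numeric third slot an index.
        if stripped and third.isdigit():
            result["cache_index"] = int(third)
        else:
            result["show_as"] = third  # left for the showAs validator
    return result
```

Here `'Sum of Sales:sum:2'` yields field `Sales` with cache index 2, while `'Sales:sum:42'` still carries `42` as a showAs token for the validator to reject.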
…acy *Fields keys

Normalize pivot property keys (both Add and Set paths) through a single
alias table so users can write row=Cat, col=Cat, filter=Cat, value=Sales
or the Round 3 legacy canonical rowFields=Cat, colFields=Cat instead of
having those keys silently dropped. Previously only 'rows'/'cols'/'filters'
/'values' bound, with every singular or legacy spelling producing an
empty pivot that looked like the source data was wrong.

Aliases covered (all case-insensitive):
  row/rowField/rowFields              -> rows
  col/column/columns/colField/
  colFields/columnField/columnFields  -> cols
  filter/filterField/filterFields     -> filters
  value/valueField/valueFields        -> values
  columnGrandTotals                   -> colGrandTotals

Unknown keys (typos, non-ASCII) pass through verbatim so the Set path's
existing unsupported-list return channel keeps echoing the user's original
spelling.
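
The alias table lends itself to a simple case-insensitive lookup. In this Python sketch the table contents come from the list above, while the function name and dictionary layout are illustrative.

```python
_PIVOT_KEY_ALIASES = {
    "row": "rows", "rowfield": "rows", "rowfields": "rows",
    "col": "cols", "column": "cols", "columns": "cols",
    "colfield": "cols", "colfields": "cols",
    "columnfield": "cols", "columnfields": "cols",
    "filter": "filters", "filterfield": "filters", "filterfields": "filters",
    "value": "values", "valuefield": "values", "valuefields": "values",
    "columngrandtotals": "colGrandTotals",
}

def normalize_pivot_key(key):
    # Unknown keys pass through verbatim so the original spelling is preserved.
    return _PIVOT_KEY_ALIASES.get(key.lower(), key)
```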
Previously, unknown pivot property keys on the Add path (e.g. non-ASCII
'源', '行名', or English typos like 'rowname') were silently dropped —
CreatePivotTable only consumed known keys and ignored the rest,
producing an empty-looking pivot with no diagnostic.

Now every Add call runs CollectUnknownPivotKeys against the canonical
_knownPivotKeys set and emits an 'UNSUPPORTED props:' stderr warning
carrying the user's ORIGINAL spelling, matching the format already
used by CommandBuilder.FormatUnsupported so OutputFormatter and
ResidentServer both tag it as unsupported_property in JSON envelopes.

Set path is unaffected: its default switch case already returns
unknown keys through the existing unsupported list, and normalization
preserves the original spelling for that channel.
…s are in range

When a Set call narrows the source range below an existing row/col/value/filter
field's cacheField index, RefreshPivotCacheFromSource now throws ArgumentException
with a message pointing at the axis and field that went out of range. Previously
the stale index was silently carried into RebuildFieldAreas and
RenderPivotIntoSheet crashed with ArgumentOutOfRangeException on columnData[idx].

Axes that the same Set call explicitly re-specifies are skipped from validation
so 'set source=... values=NewCol' still works in one shot.
RebuildFieldAreas' field-area dedup block removed freshly-claimed fields
from the two 'other' axes but never from valueFields. 'set filters=Sales'
against a pivot with Sales as a value field left Sales in both DataFields
and PageFields, producing a corrupt duplicate assignment. Mirror the same
rule for rows/cols/values too, so any claim on one axis evicts the field
from every other axis it currently sits on.
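
The eviction rule can be sketched in Python as below. The axis names and the `claim_fields` helper are hypothetical; the real logic lives in RebuildFieldAreas.

```python
def claim_fields(axes, target, fields):
    """Assign fields to the target axis and evict them from every other axis."""
    for name, current in axes.items():
        if name != target:
            axes[name] = [f for f in current if f not in fields]
    axes[target] = list(fields)
    return axes
```

For example, claiming `Sales` for `filters` when it currently sits in `values` leaves it only in `filters`.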
ReadPivotTableProperties previously emitted 'location' (the output
range) but never 'source' (the input range feeding the cache). Now
round-trips the cache definition's WorksheetSource.Sheet + Reference
into the canonical 'Sheet1!A1:C3' form so the output of Get can be
fed straight back to Set source=... without translation.
…mmetry)

Get already emits dataField{N}.showAs as a structured round-trip key,
but Set rejected the same key as unsupported. Users copying output from
Get into a Set call had to translate the key back into the global
'showDataAs=' form or the inline 'values=Name:func:token' form. Now
Set routes dataField{N}.showAs=<token> through the same showdataas
positional override the existing sibling key uses, preserving the
RebuildFieldAreas apply path. Throws ArgumentException when N exceeds
the current data field count.
Set aggregate=count on a pivot with 'Sum of Sales' left the DataField
Name unchanged, so the rendered header still read 'Sum of Sales'
despite the subtotal func being Count. RebuildFieldAreas now rewrites
the display name to '<AggDisplay> of <sourceHeader>' whenever the
aggregate override actually changes func AND the current name still
matches the canonical auto-generated shape. User-provided names (any
name that does not end in ' of <sourceHeader>' with a known display
prefix) are left untouched so future explicit-name features don't get
clobbered.
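
A Python sketch of the rename rule, assuming lowercase func tokens and a small illustrative display map (both hypothetical; the real table covers more aggregates):

```python
_AGG_DISPLAY = {
    "sum": "Sum", "count": "Count", "average": "Average",
    "max": "Max", "min": "Min", "product": "Product",
}

def rewrite_display_name(current_name, source_header, old_func, new_func):
    # Only rename when the aggregate actually changes.
    if old_func == new_func:
        return current_name
    suffix = " of " + source_header
    if not current_name.endswith(suffix):
        return current_name  # user-provided name, leave untouched
    prefix = current_name[: -len(suffix)]
    if prefix not in _AGG_DISPLAY.values():
        return current_name  # unknown prefix, treat as user-provided
    return _AGG_DISPLAY[new_func] + suffix
```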
Sheet-level Get (/SheetN) was listing rows and charts but omitting
pivot tables. GetSheetChildNodes now appends a pivottable[N] child node
for each PivotTablePart on the WorksheetPart, consistent with how chart
children are enumerated.
…ke Add

Extracted a shared ValidatePivotName helper from CreatePivotTable and
wired it into SetPivotTableProperties. Previously, Set accepted empty
strings and whitespace-only names without any error, bypassing the
R8-4/R8-5 guards that existed only in the Add path.
BuildTreeAxisItems (N>=3) and BuildMultiRowItems (N=2) emitted fewer
<x> children than required for subtotal <i> elements. ECMA-376
§18.10.1.44 requires exactly (fieldCount - r) children per entry.

Pad subtotal entries with the "default" item index for each deeper
field so Excel can correctly rebuild the row hierarchy when the user
manually refreshes the pivot table.

5 participants