Fix HTML diff for Sphinx/PyData theme docs #56

Merged
qwc merged 1 commit from fix/htmldiff-sphinx-pydata into main 2026-02-10 14:01:41 +01:00

Summary

  • Fix tag comparison: Compare tags by name + open/close type only, ignoring attributes — Sphinx generates different id/class between versions, causing the LCS to find almost no common subsequence
  • Expand block tag list: Add 19 missing HTML5 block elements (section, article, nav, dl, dt, dd, etc.) and use O(1) object lookup instead of array scan
  • SVG awareness: Track SVG nesting depth in wrapDiff(), emit SVG elements atomically without <ins>/<del> wrapping (fixes scattered SVG icons)
  • Performance guard: Skip O(n^2) diff for documents exceeding 15k tokens, show warning banner instead
  • Pre-diff sanitization: Strip nav, .breadcrumb, .headerlink, .anchor-link, script, style before diffing
  • Error handling: Try-catch around diff with fallback if output is suspiciously short
  • Sidecar: Replace `range 3` with `for i := 0; i < 3; i++` in handler_test.go (range over an integer requires Go 1.22; this restores Go 1.21 compatibility)
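
The tag-comparison and block-tag bullets can be illustrated with a minimal sketch. It assumes the diff runs in JavaScript over an array of string tokens; `tokenKey`, `lcsLength`, and `isBlockTag` are hypothetical names, not the functions changed in this PR, and the block-tag table holds only the six elements named above.

```javascript
// Hypothetical helpers; the PR does not show the real implementation.

// Reduce a tag token to "name + open/close type" and drop its attributes.
// Text tokens are returned unchanged, so they still compare by content.
function tokenKey(token) {
  const m = /^<\s*(\/?)\s*([a-zA-Z][a-zA-Z0-9-]*)/.exec(token);
  if (!m) return token; // plain text (or a comment): compare as-is
  return "<" + m[1] + m[2].toLowerCase() + ">";
}

// Length of the longest common subsequence, comparing tokens through `key`.
// Uses a single DP row; `diag` carries the value from the previous row.
function lcsLength(a, b, key = (t) => t) {
  const row = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    let diag = 0;
    for (let j = 1; j <= b.length; j++) {
      const up = row[j];
      row[j] = key(a[i - 1]) === key(b[j - 1])
        ? diag + 1
        : Math.max(row[j], row[j - 1]);
      diag = up;
    }
  }
  return row[b.length];
}

// Object used as a lookup table instead of scanning an array.
// Only the element names mentioned in the PR are listed here.
const BLOCK_TAGS = { section: true, article: true, nav: true, dl: true, dt: true, dd: true };

function isBlockTag(name) {
  return Object.prototype.hasOwnProperty.call(BLOCK_TAGS, name.toLowerCase());
}

// Two versions of the same markup where only id/class differ.
const oldTokens = ['<section id="intro-v1">', '<h1 class="a">', "Title", "</h1>", "</section>"];
const newTokens = ['<section id="intro-v2">', '<h1 class="b">', "Title", "</h1>", "</section>"];

console.log(lcsLength(oldTokens, newTokens));           // 3: opening tags never match
console.log(lcsLength(oldTokens, newTokens, tokenKey)); // 5: all tokens align
```

Comparing by key lets the opening tags anchor the alignment even when Sphinx regenerates their attributes, which is the effect the first bullet describes.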

Test plan

  • [x] go build succeeds
  • [x] go test ./... all pass
  • [ ] Upload two versions of PyData Sphinx docs and trigger a diff; it should show text changes highlighted, not a blank screen
  • [ ] Verify MkDocs/RTD docs still diff correctly (no regression)
  • [ ] Verify large docs show a performance warning instead of hanging
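
The large-document check exercises the performance guard and the try-catch fallback from the summary. A minimal sketch of how the two could fit together follows, assuming JavaScript. `guardedDiff`, `diffFn`, and `fallbackHtml` are hypothetical names, the guard is assumed to trigger when either side exceeds the limit, and the "suspiciously short" test (under 10% of the fallback length) is an invented heuristic, since the PR does not state the actual rule.

```javascript
const MAX_TOKENS = 15000; // threshold taken from the PR summary

// diffFn stands in for the real O(n^2) diff routine, which is not shown in the PR.
// fallbackHtml is what the viewer displays when no diff can be produced.
function guardedDiff(oldTokens, newTokens, diffFn, fallbackHtml) {
  // Performance guard: skip the quadratic diff for very large inputs.
  if (oldTokens.length > MAX_TOKENS || newTokens.length > MAX_TOKENS) {
    return { html: fallbackHtml, warning: "Document too large to diff; showing it without highlights." };
  }
  try {
    const html = diffFn(oldTokens, newTokens);
    // Assumed heuristic: output far shorter than the document suggests a broken diff.
    if (html.length < fallbackHtml.length * 0.1) {
      return { html: fallbackHtml, warning: "Diff output looked truncated; showing the document without highlights." };
    }
    return { html, warning: null };
  } catch (err) {
    // Error handling: never leave the user with a blank screen.
    return { html: fallbackHtml, warning: "Diff failed: " + err.message };
  }
}
```

Returning the warning alongside the HTML lets the caller render a banner without needing to know which of the three fallback paths was taken.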

🤖 Generated with Claude Code (https://claude.com/claude-code)

Fix HTML diff for Sphinx/PyData theme docs
All checks were successful
CI / test (pull_request) Successful in 1m11s
CI / build (pull_request) Successful in 56s
CI / docker (pull_request) Has been skipped
6a89eee38b
The diff algorithm produced broken/blank output for Sphinx-generated
docs because tag comparison included attributes (which differ between
versions), the block tag list was incomplete, and SVG elements got
wrapped in ins/del producing invalid HTML.

- Compare tags by name and open/close type only, ignoring attributes
- Expand block tag list with section, article, nav, dl, dt, dd, etc.
- Add SVG awareness: emit SVG elements atomically without ins/del
- Add performance guard for documents exceeding 15k tokens
- Add pre-diff sanitization to strip nav, breadcrumbs, headerlinks
- Add try-catch with fallback for diff failures
- Replace range-over-integer loop (Go 1.22+) in handler_test.go to
  restore Go 1.21 compatibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
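
The SVG handling described in the commit message can be sketched as follows, assuming JavaScript and an array of string tokens. `wrapChangedTokens` is a hypothetical, simplified stand-in for `wrapDiff()`, whose real signature is not shown in this PR. It tracks SVG nesting depth and passes everything inside `<svg>…</svg>` through without `<ins>`/`<del>` wrapping.

```javascript
// Hypothetical simplification of wrapDiff(): wraps the text tokens of a changed
// run in <ins> or <del>, but emits anything inside an SVG subtree untouched.
function wrapChangedTokens(tokens, wrapper) {
  let out = "";
  let svgDepth = 0; // > 0 while inside one or more <svg> elements
  for (const token of tokens) {
    // A self-closing <svg/> opens no subtree, so it does not change the depth.
    const isSvgOpen = /^<svg[\s>\/]/i.test(token) && !/\/>$/.test(token);
    const isSvgClose = /^<\/svg\s*>/i.test(token);
    if (isSvgOpen) svgDepth++;
    if (svgDepth > 0) {
      out += token; // inside SVG: no wrapping, keeps the markup valid
      if (isSvgClose) svgDepth--;
      continue;
    }
    if (token.startsWith("<")) {
      out += token; // other tags pass through; only text gets wrapped
    } else {
      out += `<${wrapper}>${token}</${wrapper}>`;
    }
  }
  return out;
}

const inserted = ["New", '<svg viewBox="0 0 8 8">', '<path d="M0 0h8v8z">', "</path>", "</svg>", "text"];
console.log(wrapChangedTokens(inserted, "ins"));
// <ins>New</ins><svg viewBox="0 0 8 8"><path d="M0 0h8v8z"></path></svg><ins>text</ins>
```

A depth counter rather than a boolean flag keeps the behavior correct if an SVG contains a nested `<svg>` element.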
qwc merged commit d747f972a3 into main 2026-02-10 14:01:41 +01:00
qwc deleted branch fix/htmldiff-sphinx-pydata 2026-02-10 14:01:41 +01:00
Reference
qwc-open/asiakirjat!56