Index PDF pages individually for page-level search results #86

Merged

qwc merged 1 commit from feature/pdf-search-page-jump into main

2026-02-20 08:31:20 +01:00

qwc commented

2026-02-20 08:23:10 +01:00

Owner

Summary

- PDF text is now extracted per page (splitting on `\f` for pdftotext, iterating pages for Go fallback) instead of as a single concatenated blob
- Each PDF page is indexed as a separate search document with a `page_number` field
- Search results for PDFs include the page number and link to the PDF viewer with `#page=N` fragment
- The PDF viewer reads the hash fragment and passes it to the embedded PDF for direct page jump
- A search hint banner appears when arriving from search, prompting Ctrl+F to find the exact term
- All three search UIs updated: full search page, overlay dropdown, navbar dropdown
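
The per-page extraction described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `PageDoc`, `splitPages`, and the field names are hypothetical, and it assumes pdftotext's default behaviour of ending each page with a form feed (`\f`).

```go
package main

import (
	"fmt"
	"strings"
)

// PageDoc is a hypothetical per-page search document. The field names are
// illustrative and are not taken from the project's actual index schema.
type PageDoc struct {
	Path       string
	PageNumber int // 1-based page number within the PDF
	Text       string
}

// splitPages splits pdftotext-style output on form feeds and returns one
// document per non-empty page. Page numbers follow the position in the
// original output, so a blank page does not shift later pages.
func splitPages(path, raw string) []PageDoc {
	// Assumption: every page ends with \f, so a single trailing \f would
	// otherwise produce a phantom empty last page.
	raw = strings.TrimSuffix(raw, "\f")
	if raw == "" {
		return nil
	}
	parts := strings.Split(raw, "\f")
	docs := make([]PageDoc, 0, len(parts))
	for i, part := range parts {
		text := strings.TrimSpace(part)
		if text == "" {
			continue // nothing to index on a blank page
		}
		docs = append(docs, PageDoc{Path: path, PageNumber: i + 1, Text: text})
	}
	return docs
}

func main() {
	// Four pages, the third one blank.
	raw := "Intro\fMethods\f\fResults\f"
	for _, d := range splitPages("manual.pdf", raw) {
		fmt.Printf("%s page %d: %s\n", d.Path, d.PageNumber, d.Text)
	}
}
```

Keeping the original position as the page number, rather than renumbering after blank pages are skipped, is what lets a `#page=N` link land on the right page in the viewer.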

Reindex required

Existing search indexes won't have per-page documents. Run **Rebuild Search Index** from Admin > Projects after deploying. Old indexes degrade gracefully (`page_number` returns 0, no page jump).
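
One way the graceful degradation could look on the link-building side is sketched below. The `Hit` type and `resultLink` function are hypothetical names used only for illustration. The sketch assumes that a `page_number` of 0 means "no page information" and that HTML results keep using the `?highlight=` query parameter.

```go
package main

import (
	"fmt"
	"net/url"
)

// Hit is a hypothetical search hit; the fields are illustrative only.
type Hit struct {
	ViewerPath string // path to the PDF viewer or HTML page
	IsPDF      bool
	PageNumber int // 0 when the index predates per-page documents
}

// resultLink builds the link for a search result. PDF hits with a known
// page get a #page=N fragment; PDF hits from an old index fall back to the
// plain viewer link; HTML hits keep the ?highlight= parameter.
func resultLink(h Hit, query string) string {
	if h.IsPDF {
		if h.PageNumber > 0 {
			return fmt.Sprintf("%s#page=%d", h.ViewerPath, h.PageNumber)
		}
		return h.ViewerPath // old index: open the PDF without a page jump
	}
	return h.ViewerPath + "?highlight=" + url.QueryEscape(query)
}

func main() {
	hits := []Hit{
		{ViewerPath: "/docs/manual.pdf", IsPDF: true, PageNumber: 3},
		{ViewerPath: "/docs/manual.pdf", IsPDF: true, PageNumber: 0},
		{ViewerPath: "/docs/intro.html"},
	}
	for _, h := range hits {
		fmt.Println(resultLink(h, "page jump"))
	}
}
```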

Closes #79

Test plan

- [ ] Upload a multi-page PDF, rebuild search index
- [ ] Search for text on page 3+; the result shows "Page N" and links to `#page=N`
- [ ] Verify the PDF viewer opens at the correct page
- [ ] Verify HTML search results still work with `?highlight=`
- [ ] Verify overlay and navbar search handle both PDF and HTML results
- [ ] Build and tests pass

🤖 Generated with Claude Code

qwc added 1 commit

2026-02-20 08:23:10 +01:00

Index PDF pages individually for page-level search results

CI / build (pull_request) Successful in 40s

CI / docker (pull_request) Has been skipped

CI / test (pull_request) Successful in 54s

c5503a147d

PDF text is now extracted per page instead of as a single blob.
Search results for PDFs include the page number and link directly
to the matching page using #page=N fragments. The PDF viewer
reads the fragment and passes it to the embedded PDF, and shows
a search hint banner when arriving from a search result.

Requires a search index rebuild after deploying.

Closes #79

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

qwc merged commit b1bd483a40 into main

2026-02-20 08:31:20 +01:00

qwc deleted branch feature/pdf-search-page-jump

2026-02-20 08:31:20 +01:00

qwc referenced this pull request from a commit

2026-02-20 08:31:21 +01:00

Merge pull request 'Index PDF pages individually for page-level search results' (#86) from feature/pdf-search-page-jump into main

qwc referenced this pull request from a commit

2026-07-13 13:19:10 +02:00

Merge pull request 'Index PDF pages individually for page-level search results' (#86) from feature/pdf-search-page-jump into main