explorer: search as a real global filter (#234 Step 4) by rdhyee · Pull Request #251 · isamplesorg/isamplesorg.github.io

rdhyee · 2026-05-31T05:34:48Z

A1: search as a real global filter (#234 Step 4)

Makes a committed free-text search a global filter across every explorer surface, instead of an optional side-panel lookup. On search, `buildSearchFilter()` materializes a non-temp DuckDB table, `search_pids` (one ILIKE scan over `sample_facets_v2`), and every surface then constrains its own query with a cheap semi-join, `AND pid IN (SELECT pid FROM search_pids)`:

- Samples table filters to the search ("N of M in this map view").
- Globe enters point mode and renders only the matching dots (clusters can't be text-filtered).
- Facet legend counts and stats scope to search ∩ viewport.
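A minimal sketch of the materialize-then-semi-join pattern described above, using Python's built-in `sqlite3` as a stand-in for DuckDB-WASM. The `facets` table, its `label`/`lon`/`lat` columns, the sample rows, and the function names are hypothetical; only `search_pids` and the `pid IN (SELECT pid FROM search_pids)` clause come from the PR.

```python
import sqlite3


def build_search_filter(conn, term):
    """Materialize matching pids once: the single text scan per committed search."""
    conn.execute("DROP TABLE IF EXISTS search_pids")
    conn.execute("CREATE TABLE search_pids (pid TEXT PRIMARY KEY)")
    # SQLite's LIKE is case-insensitive for ASCII, standing in for DuckDB's ILIKE.
    conn.execute(
        "INSERT INTO search_pids SELECT DISTINCT pid FROM facets "
        "WHERE pid IS NOT NULL AND label LIKE ?",
        (f"%{term}%",),
    )
    return conn.execute("SELECT COUNT(*) FROM search_pids").fetchone()[0]


def search_filter_sql(col="pid"):
    """The semi-join clause each surface appends to its own WHERE."""
    return f" AND {col} IN (SELECT pid FROM search_pids)"


def count_in_view(conn, west, south, east, north, search_active):
    """One surface (a table count) scoped to the viewport and, if active, the search."""
    sql = ("SELECT COUNT(*) FROM facets "
           "WHERE lon BETWEEN ? AND ? AND lat BETWEEN ? AND ?")
    if search_active:
        sql += search_filter_sql()
    return conn.execute(sql, (west, east, south, north)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facets (pid TEXT, label TEXT, lon REAL, lat REAL)")
conn.executemany("INSERT INTO facets VALUES (?, ?, ?, ?)", [
    ("a", "Bucchero sherd", 11.4, 43.2),
    ("b", "Mollusk shell", 11.5, 43.1),
    ("c", "bucchero cup", 50.0, 10.0),
])
total = build_search_filter(conn, "bucchero")
in_view = count_in_view(conn, 11.0, 43.0, 12.0, 44.0, search_active=True)
```

The expensive text match runs once; each later surface query pays only for a lookup against a small pid table, which is what makes it cheap to apply the filter everywhere.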

Fixes the incoherence reported in #247: during a search, the table claimed that unrelated viewport samples "match the current filters". An interim honesty fix shipped as #250; this PR completes it.

What's verified

- Globe coherence: bucchero → 2,693 pids, point mode, h3 cleared, samplePointsLen ≤ total. Confirmed headed, headless, and on a live deploy (real HTTP/2 + 206 range responses, prod data).
- Search perf: collapsed an A1 double-scan (the pid-set build and the side-panel results both scanned the 63 MB facets parquet). The side panel now reads the materialized search_pids, so each search does one facets scan, matching pre-A1. The CI smoke gate (pottery) passes.
- Filter coherence: facet legend counts now use the same padded viewport (VIEWPORT_PAD_FACTOR) as the table, heatmap, point loader, and stat, so the legend equals the "N match" count (it was reading low: ~166 vs ~481 at a wide view).
- Production-clean: all A1 debug instrumentation (a1dbg/__a1log/__a1globe + on-page panel) is gated behind ?debug=a1; the default load has a clean global namespace and no overhead.

Included commits

data_base dev-override fix · debug gating + dev-probe removal · double-scan collapse · facet-padding coherence fix · dev verify infra (dev_server.py, tests/playwright/a1-verify.mjs + probes). The dev/test infra is optional — happy to drop dev_server.py / the probes if you'd prefer them out of upstream.

Known / deferred (not blockers, flagged for review)

- Heatmap isn't search-aware yet (renderHeatmap omits searchFilterSQL), so the "filtered density" layer stays unfiltered under a committed search. Follow-up.
- Selection revalidation on search change (clear a selection that's no longer in the filtered set).
- Cold-search latency: A1 moves the un-indexed full-text ILIKE scan to the front of the common flow; the proper fix is the BM25 substrate (Explorer FTS Track 1b: Honesty fix for query-spec / live mismatch #168–172). The "Building search filter…" affordance masks it for now.

Relates to

#234 (umbrella), #247/#250, #248 (concept-URI search — a second producer of the same search_pids set), #249 (the "refactor explorer.qmd first?" question — this PR is a data point for it).

Staging

Deployed and verified on the rdhyee fork's GitHub Pages (same data/infra as isamples.org). Suggest squash-merge — the branch carries some WIP commits whose messages predate the fixes.

🤖 Generated with Claude Code

…table surface

Strategy B: materialize search_pids (one ILIKE scan over facets_url) on a committed search, then constrain surfaces with a cheap pid semi-join.

This increment (table surface, verified):
- buildSearchFilter/clearSearchFilter: non-temp search_pids table (DISTINCT, NOT NULL), token-versioned _next→swap, captures match total. Published on window.__searchFilter {active,term,token,total} + window.searchFilterSQL().
- doSearch builds the filter (shows "Building search filter…") then refreshes the table; clears it on empty/short submit.
- loadCount/loadPage semi-join on search_pids; summaryText → "N of M \"term\" matches in this map view" (replaces isamplesorg#250 interim copy).
- Dev probe cell (a1PersistenceProbe): REMOVE before PR.

Verified on local build: bucchero → table shows only OpenContext Poggio Civitate matches (2,693), no GEOME mollusks; non-temp table persists across db.query() calls. Probe (isamplesorg#249 data): no coord-less matches, no dup pids, broad-term max ~82k.

TODO (still in PR #1, NOT YET DONE): points loader, facet counts + cube gating, stats, and C3 auto-point-mode so the globe isn't left unfiltered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…e), C3: globe path BUGGY

Adds to the working table surface:
- searchIsActive()/searchFilterSQL() cell-local helpers in the viewer cell.
- loadViewportSamples: semi-join on search_pids.
- updateCrossFilteredCounts: semi-join on both paths; gate off the cube fast-path AND the global baseline early-return when a search is active.
- applySearchFilterChange(): C3 orchestrator; force point mode on search, revert to altitude-appropriate mode on clear; refresh table+facets.
- camera-changed handler: latch point mode while a search is active.
- doSearch calls applySearchFilterChange after build / on clear.

KNOWN BUG (needs debugging): the GLOBE points render the UNFILTERED viewport count (e.g. "5000 of 1,591,051") even though search is active and the table correctly shows 2,693. C3 does not enter point mode at high altitude on boot either (globe stays unfiltered clusters). Likely an async race between the boot point-load / mode entry and the post-build applySearchFilterChange (filter built ~40-90s into boot, after the camera has already settled). The table surface (loadCount/loadPage) IS correctly filtered.

Probe cell still present (remove before PR).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
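The C3 orchestration above can be sketched as one pure decision function that both the search path and the camera handler call, roughly the "one-reconciler" shape this PR's commits refer to. This is a Python sketch, not the explorer's JavaScript: the mode names, the altitude cutoff `POINT_MODE_MAX_ALT_KM`, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical altitude cutoff (km) below which the globe shows points even with
# no search; the explorer's real threshold and mode names are not given in the PR.
POINT_MODE_MAX_ALT_KM = 500.0


@dataclass
class GlobeState:
    mode: str = "clusters"


def desired_mode(search_active: bool, altitude_km: float) -> str:
    # Clusters are pre-aggregated and can't be text-filtered, so a search forces points.
    if search_active:
        return "points"
    return "points" if altitude_km <= POINT_MODE_MAX_ALT_KM else "clusters"


def on_camera_changed(state: GlobeState, search_active: bool, altitude_km: float) -> None:
    # The latch: while a search is active, camera moves never fall back to clusters.
    state.mode = desired_mode(search_active, altitude_km)


def apply_search_filter_change(state: GlobeState, search_active: bool,
                               altitude_km: float, refresh: Callable[[], None]) -> None:
    # Run after a filter build or a clear: pick the mode, then refresh the
    # table and facet counts against the new filter.
    state.mode = desired_mode(search_active, altitude_km)
    refresh()
```

Routing both entry points through `desired_mode` means the globe's mode depends only on the current (search, altitude) pair, not on which event happened to fire last, which is the kind of ordering the "KNOWN BUG" note suspects.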

…ch-token staleness) + [A1dbg] logging; globe still not entering point mode — next: Codex rec #4 one-reconciler refactor Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…rver + deterministic A1 observability

- R2_BASE honors ?data_base= / localStorage ISAMPLES_DATA_BASE (default prod), so the explorer can read a local parquet mirror instead of 40-90s remote range-fetches.
- dev_server.py: range-capable (206) static server; stock python http.server returns 200 and breaks DuckDB-WASM partial reads.
- window.__a1log/__a1state + a1dbg() + on-page panel (?debug=a1) replace flaky console capture; window.__a1globe() exposes mode/point state for a Playwright harness.
- Converted [A1dbg] console.logs to a1dbg events at build/mode/point-load/discard points.

NOTE: cold cost is init-dominated (DuckDB-WASM+Cesium+OJS ~40s); the mirror helps the DATA phase only, and the real lever is load-once + in-page iteration. Mirror range verified (curl -r => 206), but a full end-to-end speedup run hung in init (shakedown tomorrow; check 0-byte current/wide.parquet).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
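A range-capable dev server can be sketched on top of the standard library's `http.server`. This is not the PR's `dev_server.py`: the handler, `parse_range`, and `serve` are hypothetical, and only single `bytes=start-end` ranges are handled; anything else falls back to a plain 200 from `SimpleHTTPRequestHandler`.

```python
import os
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def parse_range(header, size):
    """Parse one 'bytes=start-end' range into an inclusive (start, end) pair.

    Returns None when there is no usable single range; the handler then falls
    back to a plain 200. (A strict server would answer 416 for an
    unsatisfiable range; this sketch does not.)
    """
    if not header or not header.startswith("bytes="):
        return None
    spec = header[len("bytes="):].strip()
    if "," in spec:
        return None  # multi-range requests are out of scope here
    start_s, _, end_s = spec.partition("-")
    try:
        if start_s == "":  # suffix form "bytes=-N": the last N bytes
            n = int(end_s)
            return (max(size - n, 0), size - 1) if n > 0 and size > 0 else None
        start = int(start_s)
        end = int(end_s) if end_s else size - 1
    except ValueError:
        return None
    if start >= size or end < start:
        return None
    return (start, min(end, size - 1))


class RangeHandler(SimpleHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep-alive, as the PR pins for DuckDB's range reader

    def do_GET(self):
        path = self.translate_path(self.path)
        if not os.path.isfile(path):
            return super().do_GET()
        size = os.path.getsize(path)
        rng = parse_range(self.headers.get("Range"), size)
        if rng is None:
            return super().do_GET()
        start, end = rng
        length = end - start + 1
        self.send_response(206)
        self.send_header("Content-Type", self.guess_type(path))
        self.send_header("Content-Range", f"bytes {start}-{end}/{size}")
        self.send_header("Content-Length", str(length))
        self.send_header("Accept-Ranges", "bytes")
        self.end_headers()
        with open(path, "rb") as f:
            f.seek(start)
            self.wfile.write(f.read(length))


def serve(port=8000, directory="."):
    handler = partial(RangeHandler, directory=directory)
    ThreadingHTTPServer(("127.0.0.1", port), handler).serve_forever()
```

Calling `serve()` from a data directory and running `curl -r 0-99` against a parquet file should return a 206 with a 100-byte body, the same check the commit describes.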

…be coherence) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Working handoff docs for the search-as-global-filter (A1, isamplesorg#234 Step 4) work — branch state, the globe logjam + Codex's reconciler spec, the fast verify-loop, the performance model, and Eric isamplesorg#248 / isamplesorg#249. Strip before the A1 PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ckDB-WASM

The ?data_base=/data dev override produced root-relative parquet URLs (/data/foo.parquet). DuckDB-WASM's httpfs reads those as a virtual-FS glob ("No files found that match the pattern") instead of fetching over HTTP, so the local-mirror verify loop hung in init with zero /data fetches (the "shakedown" symptom). Resolve a root-relative data_base against location.origin so the ergonomic ?data_base=/data form works; the prod default and absolute (http://...) overrides pass through unchanged.

Verify-loop infra:
- dev_server.py: pin HTTP/1.1 (DuckDB's range reader expects keep-alive; curl-verified 206 + multi-request keep-alive). Locally, full GET vs 206 is a DuckDB-WASM heuristic and moot over localhost; validate ranges on deploy.
- tests/playwright/shakedown-206.mjs: headless boot+search probe (no popup). Confirms cold boot ~2.3s to live; the bucchero search builds 2,693 pids in ~9s.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
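The `data_base` resolution rule can be sketched with `urllib.parse.urljoin`, which plays the role of resolving against `location.origin` in the browser. `DEFAULT_DATA_BASE` is a placeholder, not the real production URL, and `resolve_data_base` is a hypothetical name.

```python
from urllib.parse import urljoin

# Placeholder for the production parquet base URL; the real default is not shown in the PR.
DEFAULT_DATA_BASE = "https://data.example.org/isamples"


def resolve_data_base(override, origin):
    """Turn a ?data_base= override into an absolute URL DuckDB-WASM can fetch over HTTP."""
    if not override:
        return DEFAULT_DATA_BASE  # no override: production default
    if override.startswith("/") and not override.startswith("//"):
        # Root-relative ("/data"): resolve against the page origin so httpfs sees
        # an http(s) URL rather than a path it treats as a virtual-FS glob.
        return urljoin(origin, override)
    return override  # absolute overrides pass through unchanged
```

With this rule, `?data_base=/data` on `http://localhost:8000` becomes `http://localhost:8000/data`, so every parquet URL built from it is an ordinary HTTP URL.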

Diagnoses whether a committed search renders sample points and whether the result depends on camera altitude. Boots at a given alt/lat/lng, fires the bucchero search, waits for the async point load to settle, and dumps the __a1log event sequence + final __a1globe() state. Finding: with a proper wait, the globe is A1-coherent at BOTH whole-globe (9000 km → renders all 2693 pids; computeViewRectangle saturates, not null) and zoomed-in (80 km → 2670 in-view) altitudes. The earlier "0 sample points" was a measure-too-early artifact, not a bug. Suggests the C3 fixes (4e79830) work in a foreground/headless context and the summary's "globe won't enter point mode" was likely a backgrounded-tab rAF-freeze artifact. Pending headed a1-verify.mjs verdict to rule out an animation-only race. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
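The "measure-too-early artifact" is the classic reason to poll for a settled value instead of reading once. A minimal sketch of such a wait, assuming `read` is any zero-argument probe (for example, one that returns the rendered point count from the `__a1globe()` state); `wait_until_settled` is a hypothetical helper, not part of the PR's harness.

```python
import time


def wait_until_settled(read, stable_polls=3, timeout_s=60.0, interval_s=0.25):
    """Poll read() until it returns the same value stable_polls times in a row.

    Reading once, right after the search fires, can catch the point load
    mid-flight; waiting for a stable reading avoids that trap.
    """
    deadline = time.monotonic() + timeout_s
    last, streak = object(), 0  # sentinel never equals a real reading
    while time.monotonic() < deadline:
        value = read()
        streak = streak + 1 if value == last else 1
        last = value
        if streak >= stable_polls:
            return value
        time.sleep(interval_s)
    raise TimeoutError(f"value did not settle within {timeout_s}s")
```

A reading that stays at 0 would also "settle", so a real harness would likely pair this with a second condition (such as the search filter having been published) before asserting on the count.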

Default stays headed (real flyTo — what A1 is verified against). A headed window that opens UNFOCUSED becomes a background tab → Chrome freezes its rAF render loop → the page hangs mid-init (the same backgrounded-tab freeze that corrupted the original logjam observations). HEADLESS=1 sidesteps that: headless pages are always "active". Use it for CI / repeated runs; keep headed for a real-animation spot check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… dev probe

Pre-deploy cleanup (the summary's "don't ship" list):
- Remove the a1PersistenceProbe OJS cell, a one-time dev check that console-logged on every load and threw a Catalog Error. (The design point it verified, non-temp tables persisting across DuckDBClient connections, is proven and now load-bearing in production.)
- Gate the whole A1 observability block (a1dbg / __a1log / __a1state / __a1globe + on-page panel) behind ?debug=a1. Production users now get a clean global namespace and zero overhead; the Playwright harness opts in via ?debug=a1. All a1dbg?.() call sites already use optional chaining, so they are no-ops when the block doesn't run.

Verified: ?debug=a1 → a1-verify.mjs still ✅ COHERENT (2693 pts); no flag → __a1globe/a1dbg/__a1log undefined, no panel, no probe console output.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
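The gating pattern (install the logger only when the flag is present; call sites tolerate its absence) can be sketched in Python. `make_a1_debug` and `enter_point_mode` are hypothetical names, and `if a1dbg:` stands in for JavaScript's `a1dbg?.()` optional call.

```python
from urllib.parse import parse_qs, urlparse


def make_a1_debug(url):
    """Return an event logger only when ?debug=a1 is present; otherwise None."""
    flags = parse_qs(urlparse(url).query).get("debug", [])
    if "a1" not in flags:
        return None  # production: nothing installed, nothing recorded
    log = []

    def a1dbg(event, **fields):
        log.append({"event": event, **fields})

    a1dbg.log = log  # exposed so a test harness can inspect the event sequence
    return a1dbg


def enter_point_mode(a1dbg=None):
    # Call sites guard the optional logger, so they cost nothing when it is absent.
    if a1dbg:
        a1dbg("mode", to="points")
    return "points"
```

Because the logger is never created without the flag, the default load has nothing extra in scope and each call site reduces to a single falsy check.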

…rch_pids

doSearch scanned the 63 MB facets parquet TWICE per committed search: once in buildSearchFilter (pid-set) and again for the side-panel results SELECT (+ a third for the real-count COUNT when the 50-cap hit). On CI's smoke gate, the broad "pottery" search blew the 90s budget (the first A1 deploy failed there).

Fix: buildSearchFilter now materializes the side-panel columns (label, source, place_name) and the relevance score IN THE SAME scan that builds the pid-set, so the results SELECT and the COUNT read the small in-memory search_pids table (aliased `s`) instead of re-scanning facets. One facets scan per search now, matching pre-A1. sourceFilterSQL('s.source') + the bare-pid facetFilterSQL compose unchanged; search_pids stays pid-keyed (dropped the weaker 5-col DISTINCT: pid is unique, so the build is naturally one row per pid).

Verified locally (fast mirror): pottery 15.8s → 12.7s (build 6.9s + surface updates); a1-verify still ✅ COHERENT; production-clean without ?debug=a1.

Note: the remaining time-to-results is buildSearchFilter + applySearchFilter (globe/facet updates); if CI's smoke still exceeds budget, render the side panel before applySearchFilterChange next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
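The single-scan fix can be sketched with `sqlite3` standing in for DuckDB: the build copies the side-panel columns and a score into `search_pids`, and both the results query and the count then read only that table. The `facets` schema, the sample rows, and especially the scoring rule (an exact-label match ranks above a substring match) are hypothetical; the PR does not describe its relevance score.

```python
import sqlite3


def build_search_filter(conn, term):
    """One scan over facets: the pid set plus the side panel's columns and a score."""
    conn.execute("DROP TABLE IF EXISTS search_pids")
    conn.execute(
        "CREATE TABLE search_pids (pid TEXT PRIMARY KEY, label TEXT, "
        "source TEXT, place_name TEXT, score INTEGER)"
    )
    conn.execute(
        """
        INSERT INTO search_pids
        SELECT pid, label, source, place_name,
               CASE WHEN label LIKE :exact THEN 2 ELSE 1 END  -- toy relevance score
        FROM facets
        WHERE pid IS NOT NULL AND (label LIKE :pat OR place_name LIKE :pat)
        """,
        {"exact": term, "pat": f"%{term}%"},
    )


def side_panel(conn, limit=50):
    """Results and the real count both read the small table, not the facets scan."""
    rows = conn.execute(
        "SELECT s.pid, s.label, s.source FROM search_pids s "
        "ORDER BY s.score DESC, s.pid LIMIT ?",
        (limit,),
    ).fetchall()
    total = conn.execute("SELECT COUNT(*) FROM search_pids").fetchone()[0]
    return rows, total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facets (pid TEXT, label TEXT, source TEXT, place_name TEXT)")
conn.executemany("INSERT INTO facets VALUES (?, ?, ?, ?)", [
    ("p1", "Pottery sherd", "OPENCONTEXT", "Cyprus"),
    ("p2", "pottery", "SESAR", "Crete"),
    ("p3", "Rock", "SESAR", "Pottery Hill"),
    ("p4", "Soil", "GEOME", "Nowhere"),
])
build_search_filter(conn, "pottery")
rows, total = side_panel(conn, limit=2)
```

Keying the table on `pid` keeps it usable by the semi-join surfaces, while the extra columns let the capped results list and the uncapped count come from the same small table.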

…atch")

updateCrossFilteredCounts computed facet-legend counts over the EXACT viewport (pad 0), while the samples-table COUNT, the point-mode loader, the "samples in view" stat, and the heatmap all pad by VIEWPORT_PAD_FACTOR (0.3). Matching samples in the 30% margin were counted by the table but not the legend, so the legend read low: off-by-one at a Cyprus deep-zoom (13 vs 14), and ~166 vs ~481 for material=rock at a wide Red-Sea view (RY, live rdhyee deploy). Aligns the last "in view" surface to the padded contract (isamplesorg#234 coherence).

Applies the parked facet_count_padding.patch (one line + the coherence regression test) on the A1 branch, since the mismatch is live on the A1 deploy and isamplesorg#234 is exactly "make filter semantics coherent across surfaces."

Verified at the reported view: facet Rock 167 → 496, now == table 496; a1-verify still ✅ COHERENT.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
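A small sketch of why an unpadded legend reads low. `VIEWPORT_PAD_FACTOR = 0.3` is from the commit; the padding convention (grow each edge by 30% of the view's width or height), the helper names, and the points are assumptions for illustration.

```python
VIEWPORT_PAD_FACTOR = 0.3  # value from the commit message


def pad_viewport(west, south, east, north, pad=VIEWPORT_PAD_FACTOR):
    """Grow a lon/lat box by `pad` of its width/height on each side (assumed convention).

    Latitude is clamped to [-90, 90]; antimeridian wrap-around is ignored in this sketch.
    """
    dx = (east - west) * pad
    dy = (north - south) * pad
    return (west - dx, max(south - dy, -90.0), east + dx, min(north + dy, 90.0))


def count_in(points, bbox):
    west, south, east, north = bbox
    return sum(1 for lon, lat in points if west <= lon <= east and south <= lat <= north)


view = (32.0, 34.0, 34.0, 36.0)
points = [(33.0, 35.0), (34.3, 35.0), (35.0, 35.0)]
exact = count_in(points, view)                  # how the legend used to count
padded = count_in(points, pad_viewport(*view))  # how the table counts
```

Here the point at longitude 34.3 sits in the padded margin, so the table counts it and the old legend did not; routing every "in view" surface through one padding helper removes that class of mismatch.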

Two issues from Codex's PR isamplesorg#251 review:

1. search_pids staging race: buildSearchFilter used a fixed `search_pids_next` name. Two overlapping searches could interleave so a later search swapped an earlier search's rows into `search_pids` under its own term (the token checks guard the publish, not the shared staging object). Use a token-scoped staging table `search_pids_next_${token}`, dropped in finally. Also stop DROPping the live `search_pids` on clear (an in-flight reader would throw); flip active=false and leave it unreferenced until the next search replaces it. Verified: bucchero→soil back-to-back now publishes soil/2969 (its own count), not bucchero's under soil's term.
2. heatmap search-blind: renderHeatmap omitted searchFilterSQL and heatmapFilterHash omitted the search token, so the "filtered density" overlay stayed unfiltered under a committed search. Append window.searchFilterSQL('pid') to the heatmap aggregation and add the search token to the hash so it recomputes/re-keys on search commit/clear (isamplesorg#234 cross-surface coherence).

a1-verify still ✅ COHERENT.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
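The token-scoped staging fix can be sketched with `sqlite3` standing in for DuckDB. `SearchFilter`, its methods, the `facets` layout, and the rows are hypothetical; the two ideas taken from the commit are a per-token staging table and a publish that only the newest token may perform, with the staging table dropped in `finally` either way. The overlap is simulated by staging the older build last.

```python
import sqlite3


class SearchFilter:
    """Token-scoped staging: each build writes search_pids_next_<token>."""

    def __init__(self, conn):
        self.conn = conn
        self.latest_token = 0
        self.published = None  # (term, total) currently live in search_pids
        conn.execute("CREATE TABLE IF NOT EXISTS search_pids (pid TEXT PRIMARY KEY)")

    def begin(self):
        self.latest_token += 1
        return self.latest_token

    def stage(self, token, term):
        # token is an int we generated, so interpolating it into the name is safe.
        name = f"search_pids_next_{token}"
        self.conn.execute(f"CREATE TABLE {name} (pid TEXT PRIMARY KEY)")
        self.conn.execute(
            f"INSERT INTO {name} SELECT DISTINCT pid FROM facets "
            "WHERE pid IS NOT NULL AND label LIKE ?",
            (f"%{term}%",),
        )
        return name

    def publish(self, token, term, name):
        try:
            if token != self.latest_token:
                return False  # a newer search has started; discard this build
            # Swap the staged rows in (the swap's atomicity is not modeled here).
            self.conn.execute("DROP TABLE search_pids")
            self.conn.execute(f"ALTER TABLE {name} RENAME TO search_pids")
            total = self.conn.execute("SELECT COUNT(*) FROM search_pids").fetchone()[0]
            self.published = (term, total)
            return True
        finally:
            # Always clean up this build's own staging table (a no-op after the rename).
            self.conn.execute(f"DROP TABLE IF EXISTS {name}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facets (pid TEXT, label TEXT)")
conn.executemany("INSERT INTO facets VALUES (?, ?)",
                 [("a", "bucchero"), ("b", "bucchero"), ("c", "soil core")])
sf = SearchFilter(conn)
t1 = sf.begin()                        # user searches "bucchero"...
t2 = sf.begin()                        # ...then "soil" before the first build finishes
n2 = sf.stage(t2, "soil")
n1 = sf.stage(t1, "bucchero")          # the slower, older build stages last
ok_soil = sf.publish(t2, "soil", n2)
ok_bucchero = sf.publish(t1, "bucchero", n1)
```

With a single shared staging name, the late "bucchero" staging would have overwritten the rows that the "soil" publish then swapped in; separate names make that interleaving harmless.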

…le rows) Round-2 of Codex's PR isamplesorg#251 review: the previous no-drop clear left the prior search's rows in search_pids, and doSearch's side-panel SELECT reads `FROM search_pids` directly (does NOT gate on __searchFilter.active) — so a build failure could render the previous term's rows under the new term. Chose Codex's empty-table alternative over the early-return built-guard: an early return before the side-panel try would skip the isamplesorg#167 telemetry `finally`, whereas CREATE OR REPLACE TABLE search_pids (...empty...) keeps both the in-flight semi-join readers and the direct side-panel reader safely seeing zero rows, and a build failure flows through the existing results.length===0 → return-in-try → finally path with telemetry intact. Only clearSearchFilter changes; no doSearch control-flow restructure. Verified: a1-verify ✅ COHERENT; bucchero→clear→soil publishes soil/2969 (own count, no stale rows); clearSearchFilter is only called on empty-submit + build-failure, not per search. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
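The empty-table clear can be sketched with `sqlite3`. The PR uses DuckDB's `CREATE OR REPLACE TABLE`, which SQLite lacks, so this sketch swaps in `DELETE FROM` to reach the same end state: the table still exists, so direct readers and semi-joins never hit a missing table, and it has zero rows, so nothing stale leaks through. The column layout, `state` dict, and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_pids (pid TEXT PRIMARY KEY, label TEXT)")
conn.execute("INSERT INTO search_pids VALUES ('a', 'bucchero sherd')")  # previous search
state = {"active": True, "term": "bucchero"}


def clear_search_filter(conn, state):
    state["active"] = False
    # The PR uses CREATE OR REPLACE TABLE search_pids (...) with an empty,
    # same-shape definition. SQLite has no CREATE OR REPLACE, so this sketch
    # empties the table in place instead.
    conn.execute("DELETE FROM search_pids")


def side_panel_rows(conn):
    # Like doSearch's side-panel SELECT, this reads search_pids without checking
    # state["active"], which is why leftover rows would show up under a new term.
    return conn.execute("SELECT pid, label FROM search_pids").fetchall()


# A new search ("soil") whose build fails: the clear path runs instead of a publish.
state["term"] = "soil"
clear_search_filter(conn, state)
rows = side_panel_rows(conn)
```

Because the reader sees an empty result rather than an error, a failed build flows through the normal "zero results" path, which is what keeps the telemetry `finally` reachable.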

Addresses Codex's remaining nit (PR isamplesorg#251, non-blocking): since clearSearchFilter() now leaves search_pids EMPTY, a genuine build failure and a true empty result set both reach the side-panel's results.length===0 branch. A `searchFilterBuildFailed` flag (set in the build catch) makes the panel say "Search error: couldn't build the filter…" on a real failure while still flowing through the isamplesorg#167 telemetry finally — instead of the misleading "No results for {term}". a1-verify still ✅ COHERENT. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
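Since a failed build and a genuinely empty search both end at zero rows, a flag set in the build's `except` is what tells them apart. A Python sketch of that control flow; `do_search`, `side_panel_message`, and the callables are hypothetical stand-ins, only the "Search error: couldn't build the filter" prefix and "No results for {term}" come from the commit (the rest of the error text is elided there).

```python
def side_panel_message(term, results, build_failed):
    """Panel text once the side-panel SELECT has run."""
    if results:
        return f"{len(results)} results for {term}"  # hypothetical wording
    if build_failed:
        # Prefix quoted from the PR; the rest of the real message is elided there.
        return "Search error: couldn't build the filter"
    return f"No results for {term}"


def do_search(term, build, read_results, telemetry):
    build_failed = False
    try:
        try:
            build(term)
        except Exception:
            # The clear path leaves search_pids empty, so the read below sees
            # zero rows either way; the flag tells the two cases apart.
            build_failed = True
        return side_panel_message(term, read_results(), build_failed)
    finally:
        telemetry(term)  # stand-in for the #167 telemetry `finally`: always runs
```

Keeping a single exit through `finally` mirrors the commit's choice not to restructure doSearch's control flow with an early return.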

rdhyee · 2026-05-31T20:18:10Z

Multi-AI review cycle → dual approval

Ran an iterative Codex (gpt-5.4) review/revise loop on this PR alongside Claude's own review. Codex found genuine issues across three rounds; all fixed and empirically re-verified (the a1-verify.mjs coherence harness stayed green throughout, and overlap/clear cycles were checked directly):

| Round | Finding | Fix | Commit |
| --- | --- | --- | --- |
| 1 | Staging-table race: a fixed `search_pids_next` name let two overlapping searches cross-contaminate (a later build could swap an earlier build's pids into `search_pids` under its own term; token checks guarded the publish, not the shared staging object) | token-scoped `search_pids_next_${token}`, dropped in `finally` | `a576dea` |
| 1 | Heatmap search-blind: `renderHeatmap` omitted `searchFilterSQL` and `heatmapFilterHash` omitted the search token, so the density overlay stayed unfiltered under a committed search | append the semi-join + add the active-search token to the hash | `a576dea` |
| 2 | Stale direct-reader: leaving `search_pids` in place on clear let the side-panel `SELECT FROM search_pids` render the previous term's rows on a build failure | replace with an empty same-shape table on clear (chosen over an early-return guard so the `#167` telemetry `finally` still runs) | `8a9a1d3` |
| 3 (nit) | A genuine build failure read as "No results" | `searchFilterBuildFailed` flag → "Search error" | `0a91361` |

Verified:

a1-verify.mjs: ✅ A1 COHERENT (headed + headless + live rdhyee deploy).
Overlap: bucchero→soil back-to-back publishes soil/2969 (its own count), not bucchero's under soil's term.
Clear cycle: bucchero→clear→soil clean, no stale rows.

Verdicts: Claude — approve; Codex — APPROVE (no remaining requested changes). Deployed and verified on the rdhyee fork's GitHub Pages (same data/infra as isamples.org).

rdhyee and others added 15 commits May 29, 2026 16:12

WIP A1: Codex C3 fixes (moveEnd latch, awaitable enterPointMode, sear…

4e79830


dev: A1 Playwright verify harness (condition-based; asserts table+glo…

91a944c


rdhyee merged commit e6f9def into isamplesorg:main Jun 1, 2026
1 check passed

rdhyee merged 15 commits into isamplesorg:main from rdhyee:feat/search-global-filter-a1
