explorer: search as a real global filter (#234 Step 4)#251
Merged
Conversation
…table surface
Strategy B: materialize search_pids (one ILIKE scan over facets_url) on a
committed search, then constrain surfaces with a cheap pid semi-join.
This increment (table surface, verified):
- buildSearchFilter/clearSearchFilter: non-temp search_pids table (DISTINCT,
NOT NULL), token-versioned _next→swap, captures match total. Published on
window.__searchFilter {active,term,token,total} + window.searchFilterSQL().
- doSearch builds the filter (shows "Building search filter…") then refreshes
the table; clears it on empty/short submit.
- loadCount/loadPage semi-join on search_pids; summaryText → "N of M
\"term\" matches in this map view" (replaces isamplesorg#250 interim copy).
- Dev probe cell (a1PersistenceProbe) — REMOVE before PR.
Verified on local build: bucchero → table shows only OpenContext Poggio
Civitate matches (2,693), no GEOME mollusks; non-temp table persists across
db.query() calls. Probe (isamplesorg#249 data): no coord-less matches, no dup pids,
broad-term max ~82k.
TODO (still in PR #1, NOT YET DONE): points loader, facet counts + cube
gating, stats, and C3 auto-point-mode so the globe isn't left unfiltered.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e), C3 — globe path BUGGY Adds to the working table surface: - searchIsActive()/searchFilterSQL() cell-local helpers in the viewer cell. - loadViewportSamples: semi-join on search_pids. - updateCrossFilteredCounts: semi-join on both paths; gate off the cube fast-path AND the global baseline early-return when a search is active. - applySearchFilterChange(): C3 orchestrator — force point mode on search, revert to altitude-appropriate mode on clear; refresh table+facets. - camera-changed handler: latch point mode while a search is active. - doSearch calls applySearchFilterChange after build / on clear. KNOWN BUG (needs debugging): the GLOBE points render the UNFILTERED viewport count (e.g. "5000 of 1,591,051") even though search is active and the table correctly shows 2,693. C3 does not enter point mode at high altitude on boot either (globe stays unfiltered clusters). Likely an async race between the boot point-load / mode entry and the post-build applySearchFilterChange (filter built ~40-90s into boot, after the camera has already settled). The table surface (loadCount/loadPage) IS correctly filtered. Probe cell still present (remove before PR). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ch-token staleness) + [A1dbg] logging; globe still not entering point mode — next: Codex rec #4 one-reconciler refactor Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rver + deterministic A1 observability - R2_BASE honors ?data_base= / localStorage ISAMPLES_DATA_BASE (default prod), so the explorer can read a local parquet mirror instead of 40-90s remote range-fetches. - dev_server.py: range-capable (206) static server; stock python http.server returns 200 and breaks DuckDB-WASM partial reads. - window.__a1log/__a1state + a1dbg() + on-page panel (?debug=a1) replace flaky console capture; window.__a1globe() exposes mode/point state for a Playwright harness. - Converted [A1dbg] console.logs to a1dbg events at build/mode/point-load/discard points. NOTE: cold cost is init-dominated (DuckDB-WASM+Cesium+OJS ~40s) — mirror helps the DATA phase only; the real lever is load-once + in-page iteration. Mirror range verified (curl -r => 206) but a full end-to-end speedup run hung in init (shakedown tomorrow; check 0-byte current/wide.parquet). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…be coherence) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Working handoff docs for the search-as-global-filter (A1, isamplesorg#234 Step 4) work — branch state, the globe logjam + Codex's reconciler spec, the fast verify-loop, the performance model, and Eric isamplesorg#248 / isamplesorg#249. Strip before the A1 PR. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ckDB-WASM
The ?data_base=/data dev override produced root-relative parquet URLs
(/data/foo.parquet). DuckDB-WASM's httpfs reads those as a virtual-FS glob
("No files found that match the pattern") instead of fetching over HTTP, so
the local-mirror verify loop hung in init with zero /data fetches — the
"shakedown" symptom. Resolve a root-relative data_base against
location.origin so the ergonomic ?data_base=/data form works; the prod
default and absolute (http://...) overrides pass through unchanged.
Verify-loop infra:
- dev_server.py: pin HTTP/1.1 (DuckDB's range reader expects keep-alive;
curl-verified 206 + multi-request keep-alive). Local full-GET-vs-206 is
DuckDB-WASM heuristic and moot over localhost; validate ranges on deploy.
- tests/playwright/shakedown-206.mjs: headless boot+search probe (no popup).
Confirms cold boot ~2.3s to live, bucchero search builds 2,693 pids ~9s.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Diagnoses whether a committed search renders sample points and whether the result depends on camera altitude. Boots at a given alt/lat/lng, fires the bucchero search, waits for the async point load to settle, and dumps the __a1log event sequence + final __a1globe() state. Finding: with a proper wait, the globe is A1-coherent at BOTH whole-globe (9000 km → renders all 2693 pids; computeViewRectangle saturates, not null) and zoomed-in (80 km → 2670 in-view) altitudes. The earlier "0 sample points" was a measure-too-early artifact, not a bug. Suggests the C3 fixes (4e79830) work in a foreground/headless context and the summary's "globe won't enter point mode" was likely a backgrounded-tab rAF-freeze artifact. Pending headed a1-verify.mjs verdict to rule out an animation-only race. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Default stays headed (real flyTo — what A1 is verified against). A headed window that opens UNFOCUSED becomes a background tab → Chrome freezes its rAF render loop → the page hangs mid-init (the same backgrounded-tab freeze that corrupted the original logjam observations). HEADLESS=1 sidesteps that: headless pages are always "active". Use it for CI / repeated runs; keep headed for a real-animation spot check. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… dev probe Pre-deploy cleanup (the summary's "don't ship" list): - Remove the a1PersistenceProbe OJS cell — a one-time dev check that console-logged on every load and threw a Catalog Error (the design point it verified, non-temp tables persisting across DuckDBClient connections, is proven and load-bearing in production now). - Gate the whole A1 observability block (a1dbg / __a1log / __a1state / __a1globe + on-page panel) behind ?debug=a1. Production users now get a clean global namespace and zero overhead; the Playwright harness opts in via ?debug=a1. All a1dbg?.() call sites already use optional chaining, so they are no-ops when the block doesn't run. Verified: ?debug=a1 → a1-verify.mjs still ✅ COHERENT (2693 pts); no flag → __a1globe/a1dbg/__a1log undefined, no panel, no probe console output. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rch_pids
doSearch scanned the 63 MB facets parquet TWICE per committed search: once in
buildSearchFilter (pid-set) and again for the side-panel results SELECT (+ a
third for the real-count COUNT when the 50-cap hit). On CI's smoke gate, the
broad "pottery" search blew the 90s budget (first A1 deploy failed there).
Fix: buildSearchFilter now materializes the side-panel columns (label, source,
place_name) and the relevance score IN THE SAME scan that builds the pid-set,
so the results SELECT and the COUNT read the small in-memory search_pids table
(aliased `s`) instead of re-scanning facets. One facets scan per search now,
matching pre-A1. sourceFilterSQL('s.source') + the bare-pid facetFilterSQL
compose unchanged; search_pids stays pid-keyed (dropped the weaker 5-col
DISTINCT — pid is unique, so the build is naturally one row per pid).
Verified locally (fast mirror): pottery 15.8s → 12.7s (build 6.9s + surface
updates); a1-verify still ✅ COHERENT; production-clean without ?debug=a1.
Note: the remaining time-to-results is buildSearchFilter + applySearchFilter
(globe/facet updates); if CI's smoke still exceeds budget, render the side
panel before applySearchFilterChange next.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…atch") updateCrossFilteredCounts computed facet-legend counts over the EXACT viewport (pad 0), while the samples-table COUNT, the point-mode loader, the "samples in view" stat, and the heatmap all pad by VIEWPORT_PAD_FACTOR (0.3). Matching samples in the 30% margin were counted by the table but not the legend, so the legend read low: off-by-one at a Cyprus deep-zoom (13 vs 14), and ~166 vs ~481 for material=rock at a wide Red-Sea view (RY, live rdhyee deploy). Aligns the last "in view" surface to the padded contract (isamplesorg#234 coherence). Applies the parked facet_count_padding.patch (one line + the coherence regression test) on the A1 branch, since the mismatch is live on the A1 deploy and isamplesorg#234 is exactly "make filter semantics coherent across surfaces." Verified at the reported view: facet Rock 167 → 496, now == table 496; a1-verify still ✅ COHERENT. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two issues from Codex's PR isamplesorg#251 review: 1. search_pids staging race — buildSearchFilter used a fixed `search_pids_next` name. Two overlapping searches could interleave so a later search swapped an earlier search's rows into `search_pids` under its own term (the token checks guard the publish, not the shared staging object). Use a token-scoped staging table `search_pids_next_${token}`, dropped in finally. Also stop DROPping the live `search_pids` on clear (an in-flight reader would throw) — flip active=false and leave it unreferenced until the next search replaces it. Verified: bucchero→soil back-to-back now publishes soil/2969 (its own count), not bucchero's under soil's term. 2. heatmap search-blind — renderHeatmap omitted searchFilterSQL and heatmapFilterHash omitted the search token, so the "filtered density" overlay stayed unfiltered under a committed search. Append window.searchFilterSQL('pid') to the heatmap aggregation and add the search token to the hash so it recomputes/re-keys on search commit/clear (isamplesorg#234 cross-surface coherence). a1-verify still ✅ COHERENT. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…le rows) Round-2 of Codex's PR isamplesorg#251 review: the previous no-drop clear left the prior search's rows in search_pids, and doSearch's side-panel SELECT reads `FROM search_pids` directly (does NOT gate on __searchFilter.active) — so a build failure could render the previous term's rows under the new term. Chose Codex's empty-table alternative over the early-return built-guard: an early return before the side-panel try would skip the isamplesorg#167 telemetry `finally`, whereas CREATE OR REPLACE TABLE search_pids (...empty...) keeps both the in-flight semi-join readers and the direct side-panel reader safely seeing zero rows, and a build failure flows through the existing results.length===0 → return-in-try → finally path with telemetry intact. Only clearSearchFilter changes; no doSearch control-flow restructure. Verified: a1-verify ✅ COHERENT; bucchero→clear→soil publishes soil/2969 (own count, no stale rows); clearSearchFilter is only called on empty-submit + build-failure, not per search. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Addresses Codex's remaining nit (PR isamplesorg#251, non-blocking): since clearSearchFilter() now leaves search_pids EMPTY, a genuine build failure and a true empty result set both reach the side-panel's results.length===0 branch. A `searchFilterBuildFailed` flag (set in the build catch) makes the panel say "Search error: couldn't build the filter…" on a real failure while still flowing through the isamplesorg#167 telemetry finally — instead of the misleading "No results for {term}". a1-verify still ✅ COHERENT. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
Author
Multi-AI review cycle → dual approvalRan an iterative Codex (gpt-5.4) review/revise loop on this PR alongside Claude's own review. Codex found genuine issues across three rounds; all fixed and empirically re-verified (the
Verified:
Verdicts: Claude — approve; Codex — APPROVE (no remaining requested changes). Deployed and verified on the rdhyee fork's GitHub Pages (same data/infra as isamples.org). |
This was referenced Jun 1, 2026
Closed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
A1: search as a real global filter (#234 Step 4)
Makes a committed free-text search a global filter across every explorer surface, instead of an optional side-panel lookup. On search,
buildSearchFilter()materializes a non-temp DuckDB tablesearch_pids(one ILIKE scan oversample_facets_v2), and every surface then constrains via a cheap semi-joinAND pid IN (SELECT pid FROM search_pids):search ∩ viewport.Fixes the incoherence reported in #247 (table claimed unrelated viewport samples "match the current filters" during a search; interim honesty fix shipped as #250, this completes it).
What's verified
bucchero→ 2,693 pids, point mode, h3 cleared,samplePointsLen ≤ total. Confirmed headed + headless and on a live deploy (real HTTP/2 + 206 ranges, prod data).search_pids, so it's one facets scan per search, matching pre-A1. CI smoke gate (pottery) passes.VIEWPORT_PAD_FACTOR) as the table/heatmap/point-loader/stat, so legend == "N match" (was reading low: ~166 vs ~481 at a wide view).a1dbg/__a1log/__a1globe+ on-page panel) is gated behind?debug=a1; default load has a clean global namespace and no overhead.Included commits
data_base dev-override fix · debug gating + dev-probe removal · double-scan collapse · facet-padding coherence fix · dev verify infra (
dev_server.py,tests/playwright/a1-verify.mjs+ probes). The dev/test infra is optional — happy to dropdev_server.py/ the probes if you'd prefer them out of upstream.Known / deferred (not blockers, flagged for review)
renderHeatmapomitssearchFilterSQL), so the "filtered density" layer stays unfiltered under a committed search. Follow-up.Relates to
#234 (umbrella), #247/#250, #248 (concept-URI search — a second producer of the same
search_pidsset), #249 (the "refactor explorer.qmd first?" question — this PR is a data point for it).Staging
Deployed and verified on the rdhyee fork's GitHub Pages (same data/infra as isamples.org). Suggest squash-merge — the branch carries some WIP commits whose messages predate the fixes.
🤖 Generated with Claude Code