Add collection facet to explorer (e.g. OpenContext PKAP) (#243) #244

Draft
rdhyee wants to merge 1 commit into isamplesorg:main from rdhyee:feat/collection-facet

@rdhyee commented May 29, 2026

Resolves #243.

Adds a first-class collection dimension to the explorer: filter to a named
SamplingSite label (e.g. the OpenContext project "PKAP Survey Area") and
layer the existing material / context / object_type facets on top.

Why this design (additive)

"Collection" identity lives on SamplingSite entities, which are reachable only through the
MaterialSampleRecord → produced_by → SamplingEvent → sampling_site → SamplingSite traversal; it never appears on the sample rows the explorer renders. Doing
that array join live in DuckDB-WASM is the documented in-browser bottleneck, so
membership is precomputed instead. The build pipeline for the current sample_facets_v2 / facet_summaries / facet_cross_filter files isn't in any repo, so rather than risk
regenerating those, this feature is strictly additive: two new files that
touch nothing existing:

  • collections.parquet — dimension (collection_id, label, source, n_samples, centroid_lat/lng, bbox). 61,695 rows, ~3 MB. Powers the top-N
    checkboxes, the search box, and the Featured-Collections preset cameras.
  • sample_collections.parquet — membership (pid → collection_id). ~13 MB.
    The filter appends a second pid IN (SELECT … ) subquery in
    facetFilterSQL(), exactly parallel to the existing facet predicate.
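A minimal Python sketch of how a facetFilterSQL()-style chokepoint could AND the existing facet subquery with the new collection subquery. The real function lives in explorer.qmd (JavaScript); facet_filter_sql, its parameters, and the parquet URLs are illustrative assumptions, and only DuckDB's read_parquet() is a real API here.

```python
from typing import Optional, Sequence


def sql_quote(value: str) -> str:
    """Return value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def facet_filter_sql(facet_predicate: Optional[str],
                     collection_ids: Sequence[str],
                     facets_url: str,
                     sample_collections_url: str) -> str:
    """Hypothetical analogue of facetFilterSQL(): one pid IN (...) clause per active filter."""
    clauses = []
    if facet_predicate:
        # Existing facet filter (material / context / object_type).
        clauses.append(
            f"pid IN (SELECT pid FROM read_parquet({sql_quote(facets_url)}) "
            f"WHERE {facet_predicate})"
        )
    if collection_ids:
        # New, parallel filter against the precomputed membership file.
        ids = ", ".join(sql_quote(c) for c in collection_ids)
        clauses.append(
            f"pid IN (SELECT pid FROM read_parquet({sql_quote(sample_collections_url)}) "
            f"WHERE collection_id IN ({ids}))"
        )
    # No active filters: match everything.
    return " AND ".join(clauses) if clauses else "TRUE"
```

Because each filter is an independent pid IN subquery, the collection predicate composes with the material/context/object_type predicate without either one needing to know about the other.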

A "collection" is a SamplingSite label (≈1,336 site rows share "PKAP Survey
Area"), keyed by a stable hash of (source, label). Verified: the PKAP collection
contains 15,446 samples.
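To illustrate the keying, here is a sketch that assumes SHA-256 over a NUL-separated UTF-8 encoding of (source, label), truncated to 16 hex characters (the length of IDs such as dd74c71982da0e21). The hash function, the separator, and the source names are assumptions; this is not claimed to reproduce the IDs that build_collections.py emits.

```python
import hashlib


def collection_id(source: str, label: str) -> str:
    """Hypothetical stable key: identical (source, label) pairs always map to the same ID."""
    # A separator prevents ("ab", "c") and ("a", "bc") from hashing identically.
    key = f"{source}\x00{label}".encode("utf-8")
    # Assumption: first 64 bits of SHA-256, rendered as 16 lowercase hex chars.
    return hashlib.sha256(key).hexdigest()[:16]


# Many SamplingSite rows sharing one label collapse to a single collection
# per source (source names here are placeholders).
site_rows = [("OpenContext", "PKAP Survey Area")] * 3 + [("OtherSource", "PKAP Survey Area")]
print(len({collection_id(s, l) for s, l in site_rows}))
```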

What's in the PR

  • scripts/build_collections.py — builds both files from /current/wide.parquet.
  • explorer.qmd — dual-UX collection facet (top-N checkboxes + search-the-tail
    for the ~60K long tail), ?collection= URL param wired through the existing
    facet lifecycle (applyQueryToFacetFilters / writeQueryState /
    handleFacetFilterChange) and the facetFilterSQL() chokepoint.
  • collections.qmd — Featured Collections page upgraded to identity-based
    &collection=<id> links + camera fly.
  • EXPLORER_STATE.md, data.qmd — document the new param and files.
  • tests/test_collections.py — Collections page + explorer facet-DOM checks.
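The dual UX can be sketched as follows: rank collections by n_samples, show the first N as checkboxes, and serve everything else through a case-insensitive substring search. The Collection dataclass, split_top_n, search_tail, and the default limit are hypothetical; explorer.qmd may rank or match differently (for example, in SQL against collections.parquet).

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Collection:
    collection_id: str
    label: str
    n_samples: int


def split_top_n(collections: Sequence[Collection],
                n: int) -> Tuple[List[Collection], List[Collection]]:
    """Largest-first ranking (ties broken by label); head -> checkboxes, tail -> search."""
    ranked = sorted(collections, key=lambda c: (-c.n_samples, c.label))
    return ranked[:n], ranked[n:]


def search_tail(tail: Sequence[Collection], query: str, limit: int = 20) -> List[Collection]:
    """Case-insensitive substring match over the long tail, largest matches first."""
    needle = query.strip().casefold()
    if not needle:
        return []
    hits = [c for c in tail if needle in c.label.casefold()]
    hits.sort(key=lambda c: -c.n_samples)
    return hits[:limit]
```

Keeping the checkbox list short keeps the facet panel readable, while the search box keeps all ~60K tail collections reachable without rendering them.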

⚠️ Merge gate — requires R2 upload first

The facet is inert until the two files are live on data.isamples.org:

  • Run python scripts/build_collections.py --out-dir <dir> --snapshot 202604
  • Upload isamples_202604_collections.parquet + isamples_202604_sample_collections.parquet to R2 (behind the data.isamples.org Worker)
  • Verify live: open explorer.html?collection=dd74c71982da0e21 and confirm it shows the PKAP samples; then layer a material facet and confirm the results narrow
  • Run tests/test_collections.py against the deployed site
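For the verification step, and for building identity-based links like the ones collections.qmd emits, a small helper can set the collection parameter on any explorer URL while keeping the other query parameters. with_collection is hypothetical; example.org and the lat parameter are placeholders, not the deployed site's host or its real state keys.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_collection(url: str, collection_id: str) -> str:
    """Return url with ?collection=<id> set, replacing any existing collection value."""
    parts = urlsplit(url)
    # Keep every other parameter (e.g. camera state); drop a stale collection.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "collection"]
    query.append(("collection", collection_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_collection("https://example.org/explorer.html", "dd74c71982da0e21"))
```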

Known limitations (v1)

  • Collection facet counts are the collection's static total — not
    cross-filtered against other facets (no cross_filter cache for collections).
    The dots and table do respect the filter. Documented in EXPLORER_STATE.md.
  • Like the other facets, collection filtering applies at neighborhood/point zoom,
    not to zoomed-out H3 clusters (same #facetNote caveat).
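To make the first limitation concrete: the static total counts every member pid, while a cross-filtered count would first intersect membership with the pids that survive the other facets. A toy sketch with hypothetical names and data:

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Row = Tuple[str, str]  # (pid, collection_id), the shape of sample_collections.parquet


def static_counts(membership: Iterable[Row]) -> Counter:
    """What v1 shows: every member pid counted once per collection."""
    return Counter(cid for _pid, cid in set(membership))


def cross_filtered_counts(membership: Iterable[Row], allowed_pids: Set[str]) -> Counter:
    """What a cross-filter cache would need: only pids that pass the other facets."""
    return static_counts((pid, cid) for pid, cid in membership if pid in allowed_pids)


rows = [("p1", "A"), ("p2", "A"), ("p3", "A"), ("p3", "B"), ("p4", "B")]
print(static_counts(rows))                       # checkbox counts in v1
print(cross_filtered_counts(rows, {"p1", "p3"}))  # counts once a material facet keeps p1, p3
```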

🤖 Generated with Claude Code

@rdhyee force-pushed the feat/collection-facet branch from 385ff3c to 99d10ed on May 29, 2026 00:39
…samplesorg#243)

Additive 'collection' dimension: filter the explorer to a named SamplingSite
label (e.g. OpenContext 'PKAP Survey Area'). Precomputes site membership via
the wide-parquet Sample->Event->Site traversal into two new R2 files; touches
none of the existing facet files. Rebased onto main so it sits cleanly on top
of the merged isamplesorg#242 heatmap work (disjoint regions, no conflict).

- scripts/build_collections.py: builds collections.parquet + sample_collections.parquet.
  Unnests BOTH relationship arrays (multi-event/multi-site safe),
  counts DISTINCT pids, orders membership by collection_id for row-group
  pruning. PKAP=15,446 verified; both files live on data.isamples.org.
- explorer.qmd: dual-UX collection facet (top-N checkboxes + search-the-tail),
  ?collection= URL param wired through the existing facet lifecycle and the
  facetFilterSQL() chokepoint (2nd subquery against sample_collections.parquet).
- collections.qmd: Featured Collections page uses identity-based &collection=.
- EXPLORER_STATE.md, data.qmd: document the new param and files.
- tests/test_collections.py: page + facet-DOM checks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
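The traversal that build_collections.py precomputes can be sketched in plain Python. The dict layout (produced_by and sampling_site as arrays, source on the sample record) is an assumption about wide.parquet, and collections are keyed here by the raw (source, label) tuple rather than its hash. Both loops unnest their arrays, a set per collection yields DISTINCT pids, and output rows are sorted by key so each collection's pids are contiguous; in a Parquet file that should let min/max statistics skip most row groups for a single-collection filter.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]  # (source, label); the real build hashes this into collection_id


def build_membership(samples: Dict[str, dict],
                     events: Dict[str, dict],
                     sites: Dict[str, dict]) -> Tuple[List[Tuple[Key, str]], Dict[Key, int]]:
    """Hypothetical Sample -> Event -> Site traversal producing membership rows and counts."""
    members: Dict[Key, Set[str]] = defaultdict(set)
    for pid, rec in samples.items():
        # Unnest #1: a sample may be produced by several events.
        for event_id in rec.get("produced_by") or []:
            # Unnest #2: an event may reference several sites.
            for site_id in events.get(event_id, {}).get("sampling_site") or []:
                label = sites.get(site_id, {}).get("label")
                if label:
                    # A set per collection gives DISTINCT pids.
                    members[(rec["source"], label)].add(pid)
    # Sort by key so each collection's rows cluster together.
    rows = sorted((key, pid) for key, pids in members.items() for pid in pids)
    counts = {key: len(pids) for key, pids in members.items()}
    return rows, counts
```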
Development

Successfully merging this pull request may close these issues.

Add a 'collection' dimension to the explorer (e.g. OpenContext PKAP) — precompute site membership, then facet