Add collection facet to explorer (e.g. OpenContext PKAP) (#243)#244
Draft
rdhyee wants to merge 1 commit into
…samplesorg#243)

Additive 'collection' dimension: filter the explorer to a named SamplingSite label (e.g. OpenContext 'PKAP Survey Area'). Precomputes site membership via the wide-parquet Sample->Event->Site traversal into two new R2 files; touches none of the existing facet files.

Rebased onto main so it sits cleanly on top of the merged isamplesorg#242 heatmap work (disjoint regions, no conflict).

- scripts/build_collections.py: builds collections.parquet + sample_collections.parquet. Unnests BOTH relationship arrays (multi-event/multi-site safe), counts DISTINCT pids, orders membership by collection_id for row-group pruning. PKAP=15,446 verified; both files live on data.isamples.org.
- explorer.qmd: dual-UX collection facet (top-N checkboxes + search-the-tail), ?collection= URL param wired through the existing facet lifecycle and the facetFilterSQL() chokepoint (2nd subquery against sample_collections.parquet).
- collections.qmd: Featured Collections page uses identity-based &collection=.
- EXPLORER_STATE.md, data.qmd: document the new param and files.
- tests/test_collections.py: page + facet-DOM checks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
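The build steps named in the commit message (unnest both relationship arrays, count distinct pids, order membership by collection_id) can be illustrated with a small pure-Python sketch. This is not the contents of scripts/build_collections.py: the toy rows, the `build_collections` helper, the assumption that `source` sits on the sample row, and the SHA-1-based `collection_id` are all hypothetical stand-ins, since the PR only says the key is "a stable hash of (source, label)".

```python
import hashlib
from collections import defaultdict


def collection_id(source: str, label: str) -> str:
    """Hypothetical stable key: first 16 hex chars of SHA-1 over (source, label)."""
    return hashlib.sha1(f"{source}\x1f{label}".encode("utf-8")).hexdigest()[:16]


# Toy stand-ins for the wide parquet; field names follow the PR text,
# the values are invented for illustration.
samples = [
    {"pid": "s1", "source": "OPENCONTEXT", "produced_by": ["e1"]},
    {"pid": "s2", "source": "OPENCONTEXT", "produced_by": ["e1", "e2"]},  # multi-event
    {"pid": "s3", "source": "OPENCONTEXT", "produced_by": ["e3"]},
]
events = {"e1": ["siteA"], "e2": ["siteA", "siteB"], "e3": ["siteC"]}  # event -> site ids
sites = {"siteA": "PKAP Survey Area", "siteB": "PKAP Survey Area", "siteC": "Other Site"}


def build_collections(samples, events, sites):
    """Walk Sample -> Event -> Site and return (collections, membership)."""
    members = defaultdict(set)  # collection_id -> set of pids (a set gives DISTINCT)
    meta = {}                   # collection_id -> (source, label)
    for sample in samples:
        for event_id in sample["produced_by"]:       # unnest the first array
            for site_id in events.get(event_id, []):  # unnest the second array
                label = sites.get(site_id)
                if label is None:
                    continue
                cid = collection_id(sample["source"], label)
                members[cid].add(sample["pid"])
                meta[cid] = (sample["source"], label)

    collections = sorted(
        (
            {"collection_id": cid, "source": meta[cid][0],
             "label": meta[cid][1], "n_samples": len(pids)}
            for cid, pids in members.items()
        ),
        key=lambda row: -row["n_samples"],  # largest first, for a top-N list
    )
    # Sorting by collection_id keeps each collection's rows contiguous, which is
    # what lets a parquet reader skip row groups via min/max statistics.
    membership = sorted((cid, pid) for cid, pids in members.items() for pid in pids)
    return collections, membership


collections, membership = build_collections(samples, events, sites)
```

In the toy data, sample `s2` reaches the "PKAP Survey Area" label through three event/site paths, but the set-based accumulation counts it once, which is the point of counting distinct pids after unnesting both arrays.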
Resolves #243.

Adds a first-class `collection` dimension to the explorer: filter to a named `SamplingSite` label (e.g. the OpenContext project "PKAP Survey Area") and layer the existing material / context / object_type facets on top.

Why this design (additive)

"Collection" identity lives on `SamplingSite` entities, reached only by the `MaterialSampleRecord → produced_by → SamplingEvent → sampling_site → SamplingSite` traversal, never on the sample rows the explorer renders. Doing that array-join live in DuckDB-WASM is the documented in-browser bottleneck, so membership is precomputed. The current `sample_facets_v2 / facet_summaries / facet_cross_filter` build pipeline isn't in any repo, so rather than risk regenerating those, this feature is strictly additive. It adds two new files and touches nothing existing:

- `collections.parquet`: dimension (collection_id, label, source, n_samples, centroid_lat/lng, bbox). 61,695 rows, ~3 MB. Powers the top-N checkboxes, the search box, and the Featured-Collections preset cameras.
- `sample_collections.parquet`: membership (pid → collection_id). ~13 MB. The filter appends a second `pid IN (SELECT … )` subquery in `facetFilterSQL()`, exactly parallel to the existing facet predicate.

A "collection" = a `SamplingSite` label (≈1,336 site rows share "PKAP Survey Area"), keyed by a stable hash of (source, label). Verified: PKAP = 15,446 samples.
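
The "second subquery, parallel to the existing facet predicate" idea can be sketched as follows. This is a hedged illustration only: the real `facetFilterSQL()` lives in explorer.qmd and runs against parquet files in DuckDB-WASM, whereas this sketch uses Python's `sqlite3` as a stand-in engine. The `facet_filter_sql` and `query_pids` helpers, the `sample_facets` table and its `material` column, the `material` URL parameter, and support for repeated parameters are all assumptions made for the example.

```python
import sqlite3
from urllib.parse import parse_qs, urlsplit


def facet_filter_sql(materials, collection_ids):
    """Hypothetical analogue of facetFilterSQL(): one pid IN (...) subquery
    per active dimension, ANDed together. Returns (where_fragment, params)."""
    clauses, params = [], []
    if materials:
        marks = ",".join("?" * len(materials))
        clauses.append(
            f"pid IN (SELECT pid FROM sample_facets WHERE material IN ({marks}))"
        )
        params += materials
    if collection_ids:
        # The additive part: a second subquery against the membership table.
        marks = ",".join("?" * len(collection_ids))
        clauses.append(
            f"pid IN (SELECT pid FROM sample_collections WHERE collection_id IN ({marks}))"
        )
        params += collection_ids
    return (" AND ".join(clauses) or "1=1"), params


# Tiny in-memory stand-ins for the facet and membership files (invented rows).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE samples (pid TEXT PRIMARY KEY);
CREATE TABLE sample_facets (pid TEXT, material TEXT);
CREATE TABLE sample_collections (pid TEXT, collection_id TEXT);
INSERT INTO samples VALUES ('s1'), ('s2'), ('s3');
INSERT INTO sample_facets VALUES ('s1', 'rock'), ('s2', 'soil'), ('s3', 'rock');
INSERT INTO sample_collections VALUES
    ('s1', 'dd74c71982da0e21'), ('s2', 'dd74c71982da0e21');
""")


def query_pids(url):
    """Read facet state from the URL query string and return matching pids."""
    qs = parse_qs(urlsplit(url).query)
    where, params = facet_filter_sql(qs.get("material", []), qs.get("collection", []))
    rows = db.execute(f"SELECT pid FROM samples WHERE {where} ORDER BY pid", params)
    return [row[0] for row in rows]


pkap_only = query_pids("explorer.html?collection=dd74c71982da0e21")
pkap_rock = query_pids("explorer.html?collection=dd74c71982da0e21&material=rock")
```

Because each dimension contributes an independent `pid IN (…)` clause, adding the collection dimension does not change the shape of the existing facet predicate; layering a material facet on top simply narrows the collection's result set.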

What's in the PR

- `scripts/build_collections.py`: builds both files from `/current/wide.parquet`.
- `explorer.qmd`: dual-UX `collection` facet (top-N checkboxes + search-the-tail for the ~60K long tail), `?collection=` URL param wired through the existing facet lifecycle (`applyQueryToFacetFilters` / `writeQueryState` / `handleFacetFilterChange`) and the `facetFilterSQL()` chokepoint.
- `collections.qmd`: Featured Collections page upgraded to identity-based `&collection=<id>` links + camera fly.
- `EXPLORER_STATE.md`, `data.qmd`: document the new param and files.
- `tests/test_collections.py`: Collections page + explorer facet-DOM checks.

The facet is inert until the two files are live on `data.isamples.org`:

- Run `python scripts/build_collections.py --out-dir <dir> --snapshot 202604`
- Upload `isamples_202604_collections.parquet` + `isamples_202604_sample_collections.parquet` to R2 (behind the data.isamples.org Worker)
- Open `explorer.html?collection=dd74c71982da0e21` → PKAP samples; layer a material facet to confirm it narrows
- Run `tests/test_collections.py` against the deployed site

Known limitations (v1)
- Collection facet counts are not cross-filtered against other facets (no cross_filter cache for collections). The dots and table do respect the filter. Documented in `EXPLORER_STATE.md`.
- The filter applies to the dots and table, not to zoomed-out H3 clusters (same `#facetNote` caveat).

🤖 Generated with Claude Code