Skip to content

Support RunEndEncoded arrays in comparison kernels (eq, lt, etc.) #9620

@asubiotto

Description

@asubiotto

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The comparison kernels in arrow-ord (eq, neq, lt, lt_eq, gt, gt_eq) do not support RunEndEncoded arrays. Passing a REE array to any of these functions hits an unreachable!() panic, forcing callers to manually decode REE to flat arrays before comparing. This defeats the compression benefit of REE — particularly for filter operations (scalar comparison) which are the dominant use case.

This is part of the ongoing effort to support REE in compute kernels (#3520).

Describe the solution you'd like

Add REE support to compare_op in arrow-ord/src/cmp.rs, following the existing dictionary unwrapping pattern:

  1. REE vs scalar (the filter case): compare only the physical values (one per run), then bulk-expand the boolean result to logical length. This should be O(m) comparisons + O(n) expansion.

  2. REE vs REE: walk both run_ends arrays simultaneously, compare once per aligned segment, and bulk-fill the result. O(m1+m2) comparisons instead of O(n) per-element.

  3. Mixed cases (REE vs plain, REE wrapping dictionary): fall back to materializing a logical-length index vector and using the existing vectored comparison path.

Describe alternatives you've considered

  • Decode REE to flat before comparing: simpler (zero changes to cmp.rs) but materializes the full logical array, losing all compression benefit.

  • Treat REE as dictionary (implement AnyDictionaryArray for RunArray): would reuse the existing dict path but would just hide the same logical-length allocation inside normalized_keys(). The access patterns differ fundamentally — dictionary keys are stored per-element, while REE run-ends are compressed.

Additional context

None.

Metadata

Metadata

Assignees

No one assigned

    Labels

    arrowChanges to the arrow crate

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions